Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skorch compatibility #7

Open
lowlypalace opened this issue Aug 7, 2022 · 1 comment
Open

Skorch compatibility #7

lowlypalace opened this issue Aug 7, 2022 · 1 comment
Assignees

Comments

@lowlypalace
Copy link

lowlypalace commented Aug 7, 2022

Hi, as far as I understand, Datascope is compatible with any scikit-learn pipeline. I'm using PyTorch and skorch (library that wraps PyTorch) to make my classifier scikit-learn compatible.

I'm currently getting the following error when trying to compute the score:

ValueError                                Traceback (most recent call last)
[<ipython-input-49-2e03ddd68d36>](https://localhost:8080/#) in <module>()
----> 1 importances.score(test_data, test_labels)

3 frames
[/usr/local/lib/python3.7/dist-packages/datascope-0.0.3-py3.7-linux-x86_64.egg/datascope/importance/importance.py](https://localhost:8080/#) in score(self, X, y, **kwargs)
     38         if isinstance(y, DataFrame):
     39             y = y.values
---> 40         return self._score(X, y, **kwargs)

[/usr/local/lib/python3.7/dist-packages/datascope-0.0.3-py3.7-linux-x86_64.egg/datascope/importance/shapley.py](https://localhost:8080/#) in _score(self, X, y, **kwargs)
    285         units = np.delete(units, np.where(units == -1))
    286         world = kwargs.get("world", np.zeros_like(units, dtype=int))
--> 287         return self._shapley(self.X, self.y, X, y, self.provenance, units, world)
    288 
    289     def _shapley(

[/usr/local/lib/python3.7/dist-packages/datascope-0.0.3-py3.7-linux-x86_64.egg/datascope/importance/shapley.py](https://localhost:8080/#) in _shapley(self, X, y, X_test, y_test, provenance, units, world)

    314             )
    315         elif self.method == ImportanceMethod.NEIGHBOR:
--> 316             return self._shapley_neighbor(X, y, X_test, y_test, provenance, units, world, self.nn_k, self.nn_distance)
    317         else:
    318             raise ValueError("Unknown method '%s'." % self.method)

[/usr/local/lib/python3.7/dist-packages/datascope-0.0.3-py3.7-linux-x86_64.egg/datascope/importance/shapley.py](https://localhost:8080/#) in _shapley_neighbor(self, X, y, X_test, y_test, provenance, units, world, k, distance)
    507             assert isinstance(X_test, spmatrix)
    508             X_test = X_test.todense()
--> 509         distances = distance(X, X_test)
    510 
    511         # Compute the utilitiy values between training and test labels.

sklearn/metrics/_dist_metrics.pyx in sklearn.metrics._dist_metrics.DistanceMetric.pairwise()

ValueError: Buffer has wrong number of dimensions (expected 2, got 4)

Here's a snippet of my code:

from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score

net = reset_model(seed = 0) # gives scikit-learn compatible skorch model

pipeline = Pipeline([("model", net)])

pipeline.fit(train_dataset, train_labels)
y_pred = pipeline.predict(test_dataset)

plot_loss(net)
accuracy_dirty = accuracy_score(y_pred, test_labels)
print("Pipeline accuracy in the beginning:", accuracy_dirty)

The above works fine, and I'm able to compute the accuracy of my baseline model.

However, when trying to run importances.score(test_data, test_labels) I'm getting the error mentioned above.

from datascope.importance.common import SklearnModelAccuracy
from datascope.importance.shapley import ShapleyImportance

net = reset_model(seed = 0)
pipeline = Pipeline([("model", net)])

utility = SklearnModelAccuracy(pipeline)
importance = ShapleyImportance(method="neighbor", utility=utility)
importances = importance.fit(train_data, train_labels)
importances.score(test_data, test_labels)

Here's the shape of my data:

train_data.shape, train_labels.shape
((2067, 3, 224, 224), (2067,))

test_data.shape, test_labels.shape
((813, 3, 224, 224), (813,))

Would be happy is someone could point me in the right direction! Not sure if this error is skorch related or the images are not supported yet? Thanks :)

@lowlypalace lowlypalace changed the title Skorch compatability Skorch compatibility Aug 7, 2022
@xzyaoi xzyaoi self-assigned this Aug 7, 2022
@xzyaoi
Copy link
Member

xzyaoi commented Aug 8, 2022

Hi, @lowlypalace thanks for reaching out!

I have encountered a similar problem before, and it is because of the shape of my data. Could you try reshaping your train_data/test_data into (N, D) where N=# samples (2067 for your train_data) and D is the dimension? For D, you probably need some preprocessing, e.g., flatten that makes your 2D images (I assume) into 1D vectors.

If that does not work out, please let me know, and I will take a closer look as soon as possible.

Best regards,
Xiaozhe

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants