Skorch compatibility #7

lowlypalace · 2022-08-07T14:24:34Z

Hi, as far as I understand, Datascope is compatible with any scikit-learn pipeline. I'm using PyTorch and skorch (a library that wraps PyTorch) to make my classifier scikit-learn compatible.

I'm currently getting the following error when trying to compute the score:

ValueError                                Traceback (most recent call last)
<ipython-input-49-2e03ddd68d36> in <module>()
----> 1 importances.score(test_data, test_labels)

3 frames
/usr/local/lib/python3.7/dist-packages/datascope-0.0.3-py3.7-linux-x86_64.egg/datascope/importance/importance.py in score(self, X, y, **kwargs)
     38         if isinstance(y, DataFrame):
     39             y = y.values
---> 40         return self._score(X, y, **kwargs)

/usr/local/lib/python3.7/dist-packages/datascope-0.0.3-py3.7-linux-x86_64.egg/datascope/importance/shapley.py in _score(self, X, y, **kwargs)
    285         units = np.delete(units, np.where(units == -1))
    286         world = kwargs.get("world", np.zeros_like(units, dtype=int))
--> 287         return self._shapley(self.X, self.y, X, y, self.provenance, units, world)
    288 
    289     def _shapley(

/usr/local/lib/python3.7/dist-packages/datascope-0.0.3-py3.7-linux-x86_64.egg/datascope/importance/shapley.py in _shapley(self, X, y, X_test, y_test, provenance, units, world)

    314             )
    315         elif self.method == ImportanceMethod.NEIGHBOR:
--> 316             return self._shapley_neighbor(X, y, X_test, y_test, provenance, units, world, self.nn_k, self.nn_distance)
    317         else:
    318             raise ValueError("Unknown method '%s'." % self.method)

/usr/local/lib/python3.7/dist-packages/datascope-0.0.3-py3.7-linux-x86_64.egg/datascope/importance/shapley.py in _shapley_neighbor(self, X, y, X_test, y_test, provenance, units, world, k, distance)
    507             assert isinstance(X_test, spmatrix)
    508             X_test = X_test.todense()
--> 509         distances = distance(X, X_test)
    510 
    511         # Compute the utilitiy values between training and test labels.

sklearn/metrics/_dist_metrics.pyx in sklearn.metrics._dist_metrics.DistanceMetric.pairwise()

ValueError: Buffer has wrong number of dimensions (expected 2, got 4)

Here's a snippet of my code:

from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score

net = reset_model(seed = 0) # gives scikit-learn compatible skorch model

pipeline = Pipeline([("model", net)])

pipeline.fit(train_dataset, train_labels)
y_pred = pipeline.predict(test_dataset)

plot_loss(net)
accuracy_dirty = accuracy_score(test_labels, y_pred)
print("Pipeline accuracy in the beginning:", accuracy_dirty)

The above works fine, and I'm able to compute the accuracy of my baseline model.

However, when trying to run importances.score(test_data, test_labels) I'm getting the error mentioned above.

from datascope.importance.common import SklearnModelAccuracy
from datascope.importance.shapley import ShapleyImportance

net = reset_model(seed = 0)
pipeline = Pipeline([("model", net)])

utility = SklearnModelAccuracy(pipeline)
importance = ShapleyImportance(method="neighbor", utility=utility)
importances = importance.fit(train_data, train_labels)
importances.score(test_data, test_labels)

Here's the shape of my data:

train_data.shape, train_labels.shape
((2067, 3, 224, 224), (2067,))

test_data.shape, test_labels.shape
((813, 3, 224, 224), (813,))

I would be happy if someone could point me in the right direction! I'm not sure whether this error is skorch related or whether images are not supported yet. Thanks :)


xzyaoi · 2022-08-08T09:11:59Z

Hi @lowlypalace, thanks for reaching out!

I have encountered a similar problem before, and it was caused by the shape of my data. Could you try reshaping your train_data/test_data into (N, D), where N is the number of samples (2067 for your train_data) and D is the feature dimension? To get D you will probably need some preprocessing, e.g., flattening each image (2D images, I assume) into a 1D vector.
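For illustration, here is a minimal sketch of that reshaping with NumPy. The helper names flatten_images and unflatten_images are hypothetical (they are not part of Datascope or skorch), and the small random arrays only stand in for the real data. With the shapes from the report, (2067, 3, 224, 224) would become (2067, 150528), since 3 * 224 * 224 = 150528.

```python
import numpy as np


def flatten_images(x):
    """Reshape an (N, C, H, W) array into (N, C*H*W). Hypothetical helper."""
    x = np.asarray(x)
    return x.reshape(x.shape[0], -1)


def unflatten_images(x_flat, image_shape):
    """Inverse of flatten_images: (N, D) back to (N, C, H, W). Hypothetical helper."""
    return np.asarray(x_flat).reshape(-1, *image_shape)


# Small random stand-ins for the real image arrays (assumption: NumPy arrays,
# channels-first layout as in the shapes reported above).
rng = np.random.default_rng(0)
train_data = rng.random((6, 3, 8, 8), dtype=np.float32)
test_data = rng.random((4, 3, 8, 8), dtype=np.float32)

# 2D arrays of shape (N, D), which is what a pairwise distance computation expects.
train_flat = flatten_images(train_data)
test_flat = flatten_images(test_data)
print(train_flat.shape, test_flat.shape)
```

Note that a skorch network trained on images will most likely still expect 4D input, so the pipeline may need a first step that restores the original shape, for example scikit-learn's FunctionTransformer wrapping something like unflatten_images. That part is an assumption and has not been checked against Datascope.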

If that does not work out, please let me know, and I will take a closer look as soon as possible.

Best regards,
Xiaozhe

lowlypalace changed the title ~~Skorch compatability~~ Skorch compatibility Aug 7, 2022

xzyaoi self-assigned this Aug 7, 2022
