Error when increasing the data size in MarginalImputer - Invalid probabilities #37

Open
NektariosKalampalikis opened this issue Dec 10, 2024 · 2 comments

NektariosKalampalikis commented Dec 10, 2024

Thank you for your amazing work on SAGE.

While running the example code in mnist.ipynb, I encountered a ValueError ("predictions are not valid probabilities") as soon as I provided more than 788 samples to the GroupedMarginalImputer. I did not alter any part of the provided code; I simply increased the size of the input data.

Steps to Reproduce

# Setup and calculate
imputer = sage.GroupedMarginalImputer(model_activation, test_np[:1024], groups)  # Increased sample size
estimator = sage.PermutationEstimator(imputer, 'cross entropy')
sage_values = estimator(test_np, Y_test_np, batch_size=128, thresh=0.05)
  • Upon a quick inspection, I did not find anything unusual with the probability values produced by the model.
  • The error occurs specifically when the number of samples exceeds 788.
iancovert (Owner) commented

Hi, thanks for checking out the package!

I looked through the code to remind myself where this ValueError is triggered, and it's here in a function called verify_model_data, which checks two things: 1) whether the probabilities are all between 0 and 1, and 2) whether they sum to 1. I can't see how this would be affected by the number of samples in GroupedMarginalImputer, so that's quite surprising. Just to be sure, are you saying that you've run the mnist.ipynb file with no modifications other than the number of samples? You haven't changed anything about the model, for example?
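
For reference, those two checks amount to roughly the following (a minimal sketch with hypothetical names; the actual code in SAGE's utils module may differ in detail):

import numpy as np

def check_valid_probabilities(preds):
    # Condition 1: every predicted probability lies in [0, 1].
    in_range = np.all((preds >= 0) & (preds <= 1))
    # Condition 2: each row of class probabilities sums to 1,
    # within NumPy's default floating-point tolerance.
    sums_to_one = np.allclose(preds.sum(axis=-1), 1.0)
    if not (in_range and sums_to_one):
        raise ValueError('predictions are not valid probabilities')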

If so, the next step in diagnosis would be the following. You can see how the PermutationEstimator class calls verify_model_data here: it simply calls X, Y = utils.verify_model_data(self.imputer, X, Y, self.loss_fn, batch_size). Perhaps you can step through the logic of verify_model_data and find which of the two conditions is violated, and why this depends on the number of samples in the imputer?
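
Something like the following might help isolate the failing condition (a rough sketch; model_activation, test_np, and device are the notebook's names, and the torch conversion may need adjusting for your setup):

import numpy as np
import torch

# Evaluate the softmax model on the same data passed to the estimator.
with torch.no_grad():
    inputs = torch.as_tensor(test_np, dtype=torch.float32, device=device)
    preds = model_activation(inputs).cpu().numpy()

# Condition 1: all probabilities in [0, 1].
print('min:', preds.min(), 'max:', preds.max())

# Condition 2: each row sums to 1 within floating-point tolerance.
row_sums = preds.sum(axis=1)
bad = ~np.isclose(row_sums, 1.0)
print('rows that fail:', np.where(bad)[0])
print('their sums:', row_sums[bad])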

Let me know if this makes sense.

NektariosKalampalikis (Author) commented

Hey, sorry for the late response.

The only other change I made to the code was switching the torch device from device = torch.device("cuda", 1) to device = torch.device("cuda") in the fourth cell.

The error happens when computing the grouped importance in the 4x4 superpixels case. I first encountered the same error with one of my own projects/models and thought I would cross-check against your tutorials.

The error is raised here. From a quick check, it seems to be a problem with the np.isclose tolerance:

Rows with invalid probability sums (not equal to 1): [11 37]
Invalid probability sums: [0.99998945 1.0000103 ]
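
Those sums are only just outside NumPy's default tolerance (rtol=1e-5, atol=1e-8, so each row sum may deviate from 1 by at most about 1.00001e-5). A standalone illustration (not the SAGE code itself):

import numpy as np

sums = np.array([0.99998945, 1.0000103])

# With the defaults, |sum - 1| must be <= atol + rtol * 1.
print(np.isclose(sums, 1.0))             # [False False]

# A slightly looser absolute tolerance accepts these sums.
print(np.isclose(sums, 1.0, atol=1e-4))  # [ True  True]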
