Error when increasing the data size in MarginalImputer - Invalid probabilities #37

Open
NektariosKalampalikis opened this issue Dec 10, 2024 · 2 comments

NektariosKalampalikis commented Dec 10, 2024

Thank you for your amazing work on SAGE.

While running the example code in mnist.ipynb, I encountered a ValueError ("predictions are not valid probabilities") as soon as I provided more than 788 samples to the GroupedMarginalImputer. I did not alter any part of the provided code; I simply increased the size of the input data.

Steps to Reproduce

# Setup and calculate
imputer = sage.GroupedMarginalImputer(model_activation, test_np[:1024], groups)  # Increased sample size
estimator = sage.PermutationEstimator(imputer, 'cross entropy')
sage_values = estimator(test_np, Y_test_np, batch_size=128, thresh=0.05)
  • Upon a quick inspection, I did not find anything unusual with the probability values produced by the model.
  • The error occurs specifically when the number of samples exceeds 788.
iancovert (Owner) commented

Hi, thanks for checking out the package!

I looked through the code to remind myself where this ValueError is triggered, and it's here in a function called verify_model_data, which checks two things: 1) whether the probabilities are all between 0 and 1, and 2) whether they sum to 1. I can't see how this would be affected by the number of samples in GroupedMarginalImputer, so that's quite surprising. Just to be sure, are you saying that you've run the mnist.ipynb file with no modifications other than the number of samples? You haven't changed anything about the model, for example?
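
For reference, those two checks amount to roughly the following (a minimal sketch with hypothetical names; the actual code in SAGE's utils module may differ in detail):

import numpy as np

def check_valid_probabilities(preds):
    # Condition 1: every predicted probability lies in [0, 1].
    in_range = np.all((preds >= 0) & (preds <= 1))
    # Condition 2: each row of class probabilities sums to 1,
    # within NumPy's default floating-point tolerance.
    sums_to_one = np.allclose(preds.sum(axis=-1), 1.0)
    if not (in_range and sums_to_one):
        raise ValueError('predictions are not valid probabilities')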

If so, the next step in diagnosis would be the following. You can see how the PermutationEstimator class calls verify_model_data here: it simply calls X, Y = utils.verify_model_data(self.imputer, X, Y, self.loss_fn, batch_size). Perhaps you can step through the logic of verify_model_data and find which of the two conditions is violated, and why this depends on the number of samples in the imputer?
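
Something like the following might help isolate the failing condition (a rough sketch; model_activation, test_np, and device are the notebook's names, and the torch conversion may need adjusting for your setup):

import numpy as np
import torch

# Evaluate the softmax model on the same data passed to the estimator.
with torch.no_grad():
    inputs = torch.as_tensor(test_np, dtype=torch.float32, device=device)
    preds = model_activation(inputs).cpu().numpy()

# Condition 1: all probabilities in [0, 1].
print('min:', preds.min(), 'max:', preds.max())

# Condition 2: each row sums to 1 within floating-point tolerance.
row_sums = preds.sum(axis=1)
bad = ~np.isclose(row_sums, 1.0)
print('rows that fail:', np.where(bad)[0])
print('their sums:', row_sums[bad])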

Let me know if this makes sense.

NektariosKalampalikis (Author) commented

Hey, sorry for the late response.

The only other change I made to the code was switching the torch device from device = torch.device("cuda", 1) to device = torch.device("cuda") in the fourth cell.

The error happens when computing the grouped importance in the 4x4 superpixels case. I first encountered the same error with one of my own projects/models and thought I would cross-check against your tutorials.

The error is raised here. From a quick check, it seems to be a problem with the np.isclose tolerance:

Rows with invalid probability sums (not equal to 1): [11 37]
Invalid probability sums: [0.99998945 1.0000103 ]
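
Those sums are only just outside NumPy's default tolerance (rtol=1e-5, atol=1e-8, so each row sum may deviate from 1 by at most about 1.00001e-5). A standalone illustration (not the SAGE code itself):

import numpy as np

sums = np.array([0.99998945, 1.0000103])

# With the defaults, |sum - 1| must be <= atol + rtol * 1.
print(np.isclose(sums, 1.0))             # [False False]

# A slightly looser absolute tolerance accepts these sums.
print(np.isclose(sums, 1.0, atol=1e-4))  # [ True  True]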
