Fix mc_calibration plot #294

Merged 1 commit into bayesflow-org:dev on Jan 28, 2025

Conversation

Kucharssim (Collaborator)

The mc_calibration plot did not work. Beyond the obvious problems (wrong argument names, etc.), prepare_plot_data() expected pred_models.shape == (num_datasets, num_samples, num_models) (where num_samples would be 1 in this case), whereas expected_calibration_error() expected pred_models.shape == (num_datasets, num_models). I loosened the check in prepare_plot_data() so that targets can be either posterior samples of shape (num_datasets, num_samples, num_variables) or point estimates of shape (num_datasets, num_variables).
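
For illustration, here is a minimal sketch of what such a loosened shape check could look like (the check_estimates_shape helper and its numpy-only signature are hypothetical, not the actual bayesflow implementation):

import numpy as np

def check_estimates_shape(estimates: np.ndarray) -> None:
    # Illustrative only: accept either point estimates of shape
    # (num_datasets, num_variables) or posterior samples of shape
    # (num_datasets, num_samples, num_variables).
    if estimates.ndim not in (2, 3):
        raise ValueError(
            "estimates must be point estimates (num_datasets, num_variables) or "
            "posterior samples (num_datasets, num_samples, num_variables), "
            f"but got shape {estimates.shape}"
        )

check_estimates_shape(np.zeros((1000, 3)))      # point estimates: passes
check_estimates_shape(np.zeros((1000, 50, 3)))  # posterior samples: passes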

I would be happy to write a test for this but I am not sure what is the desired standard for testing plotting functionality in this project. Please let me know.

You can test the PR with:

import keras
import bayesflow as bf

num_datasets = 1000
num_models = 3

# Simulate one-hot encoded true model indices.
true_models = keras.random.categorical(logits=keras.ops.zeros((num_datasets, num_models)), num_samples=1)
true_models = keras.ops.one_hot(true_models, num_classes=num_models)
true_models = keras.ops.reshape(true_models, (num_datasets, num_models))
true_models = keras.ops.convert_to_numpy(true_models)

# Simulate predicted model probabilities centered on the true models.
pred_models = keras.random.normal(shape=(num_datasets, num_models), mean=true_models)
pred_models = keras.ops.softmax(pred_models)
pred_models = keras.ops.convert_to_numpy(pred_models)

f = bf.diagnostics.mc_calibration(pred_models=pred_models, true_models=true_models)

@paul-buerkner (Contributor)

Thanks! Unfortunately, I don't have much experience with the diagnostics module myself. @stefanradev93, could you take a quick look?

@paul-buerkner removed their request for review on January 24, 2025, 15:33
@stefanradev93 (Contributor)

Hi Simon, thanks for taking this on! Indeed, the model comparison components have been somewhat neglected recently, so I am happy you noticed these bugs. Testing plots is pretty hard and would amount to testing the fidelity of the underlying data, so for now we just rely on using them across a variety of applications. Please fix the linter problems by including the pre-commit hook for auto-checking and fixing.

@Kucharssim (Collaborator, Author)

Thanks @stefanradev93!

Regarding the linter: I've already run the pre-commit hooks before making the PR, and running ruff locally passes all checks. The log of the GitHub Action here shows that the problematic files are ones I didn't touch; any idea what's happening here?

@stefanradev93 (Contributor)

That is pretty strange indeed. @LarsKue have you seen this before?

@LarsKue (Contributor) commented on Jan 28, 2025

I think it's a non-issue since the formatter and linter also check files that are not changed. If the target branch breaks them, so will the PR. We should fix this directly on dev.

Please merge dev into this branch before merging back into dev. Otherwise, I am ok with merging.

@Kucharssim force-pushed the minor-fix-diagnostics branch from 608fdd3 to 123ce33 on January 28, 2025, 09:19
@Kucharssim (Collaborator, Author)

Please merge dev into this branch before merging back into dev.

Ok, done. Now one of the tests failed on Windows 😄 I think that's unrelated too, though.

@LarsKue merged commit d990340 into bayesflow-org:dev on Jan 28, 2025
12 of 13 checks passed
han-ol pushed a commit to han-ol/bayesflow that referenced this pull request Jan 28, 2025