Feature/brier score #3270
The mean Brier score and its decomposition should satisfy the equation Brier = Uncertainty - Resolution + Reliability. Inspecting the results on a toy test case showed that the Uncertainty was computed with the wrong (negative) sign. The Brier score formula was also slightly wrong: (f-1)^2 = f^2 - 2f + 1, not f^2 - 2f, and the missing constant made the score negative as well. With these fixes the decomposition equation is satisfied.
There were some more mistakes in estimating the Uncertainty. Also, the confusion matrix had to be transposed so that the true labels are on the x axis.
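For reference, the two identities behind these sign fixes can be written out as below. This is only a sketch under the usual multiclass convention (squared error summed over K classes); the symbols are illustrative and not taken from the PR's code: p_j is the predicted probability for class j, y is the one-hot target with true class c, and \bar{o}_j is the overall observed frequency of class j.

```latex
\begin{align}
\sum_{j=1}^{K} (p_j - y_j)^2 &= \sum_{j=1}^{K} p_j^2 - 2\,p_c + 1 \\
\mathrm{Uncertainty} &= \sum_{j=1}^{K} \bar{o}_j\,(1 - \bar{o}_j) = 1 - \sum_{j=1}^{K} \bar{o}_j^2 \;\ge\; 0 \\
\mathrm{Brier} &= \mathrm{Uncertainty} - \mathrm{Resolution} + \mathrm{Reliability}
\end{align}
```

Dropping the constant 1 from the first line and from the Uncertainty shifts both by the same amount, which is consistent with both quantities having come out negative.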
class BinaryBrier(Metric):
    r"""Compute the `confusion matrix`_ for binary tasks.
sounds strange...
To be clear, this is still an early WIP. I could turn it into a draft if that is preferred.
I created #2196 a while back and decided to work on it since people have repeatedly commented and asked for it. I've also updated the PR description to better explain the motivation behind the Brier score versus the ECE variants. @Borda @SkafteNicki @justusschock let me know if there is interest in this metric. If so, I will add tests, polish the existing code, etc.
I've seen that the tests for other metrics compare against reference implementations, e.g. scikit-learn. After a brief search I didn't find a reference implementation of the Brier decomposition in other libraries, so I will need some guidance on how to proceed with the tests.
What does this PR do?
Fixes #2196
Implement the Brier score and its decomposition
I followed the original paper describing the decomposition of the Brier score into resolution, reliability and uncertainty
https://journals.ametsoc.org/view/journals/apme/12/4/1520-0450_1973_012_0595_anvpot_2_0_co_2.xml
and specifically the implementation found in
https://github.com/google-research/google-research/blob/master/uq_benchmark_2019/metrics_lib.py
and the paper
https://arxiv.org/abs/1906.02530
State of the PR
I added some rudimentary "tests" and the code seems to work for the Binary and Multiclass settings.
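In the absence of a reference implementation, one possible sanity check for the binary case is the decomposition identity itself. The NumPy sketch below is not the PR's code: `brier_decomposition` is a hypothetical helper, and it assumes forecasts are grouped by their exact distinct values (rather than by fixed-width bins), which is the setting where Brier = Uncertainty - Resolution + Reliability holds exactly.

```python
import numpy as np


def brier_decomposition(probs, labels):
    """Hypothetical helper: binary Brier score and its three components.

    Forecasts are grouped by their distinct values, so the identity
    brier == uncertainty - resolution + reliability holds up to rounding.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    n = probs.size

    base_rate = labels.mean()                    # overall frequency of the positive class
    uncertainty = base_rate * (1.0 - base_rate)  # never negative

    reliability = 0.0
    resolution = 0.0
    for f in np.unique(probs):
        mask = probs == f
        weight = mask.sum() / n                  # share of samples with forecast f
        obs_freq = labels[mask].mean()           # observed positive rate in this group
        reliability += weight * (f - obs_freq) ** 2
        resolution += weight * (obs_freq - base_rate) ** 2

    brier = np.mean((probs - labels) ** 2)
    return brier, uncertainty, resolution, reliability


# Toy example with made-up forecasts and labels.
brier, unc, res, rel = brier_decomposition(
    [0.1, 0.1, 0.8, 0.8, 0.8, 0.5],
    [0, 1, 1, 1, 0, 1],
)
print(np.isclose(brier, unc - res + rel))
```

With fixed-width bins the identity is only approximate unless within-bin variance terms are included, so a tolerance-based comparison would be needed there.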
@SkafteNicki
Did you have fun?
Make sure you had fun coding 🙃
📚 Documentation preview 📚: https://torchmetrics--3270.org.readthedocs.build/en/3270/