Stratified metrics #439
Closed
Conversation
Some quick example outputs:

```yaml
# output metrics
metrics_components:
  - - forces  # key
    - mae     # "rmse" or "mae"
  - - forces
    - rmse
  - - forces
    - mae
    - stratify: 25%_range
  - - forces
    - mae
    - stratify: 0.005
  - - forces
    - rmse
    - stratify: 25%_population
```

Gives:
(Not the best dataset in this case as it's only a handful of frames with very different energies and very low forces, but just for illustration)
```yaml
# output metrics
metrics_components:
  - - forces  # key
    - mae     # "rmse" or "mae"
  - - forces
    - rmse
  - - total_energy
    - mae
    - stratify: 20%_range
  - - total_energy
    - mae
    - stratify: 40
  - - total_energy
    - mae
    - stratify: 30%_range
      PerAtom: True
  - - total_energy
    - mae
    - stratify: 20%_population
      PerAtom: True
  - - total_energy
    - mae
    - stratify: 40
      PerAtom: True
  - - stress
    - rmse
    - stratify: 25%_population
  - - stress
    - mae
    - stratify: 0.001
```

Gives:
Description
This PR implements stratified metrics for NequIP. The user can now set the `stratify` parameter in `metrics_components` as either:

1. `X%_range`, in which case the errors are stratified by the range of reference values in increments of X% (i.e. errors are reported for the lowest X% of the reference-value range, the next X%, etc.);
2. `X%_population`, in which case the errors are stratified by population in increments of X% (i.e. errors are reported for the X% of samples with the lowest reference values, the next X%, etc.); or
3. a `float`/`int` `=X`, in which case the errors are stratified into reference-value bins of width `X` (i.e. errors are reported for reference values in [0, X), then [X, 2X), etc.), where `X` is in the units of the corresponding property.

Example config (added to the `full.yaml` config):

Also included is a small fix to prevent the final metrics output being printed twice to the terminal, which is currently the case.
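To make the three `stratify` modes concrete, here is a minimal standalone sketch of how per-bin MAEs could be computed. The helper name and exact binning details are assumptions for illustration, not the PR's actual implementation; in particular, the numeric mode here anchors bins at the minimum reference value rather than zero.

```python
import numpy as np

def stratify_errors(ref, err, spec):
    """Hypothetical sketch: bin absolute errors by a 'stratify'-style spec.

    spec may be:
      - "X%_range":      equal-width bins covering the reference-value range
      - "X%_population": bins holding equal numbers of samples (sorted by ref)
      - a float/int X:   fixed-width bins of width X over the reference values
                         (anchored at min(ref) here; an assumption)
    Returns a list of per-bin MAEs, skipping empty bins.
    """
    ref = np.asarray(ref, dtype=float)
    err = np.abs(np.asarray(err, dtype=float))
    order = np.argsort(ref)
    ref, err = ref[order], err[order]

    if isinstance(spec, str) and spec.endswith("%_population"):
        pct = float(spec[: -len("%_population")])
        n_bins = max(1, int(round(100.0 / pct)))
        # Equal-count bins over samples sorted by reference value.
        chunks = np.array_split(np.arange(len(ref)), n_bins)
        return [float(err[idx].mean()) for idx in chunks if len(idx)]

    if isinstance(spec, str) and spec.endswith("%_range"):
        pct = float(spec[: -len("%_range")])
        n_bins = max(1, int(round(100.0 / pct)))
        edges = np.linspace(ref.min(), ref.max(), n_bins + 1)
    else:  # numeric width X -> bins [lo, lo + X), [lo + X, lo + 2X), ...
        width = float(spec)
        n_bins = max(1, int(np.ceil((ref.max() - ref.min()) / width)))
        edges = ref.min() + width * np.arange(n_bins + 1)

    # Assign each sample to a bin; the top edge is made inclusive via clip.
    bins = np.clip(np.digitize(ref, edges[1:-1]), 0, n_bins - 1)
    return [float(err[bins == b].mean()) for b in range(n_bins)
            if np.any(bins == b)]
```

For example, with reference values `[0, 1, 2, 3]` and errors `[0.1, 0.2, 0.3, 0.4]`, the specs `"50%_population"`, `"50%_range"`, and a width of `1.5` all happen to split the data into the same two bins.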
Motivation and Context
Stratified error metrics can be quite useful in many cases. For example, when a dataset spans large energy/force ranges, RMSEs/MAEs can look poor due to outliers or high-energy cases: the model may have high(er) errors for extremely high-energy/force frames but a very low error for low energies/forces near the minimum. This is particularly relevant in, e.g., structure-prediction applications, where high(er) errors at high energies/forces are acceptable (as long as they point roughly in the right direction), while low(er) errors are needed near the minima in order to distinguish different metastable structures.
See, e.g., the discussion in Chris Pickard's perspective on MLPs for AIRSS (structure searching).
How Has This Been Tested?
I've added tests to match all current tests for metrics in the code. I've also manually tested most possible combinations of parameters (that I can think of) on RC Cannon, using both CPUs and GPUs, and confirmed the values being output are correct.
Types of changes
Checklist:
- I have run `black` on my code.
- I have updated `CHANGELOG.md`.
Also FYI, this PR is based on my other open PR, which allows `n_train`/`n_val` to be specified as percentages, but that isn't required (it was just the way I ended up branching).