Stratified metrics #439

Closed
wants to merge 31 commits into from

kavanase
Contributor

Description

This PR implements stratified metrics for NequIP. The user can now set the stratify parameter in metrics_components to one of: (1) X%_range, in which case the errors are stratified by the range of reference values in increments of X% (i.e. errors are given for the lowest X% of the reference value range, then the next X%, etc.); (2) X%_population, in which case the errors are stratified by the population of errors in increments of X% (i.e. errors are given for the X% of labels with the lowest errors, then the next X%, etc.); or (3) a float/int X, in which case the errors are stratified by reference value in bins of absolute width X (i.e. errors are given for reference values in [0, X), then [X, 2X), etc.), where X is in the units of the corresponding property.
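
As a rough illustration of the three accepted forms, the sketch below classifies a stratify value into a mode and its X. The function name parse_stratify and the mode strings are hypothetical, chosen for this example only; they are not taken from the PR's code.

```python
from typing import Tuple, Union


def parse_stratify(value: Union[str, int, float]) -> Tuple[str, float]:
    """Classify a `stratify` setting as (mode, X).

    Hypothetical helper for illustration; not NequIP's actual implementation.
    """
    if isinstance(value, bool):
        raise ValueError("stratify must be a number, 'X%_range' or 'X%_population'")
    if isinstance(value, (int, float)):
        # Plain number: absolute bin width in the units of the property.
        if value <= 0:
            raise ValueError("absolute bin width must be positive")
        return ("absolute_range", float(value))

    text = str(value).strip()
    suffixes = (("%_range", "percent_range"), ("%_population", "percent_population"))
    for suffix, mode in suffixes:
        if text.endswith(suffix):
            percent = float(text[: -len(suffix)])
            if not 0 < percent <= 100:
                raise ValueError("percentage must be in (0, 100]")
            return (mode, percent)
    raise ValueError(f"unrecognised stratify value: {value!r}")


print(parse_stratify("10%_range"))       # ('percent_range', 10.0)
print(parse_stratify("25%_population"))  # ('percent_population', 25.0)
print(parse_stratify(0.001))             # ('absolute_range', 0.001)
```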

Example config (added to full.yaml config):

# we can also output errors stratified by the reference value ranges (in percent or absolute values), or by the error populations in percent:
  - - total_energy
    - mae
    - stratify: 10%_range                 # stratify by range (in reference energies per atom), in increments of 10% (i.e. errors for first 10% lowest reference values, next 10% etc)
      PerAtom: True
  - - forces
    - rmse
    - stratify: 10%_population          # stratify by population (in forces errors per atom), in increments of 10%  (i.e. errors for first 10% lowest errors, next 10% etc)
  - - stress
    - mae
    - stratify: 0.001                  # stratify by absolute value (in reference stresses), in increments of 0.001
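
To make the X%_range option concrete, here is a minimal pure-Python sketch of range-based stratification for an MAE. It assumes the bins are equal slices of the min-max range of the reference values, as the description above implies. The name stratified_mae_by_range is hypothetical (not NequIP's API), and PerAtom normalisation is left out.

```python
import math


def stratified_mae_by_range(reference, predicted, percent):
    """MAE per bin, with bins covering `percent` of the reference min-max range.

    Hypothetical sketch; empty bins are reported as nan.
    """
    lo, hi = min(reference), max(reference)
    n_bins = math.ceil(100 / percent)
    width = (hi - lo) * percent / 100  # bin width in the property's units

    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for ref, pred in zip(reference, predicted):
        # The maximum reference value is clamped into the last bin.
        index = 0 if width == 0 else min(int((ref - lo) / width), n_bins - 1)
        sums[index] += abs(pred - ref)
        counts[index] += 1

    result = {}
    for i in range(n_bins):
        label = f"{i * percent:g}%-{min((i + 1) * percent, 100):g}%_range"
        result[label] = sums[i] / counts[i] if counts[i] else math.nan
    return result


reference = [0.0, 1.0, 2.0, 6.0, 10.0]
predicted = [0.1, 1.3, 2.0, 6.5, 9.0]
for label, mae in stratified_mae_by_range(reference, predicted, 50).items():
    print(f"{label}_mae = {mae:.6f}")
```

In this toy data the three lowest reference values fall in the first half of the range and the two highest in the second half, so the second bin reports a visibly larger MAE than the first.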

Also included is a small fix to prevent the final metrics output from being printed twice to the terminal, which is currently the case.

Motivation and Context

Stratified error metrics can be quite useful in many cases. For example, when a dataset spans large energy/force ranges, the overall RMSEs/MAEs can look pretty poor because of outliers / high-energy cases, even though the errors for low energies/forces near the minimum are actually very low. This is particularly relevant in e.g. structure prediction applications, where high(er) errors for high energies/forces are acceptable (as long as they're roughly in the right direction), while low(er) errors are needed near the minima in order to distinguish different metastable structures etc.
See e.g. the discussion in Chris Pickard's perspective on MLPs for AIRSS (structure searching).

How Has This Been Tested?

I've added tests to match all current tests for metrics in the code. I've also manually tested most possible combinations of parameters (that I can think of) on RC Cannon, using both CPUs and GPUs, and confirmed the values being output are correct.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds or improves functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation improvement (updates to user guides, docstrings, or developer docs)

Checklist:

  • My code follows the code style of this project and has been formatted using black.
  • All new and existing tests passed, including on GPU (if relevant).
  • I have added tests that cover my changes (if relevant).
  • I have updated CHANGELOG.md.
  • I have updated the documentation (if relevant).

Also FYI, this PR is based on my other open PR which allows n_train/n_val to be specified as percentages, but this isn't required (was just the way I ended up branching).

@kavanase
Contributor Author

Some quick example outputs:
minimal_metrics_test.yaml:

 # output metrics
 metrics_components:
   - - forces                               # key
     - mae                                  # "rmse" or "mae"
   - - forces
     - rmse
   - - forces
     - mae
     - stratify: 25%_range
   - - forces
     - mae
     - stratify: 0.005
   - - forces
     - rmse
     - stratify: 25%_population

Gives:

(pytorch_2.2.1) FasRC: Nequip_Y_Oxides_Seed_1_Batch_4 > nequip-evaluate --dataset-config nequip_y_oxides_bulk_supercells_test.yaml --train-dir results/Y_oxides/Y_oxides_tests --metrics-config minimal_metrics_test.yaml
Using device: cpu
Loading model...
/n/home03/skavanagh/Packages/nequip/nequip/utils/_global_options.py:59: UserWarning: !! Upstream issues in PyTorch versions >1.11 have been seen to cause unusual performance degredations on some CUDA systems that become worse over time; see https://github.com/mir-group/nequip/discussions/311. At present we *strongly* recommend the use of PyTorch 1.11 if using CUDA devices; while using other versions if you observe this problem, an unexpected lack of this problem, or other strange behavior, please post in the linked GitHub issue.
  warnings.warn(
    loaded model
Loading dataset...
Loaded dataset specified in nequip_y_oxides_bulk_supercells_test.yaml.
Using all frames from the specified test dataset, yielding a test set size of 12 frames.
Starting...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:22<00:00,  1.91s/it]
f_mae = 0.0062 | f_rmse = 0.0103 | 0%-25%_range_f_mae = 0.0070 | 25%-50%_range_f_mae = 0.0067 | 50%-75%_range_f_mae = 0.0055 | 75%-100%_range_f_mae = 0.0069 | 0.000-0.005_range_f_mae = 0.0081 | 0.005-0.010_range_f_mae = 0.0055 |

--- Final result: ---
Stratifying forces errors by forces range, in increments of 25% (= ~0.004), with min-max dataset range of 0.015
Stratifying forces errors by forces range, in increments of 0.005, with min-max dataset range of 0.015
Stratifying forces errors by population, in increments of 25% (= ~975 labels)
                         f_mae =  0.006192
                        f_rmse =  0.010321
            0%-25%_range_f_mae =  0.006962
           25%-50%_range_f_mae =  0.006692
           50%-75%_range_f_mae =  0.005464
          75%-100%_range_f_mae =  0.006867
       0.000-0.005_range_f_mae =  0.008050
       0.005-0.010_range_f_mae =  0.005475
       0.010-0.015_range_f_mae =  0.007944
      0%-25%_population_f_rmse =  0.000012
     25%-50%_population_f_rmse =  0.001347
     50%-75%_population_f_rmse =  0.006206
    75%-100%_population_f_rmse =  0.019640

(Not the best dataset in this case as it's only a handful of frames with very different energies and very low forces, but just for illustration)
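
The population-stratified RMSEs above increase from one group to the next because the labels are ordered by error before being split. A minimal sketch of that idea follows, assuming the absolute errors are sorted and cut into equal-sized groups. The helper stratified_rmse_by_population is hypothetical and is not the PR's implementation.

```python
import math


def stratified_rmse_by_population(reference, predicted, percent):
    """RMSE per group after sorting labels by absolute error.

    Hypothetical sketch: each group holds `percent` of the labels,
    from the lowest errors to the highest.
    """
    errors = sorted(abs(pred - ref) for ref, pred in zip(reference, predicted))
    n_labels = len(errors)
    n_bins = math.ceil(100 / percent)

    result = {}
    for i in range(n_bins):
        upper = min((i + 1) * percent, 100)
        start = int(n_labels * i * percent / 100)
        stop = int(n_labels * upper / 100)
        group = errors[start:stop]
        label = f"{i * percent:g}%-{upper:g}%_population"
        if group:
            result[label] = math.sqrt(sum(e * e for e in group) / len(group))
        else:
            result[label] = math.nan
    return result


reference = [0.0] * 8
predicted = [0.0, 0.4, 0.1, 0.3, 0.2, 0.8, 0.6, 1.0]
for label, rmse in stratified_rmse_by_population(reference, predicted, 25).items():
    print(f"{label}_rmse = {rmse:.6f}")
```

Because the groups are built from sorted errors, each group's RMSE is at least as large as the previous one, which matches the pattern in the output above.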

metrics_test_stratified.yaml:

 # output metrics
 metrics_components:
   - - forces                               # key
     - mae                                  # "rmse" or "mae"
   - - forces
     - rmse
   - - total_energy
     - mae
     - stratify: 20%_range
   - - total_energy
     - mae
     - stratify: 40
   - - total_energy
     - mae
     - stratify: 30%_range
       PerAtom: True
   - - total_energy
     - mae
     - stratify: 20%_population
       PerAtom: True
   - - total_energy
     - mae
     - stratify: 40
       PerAtom: True
   - - stress
     - rmse
     - stratify: 25%_population
   - - stress
     - mae
     - stratify: 0.001

Gives:

(pytorch_2.2.1) FasRC: Nequip_Y_Oxides_Seed_1_Batch_4 > nequip-evaluate --dataset-config nequip_y_oxides_bulk_supercells_test.yaml --train-dir results/Y_oxides/Y_oxides_tests --metrics-config metrics_test_stratified.yaml
Using device: cpu
Loading model...
/n/home03/skavanagh/Packages/nequip/nequip/utils/_global_options.py:59: UserWarning: !! Upstream issues in PyTorch versions >1.11 have been seen to cause unusual performance degredations on some CUDA systems that become worse over time; see https://github.com/mir-group/nequip/discussions/311. At present we *strongly* recommend the use of PyTorch 1.11 if using CUDA devices; while using other versions if you observe this problem, an unexpected lack of this problem, or other strange behavior, please post in the linked GitHub issue.
  warnings.warn(
    loaded model
Loading dataset...
Loaded dataset specified in nequip_y_oxides_bulk_supercells_test.yaml.
Using all frames from the specified test dataset, yielding a test set size of 12 frames.
Starting...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:18<00:00,  1.52s/it]
f_mae = 0.0062 | f_rmse = 0.0103 | 0%-20%_range_e_mae = 1.5455 | 20%-40%_range_e_mae = 1.3810 | 40%-60%_range_e_mae = 0.5124 | 60%-80%_range_e_mae = 2.0148 | 80%-100%_range_e_mae = 1.1487 | 0-40_range_e_mae = 1.5455 | 40-80_range

--- Final result: ---
Stratifying total_energy errors by total_energy range, in increments of 20% (= ~93.432), with min-max dataset range of 467.160
Stratifying total_energy errors by total_energy range, in increments of 40, with min-max dataset range of 467.160
Stratifying total_energy errors by total_energy range, in increments of 30% (= ~140.148), with min-max dataset range of 467.160
Stratifying total_energy errors by population, in increments of 20% (= ~2 labels)
Stratifying total_energy errors by total_energy range, in increments of 40, with min-max dataset range of 467.160
Stratifying stress errors by population, in increments of 25% (= ~27 labels)
Stratifying stress errors by stress range, in increments of 0.001, with min-max dataset range of 0.034
                         f_mae =  0.006192
                        f_rmse =  0.010321
            0%-20%_range_e_mae =  1.545469
           20%-40%_range_e_mae =  1.380972
           40%-60%_range_e_mae =  0.512441
           60%-80%_range_e_mae =  2.014753
          80%-100%_range_e_mae =  1.148664
              0-40_range_e_mae =  1.545469
             40-80_range_e_mae =  nan
            80-120_range_e_mae =  1.474674
           120-160_range_e_mae =  1.975056
           160-200_range_e_mae =  0.805887
           200-240_range_e_mae =  nan
           240-280_range_e_mae =  0.681378
           280-320_range_e_mae =  nan
           320-360_range_e_mae =  nan
           360-400_range_e_mae =  0.965655
           400-440_range_e_mae =  1.410088
           440-480_range_e_mae =  2.302359
          0%-25%_range_e/N_mae =  0.013277
         25%-50%_range_e/N_mae =  0.006425
         50%-75%_range_e/N_mae =  0.011337
        75%-100%_range_e/N_mae =  0.018290
     0%-20%_population_e/N_mae =  0.001665
    20%-40%_population_e/N_mae =  0.004575
    40%-60%_population_e/N_mae =  0.011288
    60%-80%_population_e/N_mae =  0.014606
   80%-100%_population_e/N_mae =  0.020189
            0-40_range_e/N_mae =  0.012879
           40-80_range_e/N_mae =  nan
          80-120_range_e/N_mae =  0.011521
         120-160_range_e/N_mae =  0.015430
         160-200_range_e/N_mae =  0.006539
         200-240_range_e/N_mae =  nan
         240-280_range_e/N_mae =  0.006084
         280-320_range_e/N_mae =  nan
         320-360_range_e/N_mae =  nan
         360-400_range_e/N_mae =  0.011337
         400-440_range_e/N_mae =  0.016024
         440-480_range_e/N_mae =  0.020557
 0%-25%_population_stress_rmse =  0.000000
25%-50%_population_stress_rmse =  0.000000
50%-75%_population_stress_rmse =  0.000745
75%-100%_population_stress_rmse =  0.012377
  0.000-0.001_range_stress_mae =  0.000121
  0.001-0.002_range_stress_mae =  nan
  0.002-0.003_range_stress_mae =  nan
  0.003-0.004_range_stress_mae =  nan
  0.004-0.005_range_stress_mae =  nan
  0.005-0.006_range_stress_mae =  nan
  0.006-0.007_range_stress_mae =  nan
  0.007-0.008_range_stress_mae =  nan
  0.008-0.009_range_stress_mae =  nan
  0.009-0.010_range_stress_mae =  nan
  0.010-0.011_range_stress_mae =  0.000881
  0.011-0.012_range_stress_mae =  nan
  0.012-0.013_range_stress_mae =  nan
  0.013-0.014_range_stress_mae =  nan
  0.014-0.015_range_stress_mae =  nan
  0.015-0.016_range_stress_mae =  nan
  0.016-0.017_range_stress_mae =  0.000056
  0.017-0.018_range_stress_mae =  0.003473
  0.018-0.019_range_stress_mae =  nan
  0.019-0.020_range_stress_mae =  nan
  0.020-0.021_range_stress_mae =  nan
  0.021-0.022_range_stress_mae =  nan
  0.022-0.023_range_stress_mae =  0.011445
  0.023-0.024_range_stress_mae =  0.009092
  0.024-0.025_range_stress_mae =  0.014168
  0.025-0.026_range_stress_mae =  0.012878
  0.026-0.027_range_stress_mae =  0.009554
  0.027-0.028_range_stress_mae =  0.012844
  0.028-0.029_range_stress_mae =  0.009559
  0.029-0.030_range_stress_mae =  0.004105
  0.030-0.031_range_stress_mae =  0.004812
  0.031-0.032_range_stress_mae =  0.013789
  0.032-0.033_range_stress_mae =  nan
  0.033-0.034_range_stress_mae =  0.007820
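
The nan entries above correspond to bins that contain no reference values. The sketch below shows absolute-width binning that behaves this way, under the assumption (suggested by the 0-40, 40-80, ... labels but not stated in the PR text) that bins are measured from the lowest reference value. stratified_mae_by_width is a hypothetical name.

```python
import math


def stratified_mae_by_width(reference, predicted, width):
    """MAE per fixed-width bin of the reference values; empty bins give nan.

    Hypothetical sketch. Assumes bin edges are offsets from min(reference).
    """
    lo = min(reference)
    span = max(reference) - lo
    n_bins = max(1, math.ceil(span / width))

    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for ref, pred in zip(reference, predicted):
        # Clamp so a value exactly on the top edge lands in the last bin.
        index = min(int((ref - lo) / width), n_bins - 1)
        sums[index] += abs(pred - ref)
        counts[index] += 1

    return {
        f"{i * width:g}-{(i + 1) * width:g}_range": (
            sums[i] / counts[i] if counts[i] else math.nan
        )
        for i in range(n_bins)
    }


reference = [-500.0, -495.0, -410.0, -390.0]
predicted = [-499.0, -497.0, -410.5, -391.5]
print(stratified_mae_by_width(reference, predicted, 40))
# {'0-40_range': 1.5, '40-80_range': nan, '80-120_range': 1.0}
```

With a small test set and a narrow bin width, many bins end up empty, as in the stress output above; a wider bin or a percentage-based setting gives fewer nan rows.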

@kavanase kavanase self-assigned this Jul 12, 2024