Stratified metrics #439

kavanase · 2024-06-26T19:12:35Z

Description

This PR implements stratified metrics for NequIP. The user can now set the stratify parameter in metrics_components to one of:
(1) X%_range: the errors are stratified by the range of reference values, in increments of X% of that range (i.e. errors for the lowest X% of the reference-value range, then the next X%, etc.).
(2) X%_population: the errors are stratified by the population of errors, in increments of X% (i.e. errors for the X% of labels with the lowest errors, then the next X%, etc.).
(3) a float/int X: the errors are stratified by the range of reference values, in increments of X (i.e. errors for reference values in [0, X), then [X, 2X), etc.), where X is in the units of the corresponding property.

Example config (added to full.yaml config):

# we can also output errors stratified by the reference value ranges (in percent or absolute values), or by the error populations in percent:
  - - total_energy
    - mae
    - stratify: 10%_range                 # stratify by range (in reference energies per atom), in increments of 10% (i.e. errors for first 10% lowest reference values, next 10% etc)
      PerAtom: True
  - - forces
    - rmse
    - stratify: 10%_population          # stratify by population (in forces errors per atom), in increments of 10%  (i.e. errors for first 10% lowest errors, next 10% etc)
  - - stress
    - mae
    - stratify: 0.001                  # stratify by absolute value (in reference stresses), in increments of 0.001
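
As a rough illustration of how the three forms of `stratify` shown in the config above could be told apart, here is a minimal sketch. The function name `parse_stratify` and the mode labels it returns are hypothetical and are not taken from the NequIP code base; the validation rules (positive width, percentage in (0, 100]) are also assumptions.

```python
from typing import Tuple, Union


def parse_stratify(value: Union[str, int, float]) -> Tuple[str, float]:
    """Classify a `stratify` setting as (mode, step).

    mode is "percent_range", "percent_population" or "absolute_range";
    step is the percentage or the absolute bin width. Hypothetical helper,
    not NequIP's actual parser.
    """
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool):
        raise ValueError(f"Unrecognised stratify value: {value!r}")
    if isinstance(value, (int, float)):
        if value <= 0:
            raise ValueError("Absolute stratum width must be positive")
        return "absolute_range", float(value)
    if isinstance(value, str):
        percent, sep, kind = value.strip().partition("%_")
        if sep and kind in ("range", "population"):
            step = float(percent)  # raises ValueError on non-numeric text
            if 0 < step <= 100:
                return f"percent_{kind}", step
    raise ValueError(f"Unrecognised stratify value: {value!r}")


# The three settings used in the example config.
for setting in ("10%_range", "10%_population", 0.001):
    print(setting, "->", parse_stratify(setting))
```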

Also included is a small fix that stops the final metrics output from being printed twice to the terminal, as currently happens.

Motivation and Context

Stratified error metrics can be quite useful in many cases. For example, when a dataset spans large energy/force ranges, the overall RMSEs/MAEs can look pretty poor because of outliers or high-energy frames: the errors may be high(er) for extremely high energy/force frames while actually being very low for low energies/forces near the minimum. This is particularly relevant in e.g. structure prediction applications, where high(er) errors for high energies/forces are acceptable (as long as they're roughly in the right direction), while low(er) errors are needed near the minima in order to distinguish different metastable structures.
See e.g. the discussion in Chris Pickard's perspective on MLPs for AIRSS (structure searching).

How Has This Been Tested?

I've added tests to match all current tests for metrics in the code. I've also manually tested most possible combinations of parameters (that I can think of) on RC Cannon, using both CPUs and GPUs, and confirmed the values being output are correct.

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds or improves functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation improvement (updates to user guides, docstrings, or developer docs)

Checklist:

My code follows the code style of this project and has been formatted using black.
All new and existing tests passed, including on GPU (if relevant).
I have added tests that cover my changes (if relevant).
I have updated CHANGELOG.md.
I have updated the documentation (if relevant).

Also FYI, this PR is based on my other open PR which allows n_train/n_val to be specified as percentages, but this isn't required (was just the way I ended up branching).


kavanase · 2024-06-26T19:19:08Z

Some quick example outputs:
minimal_metrics_test.yaml:

 # output metrics
 metrics_components:
   - - forces                               # key
     - mae                                  # "rmse" or "mae"
   - - forces
     - rmse
   - - forces
     - mae
     - stratify: 25%_range
   - - forces
     - mae
     - stratify: 0.005
   - - forces
     - rmse
     - stratify: 25%_population

Gives:

(pytorch_2.2.1) FasRC: Nequip_Y_Oxides_Seed_1_Batch_4 > nequip-evaluate --dataset-config nequip_y_oxides_bulk_supercells_test.yaml --train-dir results/Y_oxides/Y_oxides_tests --metrics-config minimal_metrics_test.yaml
Using device: cpu
Loading model...
/n/home03/skavanagh/Packages/nequip/nequip/utils/_global_options.py:59: UserWarning: !! Upstream issues in PyTorch versions >1.11 have been seen to cause unusual performance degredations on some CUDA systems that become worse over time; see https://github.com/mir-group/nequip/discussions/311. At present we *strongly* recommend the use of PyTorch 1.11 if using CUDA devices; while using other versions if you observe this problem, an unexpected lack of this problem, or other strange behavior, please post in the linked GitHub issue.
  warnings.warn(
    loaded model
Loading dataset...
Loaded dataset specified in nequip_y_oxides_bulk_supercells_test.yaml.
Using all frames from the specified test dataset, yielding a test set size of 12 frames.
Starting...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:22<00:00,  1.91s/it]
f_mae = 0.0062 | f_rmse = 0.0103 | 0%-25%_range_f_mae = 0.0070 | 25%-50%_range_f_mae = 0.0067 | 50%-75%_range_f_mae = 0.0055 | 75%-100%_range_f_mae = 0.0069 | 0.000-0.005_range_f_mae = 0.0081 | 0.005-0.010_range_f_mae = 0.0055 |

--- Final result: ---
Stratifying forces errors by forces range, in increments of 25% (= ~0.004), with min-max dataset range of 0.015
Stratifying forces errors by forces range, in increments of 0.005, with min-max dataset range of 0.015
Stratifying forces errors by population, in increments of 25% (= ~975 labels)
                         f_mae =  0.006192
                        f_rmse =  0.010321
            0%-25%_range_f_mae =  0.006962
           25%-50%_range_f_mae =  0.006692
           50%-75%_range_f_mae =  0.005464
          75%-100%_range_f_mae =  0.006867
       0.000-0.005_range_f_mae =  0.008050
       0.005-0.010_range_f_mae =  0.005475
       0.010-0.015_range_f_mae =  0.007944
      0%-25%_population_f_rmse =  0.000012
     25%-50%_population_f_rmse =  0.001347
     50%-75%_population_f_rmse =  0.006206
    75%-100%_population_f_rmse =  0.019640

(Not the best dataset in this case as it's only a handful of frames with very different energies and very low forces, but just for illustration)
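
The population-stratified RMSEs above rise steadily from the first stratum to the last, which is what one would expect if labels are ordered by error before being split. A minimal sketch of that idea follows; the function name `rmse_by_error_population` is hypothetical, and the way stratum boundaries are rounded when the label count does not divide evenly is an assumption rather than the PR's actual behaviour.

```python
import math
from typing import Dict, List


def rmse_by_error_population(
    reference: List[float], predicted: List[float], percent: float
) -> Dict[str, float]:
    """RMSE per stratum after sorting the labels by absolute error."""
    errors = sorted(abs(p - r) for r, p in zip(reference, predicted))
    n = len(errors)
    n_strata = math.ceil(100 / percent)
    result = {}
    for k in range(n_strata):
        upper = min((k + 1) * percent, 100)
        # Assumed rounding of stratum boundaries to whole labels.
        lo = round(k * percent / 100 * n)
        hi = round(upper / 100 * n)
        chunk = errors[lo:hi]
        label = f"{k * percent:g}%-{upper:g}%"
        if chunk:
            result[label] = math.sqrt(sum(e * e for e in chunk) / len(chunk))
        else:
            result[label] = math.nan  # no labels fell into this stratum
    return result


# Toy data: eight labels whose absolute errors are 1, 1, 2, 2, 3, 3, 4, 4.
reference = [0.0] * 8
predicted = [3.0, -1.0, 4.0, 2.0, -2.0, 1.0, -4.0, -3.0]
for label, value in rmse_by_error_population(reference, predicted, 25).items():
    print(f"{label}_population_rmse = {value:.6f}")
```

Because each stratum only contains errors at least as large as those in the previous one, the per-stratum values can never decrease.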

metrics_test_stratified.yaml:

 # output metrics
 metrics_components:
   - - forces                               # key
     - mae                                  # "rmse" or "mae"
   - - forces
     - rmse
   - - total_energy
     - mae
     - stratify: 20%_range
   - - total_energy
     - mae
     - stratify: 40
   - - total_energy
     - mae
     - stratify: 30%_range
       PerAtom: True
   - - total_energy
     - mae
     - stratify: 20%_population
       PerAtom: True
   - - total_energy
     - mae
     - stratify: 40
       PerAtom: True
   - - stress
     - rmse
     - stratify: 25%_population
   - - stress
     - mae
     - stratify: 0.001

Gives:

(pytorch_2.2.1) FasRC: Nequip_Y_Oxides_Seed_1_Batch_4 > nequip-evaluate --dataset-config nequip_y_oxides_bulk_supercells_test.yaml --train-dir results/Y_oxides/Y_oxides_tests --metrics-config metrics_test_stratified.yaml
Using device: cpu
Loading model...
/n/home03/skavanagh/Packages/nequip/nequip/utils/_global_options.py:59: UserWarning: !! Upstream issues in PyTorch versions >1.11 have been seen to cause unusual performance degredations on some CUDA systems that become worse over time; see https://github.com/mir-group/nequip/discussions/311. At present we *strongly* recommend the use of PyTorch 1.11 if using CUDA devices; while using other versions if you observe this problem, an unexpected lack of this problem, or other strange behavior, please post in the linked GitHub issue.
  warnings.warn(
    loaded model
Loading dataset...
Loaded dataset specified in nequip_y_oxides_bulk_supercells_test.yaml.
Using all frames from the specified test dataset, yielding a test set size of 12 frames.
Starting...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:18<00:00,  1.52s/it]
f_mae = 0.0062 | f_rmse = 0.0103 | 0%-20%_range_e_mae = 1.5455 | 20%-40%_range_e_mae = 1.3810 | 40%-60%_range_e_mae = 0.5124 | 60%-80%_range_e_mae = 2.0148 | 80%-100%_range_e_mae = 1.1487 | 0-40_range_e_mae = 1.5455 | 40-80_range

--- Final result: ---
Stratifying total_energy errors by total_energy range, in increments of 20% (= ~93.432), with min-max dataset range of 467.160
Stratifying total_energy errors by total_energy range, in increments of 40, with min-max dataset range of 467.160
Stratifying total_energy errors by total_energy range, in increments of 30% (= ~140.148), with min-max dataset range of 467.160
Stratifying total_energy errors by population, in increments of 20% (= ~2 labels)
Stratifying total_energy errors by total_energy range, in increments of 40, with min-max dataset range of 467.160
Stratifying stress errors by population, in increments of 25% (= ~27 labels)
Stratifying stress errors by stress range, in increments of 0.001, with min-max dataset range of 0.034
                         f_mae =  0.006192
                        f_rmse =  0.010321
            0%-20%_range_e_mae =  1.545469
           20%-40%_range_e_mae =  1.380972
           40%-60%_range_e_mae =  0.512441
           60%-80%_range_e_mae =  2.014753
          80%-100%_range_e_mae =  1.148664
              0-40_range_e_mae =  1.545469
             40-80_range_e_mae =  nan
            80-120_range_e_mae =  1.474674
           120-160_range_e_mae =  1.975056
           160-200_range_e_mae =  0.805887
           200-240_range_e_mae =  nan
           240-280_range_e_mae =  0.681378
           280-320_range_e_mae =  nan
           320-360_range_e_mae =  nan
           360-400_range_e_mae =  0.965655
           400-440_range_e_mae =  1.410088
           440-480_range_e_mae =  2.302359
          0%-25%_range_e/N_mae =  0.013277
         25%-50%_range_e/N_mae =  0.006425
         50%-75%_range_e/N_mae =  0.011337
        75%-100%_range_e/N_mae =  0.018290
     0%-20%_population_e/N_mae =  0.001665
    20%-40%_population_e/N_mae =  0.004575
    40%-60%_population_e/N_mae =  0.011288
    60%-80%_population_e/N_mae =  0.014606
   80%-100%_population_e/N_mae =  0.020189
            0-40_range_e/N_mae =  0.012879
           40-80_range_e/N_mae =  nan
          80-120_range_e/N_mae =  0.011521
         120-160_range_e/N_mae =  0.015430
         160-200_range_e/N_mae =  0.006539
         200-240_range_e/N_mae =  nan
         240-280_range_e/N_mae =  0.006084
         280-320_range_e/N_mae =  nan
         320-360_range_e/N_mae =  nan
         360-400_range_e/N_mae =  0.011337
         400-440_range_e/N_mae =  0.016024
         440-480_range_e/N_mae =  0.020557
 0%-25%_population_stress_rmse =  0.000000
25%-50%_population_stress_rmse =  0.000000
50%-75%_population_stress_rmse =  0.000745
75%-100%_population_stress_rmse =  0.012377
  0.000-0.001_range_stress_mae =  0.000121
  0.001-0.002_range_stress_mae =  nan
  0.002-0.003_range_stress_mae =  nan
  0.003-0.004_range_stress_mae =  nan
  0.004-0.005_range_stress_mae =  nan
  0.005-0.006_range_stress_mae =  nan
  0.006-0.007_range_stress_mae =  nan
  0.007-0.008_range_stress_mae =  nan
  0.008-0.009_range_stress_mae =  nan
  0.009-0.010_range_stress_mae =  nan
  0.010-0.011_range_stress_mae =  0.000881
  0.011-0.012_range_stress_mae =  nan
  0.012-0.013_range_stress_mae =  nan
  0.013-0.014_range_stress_mae =  nan
  0.014-0.015_range_stress_mae =  nan
  0.015-0.016_range_stress_mae =  nan
  0.016-0.017_range_stress_mae =  0.000056
  0.017-0.018_range_stress_mae =  0.003473
  0.018-0.019_range_stress_mae =  nan
  0.019-0.020_range_stress_mae =  nan
  0.020-0.021_range_stress_mae =  nan
  0.021-0.022_range_stress_mae =  nan
  0.022-0.023_range_stress_mae =  0.011445
  0.023-0.024_range_stress_mae =  0.009092
  0.024-0.025_range_stress_mae =  0.014168
  0.025-0.026_range_stress_mae =  0.012878
  0.026-0.027_range_stress_mae =  0.009554
  0.027-0.028_range_stress_mae =  0.012844
  0.028-0.029_range_stress_mae =  0.009559
  0.029-0.030_range_stress_mae =  0.004105
  0.030-0.031_range_stress_mae =  0.004812
  0.031-0.032_range_stress_mae =  0.013789
  0.032-0.033_range_stress_mae =  nan
  0.033-0.034_range_stress_mae =  0.007820



kavanase and others added 12 commits April 12, 2024 15:56

Update .readthedocs.yaml

bcce773

Bump flake8 to avoid linting failure

1602704

Fix typo and reformat code to satisfy now-caught flake8 linting

a21178b

Merge branch 'mir-group:develop' into develop

a742ed5

Update requirements.txt

6589893

Initial attempt at stratified metrics

d61ea38

Fix evaluate metrics summary being printed twice

87f08d4

Add initial range-stratified functionality

9608a4d

Add population and raw unit stratification

069fd02

Update changelog and add examples of stratified metrics to `full.yaml` config

6851fd7

Add tests for stratified metrics (and manually tested on HPCs)

9f3bd7c

Tidy up

e09311b

kavanase and others added 7 commits June 26, 2024 15:21

Adjust identity error tolerance for stratified energy tests (can have greater variation within strata than overall mean)

0170439

Merge branch 'refs/heads/main' into stratified_metrics

0b5511f

Update lint.yaml to latest versions

761e980

Merge branch 'refs/heads/develop' into stratified_metrics

026250d

Merge branch 'refs/heads/main' into develop

472ef3e

Merge branch 'refs/heads/develop' into stratified_metrics

2ae5d7f

Merge branch 'mir-group:main' into stratified_metrics

1059248

kavanase self-assigned this Jul 12, 2024

kavanase added 9 commits July 12, 2024 15:55

Fix hard-set parameter value in plot_dimers.py

29ef5a4

plot_dimers.py script cleanup

2f8f852

Add CITATION.cff file

33beed1

Update README

a457e58

Fill out citation docs page

44f01c2

Bibtext syntax highlighting

9e915d9

Update citation file

54f5a8c

Update citation file pt 2

10f551a

Update citation file pt 3

ce8b8bd

kavanase added 3 commits July 12, 2024 17:44

Update CITATION.cff

c1a9d76

Allow user to set PYTORCH_VERSION_WARNING=0 to avoid many pytorch version warnings (e.g. when multiprocessing/multi-GPU etc)

d1afaf2

Merge branch 'refs/heads/develop' into stratified_metrics

ea0f8fe

Linux-cpp-lisp closed this Jul 20, 2024
