Repository containing the codes and scripts used to obtain the results of our contributions for the Faraday Discussions: "Data-driven discovery in the chemical sciences"

SanggyuChong/faraday_discussions_2024

Prediction Rigidities for Data-Driven Chemistry

This repository contains the inputs and scripts that were used to obtain the results shared in our contribution to the Faraday Discussions: "Data-driven discovery in the chemical sciences". Datasets and trained models are available on Materials Cloud, here.

The files are organized into the following directories:

section_3: PRs of NN models

This directory contains the NN model training inputs and example analysis scripts in the form of Jupyter notebooks for the results shared in Section 3 of our paper. Scripts for the acquisition and generation of training datasets are also made available.

For MACE, the native implementation was used for training, but the analysis was performed with a version available here, which supports last-layer prediction rigidity (LLPR) analysis (see Bigi et al.).

For PaiNN, the native implementation was likewise used for training, and the analysis was performed with a version available here that contains the LLPR implementation.

For SOAP-BPNN, metatrain was used for training. The features needed for the LLPR analysis are still being incorporated into the main branch, but all of them should be available in the llpr branch.
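As a rough illustration of what an LLPR analysis computes, the sketch below evaluates a last-layer prediction rigidity with NumPy. It assumes the form PR = 1 / (f^T (F^T F + reg * I)^-1 f), where F holds last-layer features of the training set and f those of a query structure, which is our reading of the formalism in Bigi et al. It is not the code used in this repository: the function name `llpr_rigidity`, the `regularizer` value, and the random toy features are all hypothetical, and any calibration of the resulting uncertainties is omitted.

```python
import numpy as np


def llpr_rigidity(train_features, query_features, regularizer=1e-6):
    """Return 1 / (f^T (F^T F + reg * I)^-1 f) for each query row f.

    Hypothetical helper for illustration; not part of MACE, PaiNN or metatrain.
    """
    F = np.asarray(train_features, dtype=float)
    Q = np.atleast_2d(np.asarray(query_features, dtype=float))
    # Regularized covariance of the last-layer features over the training set.
    cov = F.T @ F + regularizer * np.eye(F.shape[1])
    # Solve cov @ X = Q^T instead of forming the inverse explicitly.
    solved = np.linalg.solve(cov, Q.T)
    # Row-wise quadratic form f^T cov^-1 f.
    inv_rigidity = np.einsum("ij,ji->i", Q, solved)
    return 1.0 / inv_rigidity


# Toy data (assumption): 200 training samples with 8 last-layer features,
# where the last feature direction is barely covered by the training set.
rng = np.random.default_rng(0)
train = rng.normal(size=(200, 8))
train[:, -1] *= 0.01

well_covered = np.eye(8)[0]
poorly_covered = np.eye(8)[-1]
pr = llpr_rigidity(train, np.stack([well_covered, poorly_covered]))
```

In this toy setup the query along the well-sampled direction receives a much higher rigidity than the one along the poorly sampled direction, which is the qualitative behaviour the analysis notebooks rely on.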

section_4a: PR-guided dataset construction -- dataset augmentation

This directory contains the three Jupyter notebooks that were used to train the models for the analysis shown in Figure 4 of the manuscript. The models are trained within the notebooks using LE-ACE.

section_4b: PR-guided dataset construction -- active learning

This directory contains the Python files and shell scripts used to perform the analysis shown in Figure 5 of the manuscript. dia.xyz and MD_structure.xyz are single-structure files that were used during the analysis. The extraction and embedding strategies are implemented in extract.py, embed_FG.py, and embed_SC.py. The main analysis script is lpr_md_refined.py.

section_5a: Component-wise prediction rigidity -- body-ordered model

This directory contains the Julia script used to compute the ACE feature vectors (using ACE.jl) before and after "purification" as implemented by Ho et al., as well as the Jupyter notebooks used for the analysis in Section 5, Figure 6 of the manuscript.

Note that silicon cluster generation and reference energy calculations for the analysis were done with the same inputs and scripts provided for section_3.

section_5b: Component-wise prediction rigidity -- multi-range model

This directory contains the Jupyter notebook used to perform the analysis in Section 5, Figure 7 of the manuscript. rascaline was used to compute the SOAP and LODE features. Descriptor management was done with metatensor.

section_6: Application to coarse-grained water model

This directory contains (1) inputs for MACE model training, and the subsequent MD simulations of coarse-grained water with the trained models; (2) Jupyter notebooks used for the analysis results shown in Section 6 of the manuscript.

Importantly, the MACE model training was done using a custom MACE implementation that forces all nonlinear activation functions to be tanh, including those that cannot be controlled from the native implementation. This implementation can be accessed here. The LLPR analysis can be done with the same implementation provided above for section_3.
