Prediction Rigidities for Data-Driven Chemistry

This repository contains the inputs and scripts used to obtain the results in our contribution to the Faraday Discussions "Data-driven discovery in the chemical sciences". Datasets and trained models are available on Materials Cloud.

The files are organized into the following directories:

`section_3`: PRs of NN models

This directory contains the NN model training inputs and example analysis scripts in the form of Jupyter notebooks for the results shared in Section 3 of our paper. Scripts for the acquisition and generation of training datasets are also made available.

For MACE, the native implementation was used for training, but the analysis was performed with a version available here, which supports the last-layer prediction rigidity (LLPR) analysis (see Bigi et al.).

For PaiNN, the native implementation was likewise used for training, but the analysis was performed with a version available here that contains the LLPR implementation.

For SOAP-BPNN, metatrain was used for training. Features for the LLPR analysis are actively being incorporated into the main branch, and all of them should already be available in the `llpr` branch.
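For orientation, the LLPR of a prediction is commonly written as the inverse of a quadratic form of its last-layer features `f` with the regularized feature covariance of the training set: `PR = 1 / (f^T (F^T F + reg * I)^-1 f)`. The sketch below illustrates that formula with NumPy only. It is not the implementation used in the MACE, PaiNN, or metatrain versions mentioned above; the function name `llpr`, the `regularizer` argument, and the random stand-in features are all assumptions made for illustration.

```python
import numpy as np


def llpr(train_features, query_features, regularizer=1e-6):
    """Hypothetical helper: last-layer prediction rigidity per query row."""
    F = np.asarray(train_features, dtype=float)
    q = np.atleast_2d(np.asarray(query_features, dtype=float))
    # Regularized covariance of the last-layer features over the training set.
    cov = F.T @ F + regularizer * np.eye(F.shape[1])
    # Solve cov @ x = q^T rather than forming the explicit inverse.
    solved = np.linalg.solve(cov, q.T)
    # Quadratic form q_i^T cov^{-1} q_i for each query; the PR is its reciprocal.
    quad = np.einsum("ij,ji->i", q, solved)
    return 1.0 / quad


# Random numbers stand in for real last-layer features (assumption).
rng = np.random.default_rng(0)
train = rng.normal(size=(200, 8))
queries = np.stack([train[0], 10.0 * train[0]])
pr = llpr(train, queries)
print(pr)
```

The second query is the first one scaled by 10, so its quadratic form is 100 times larger and its PR 100 times smaller; a lower PR indicates a prediction that is less constrained by the training set.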

`section_4a`: PR-guided dataset construction -- dataset augmentation

This directory contains the three Jupyter notebooks that were used to train the models for the analysis shown in Figure 4 of the manuscript. The models are trained within the notebooks using LE-ACE.

`section_4b`: PR-guided dataset construction -- active learning

This directory contains the Python files and shell scripts used to perform the analysis shown in Figure 5 of the manuscript. `dia.xyz` and `MD_structure.xyz` each contain a single structure used during the analysis. The extraction and embedding strategies are implemented in `extract.py`, `embed_FG.py`, and `embed_SC.py`. The main analysis script is `lpr_md_refined.py`.

`section_5a`: Component-wise prediction rigidity -- body-ordered model

This directory contains the Julia script used to compute the ACE feature vectors (using ACE.jl) before and after "purification", implemented by Ho et al., as well as Jupyter notebooks used for the analysis in Section 5, Figure 6 of the manuscript.

Note that silicon cluster generation and reference energy calculations for the analysis were done with the same inputs and scripts provided for section_3.

`section_5b`: Component-wise prediction rigidity -- multi-range model

This directory contains the Jupyter notebook used to perform the analysis in Section 5, Figure 7 of the manuscript. rascaline was used to compute the SOAP and LODE features. Descriptor management was done with metatensor.

`section_6`: Application to coarse-grained water model

This directory contains (1) inputs for MACE model training, and the subsequent MD simulations of coarse-grained water with the trained models; (2) Jupyter notebooks used for the analysis results shown in Section 6 of the manuscript.

Note that the MACE model training was done using a custom MACE implementation that forces all nonlinear activation functions to be tanh, including the ones that cannot be controlled from the native implementation. This implementation can be accessed here. The LLPR analysis can be done with the same implementation provided above for `section_3`.

SanggyuChong/faraday_discussions_2024
