Add DeepMD DPA3 models #192

Merged
merged 31 commits into from
Jan 29, 2025

anyangml
Contributor

Hello Matbench Discovery Team,

First and foremost, I would like to express my sincere gratitude to you for your incredible efforts in building and maintaining the Matbench Discovery benchmark. Your work has provided an invaluable platform for the community to benchmark and advance machine learning models in materials science.

Our team has recently trained two conservative models within the DeePMD-kit framework (DPA3-1-MPtrj and DPA3-1-OpenLAM) that we believe could provide valuable insights and potentially improve the performance on the tasks you have outlined. We would love to add them to your benchmark.

I will add all the required files outlined in the contribution guide. Please let me know if you need further information.

anyangml and others added 25 commits January 21, 2025 18:25

for more information, see https://pre-commit.ci
@janosh
Owner

janosh commented Jan 22, 2025

thanks for this PR! excited to check it out. just a heads up, i'm a bit behind on reviewing model submissions and likely won't get to this one until the weekend or next week

@janosh
Owner

janosh commented Jan 26, 2025

@anyangml congrats on the nice results! 👍 could you upload the model-relaxed WBM structures somewhere and share the download link?

@anyangml
Contributor Author

anyangml commented Jan 27, 2025

Thanks. Are you referring to the json.gz files? The links are provided in the readme file.

@janosh
Owner

janosh commented Jan 27, 2025

@anyangml yes, apologies i missed those

…dling

- simplify input file glob in main script
@janosh
Owner

janosh commented Jan 29, 2025

@anyangml sorry again for the delay. the PR is mostly ready to go.

could you share the k_srme.json.gz files for both models? they're small enough to check into version control. i didn't see a link to them in the readme

janosh and others added 2 commits January 28, 2025 20:39
- update model keys in DeepMD YAML files
- add wyckoff_spglib to MbdKey enum
- suggest similar labels in Model.from_label() if not found
- update scripts and tests to use new mace key
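One of the commits above mentions suggesting similar labels in `Model.from_label()` when a label is not found. A minimal sketch of how such a lookup could behave, assuming a plain Python `Enum` whose values are display labels and using the standard library's `difflib.get_close_matches` for the fuzzy suggestions. The enum members and labels below are hypothetical placeholders, not the project's actual `Model` enum or its implementation.

```python
from difflib import get_close_matches
from enum import Enum


class Model(Enum):
    """Hypothetical stand-in for a model registry enum (value = display label)."""

    mace_mp_0 = "MACE-MP-0"
    dpa3_1_mptrj = "DPA3-1-MPtrj"
    dpa3_1_openlam = "DPA3-1-OpenLAM"

    @classmethod
    def from_label(cls, label: str) -> "Model":
        """Return the member with an exactly matching label, else raise with suggestions."""
        for member in cls:
            if member.value == label:
                return member
        labels = [member.value for member in cls]
        # get_close_matches ranks candidates by SequenceMatcher similarity ratio
        # and drops anything below the default cutoff of 0.6
        suggestions = get_close_matches(label, labels, n=3)
        hint = f" Did you mean one of {suggestions}?" if suggestions else ""
        raise ValueError(f"Unknown model label {label!r}.{hint}")
```

With this sketch, `Model.from_label("DPA3-1-MPtrj")` returns `Model.dpa3_1_mptrj`, while a near-miss such as `"DPA3-1-MPTrj"` raises a `ValueError` whose message lists `"DPA3-1-MPtrj"` as a suggestion. Raising instead of silently picking the closest match keeps typos from mapping to the wrong model.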
@anyangml
Contributor Author

Had to update .gitignore to add these files.

- rename DeepMD-DPA3 so YAML model_key matches YAML filename
- remove kappa_SRME JSON files (bigger than expected. will be uploaded to figshare later)
- rename WBM final energy CSV files to end with discovery
- update data.py to new YAML file names
janosh changed the title from "Feat: add DPA3 Models" to "Add DeepMD DPA3 models" on Jan 29, 2025
janosh merged commit cdfcc2a into janosh:main on Jan 29, 2025
2 checks passed
janosh added a commit that referenced this pull request Jan 29, 2025
* feat: add dpa3 prediction results

* feat: add other required files

* feat: add ksrme

* feat: add to data.py to allow tables to be generated

* - test_dpa3.py + join_dpa3_preds.py add docstring and module path handling
- simplify input file glob in main script

* - fix oversight: rename mace to mace_mp_0 in Model enum
- update model keys in DeepMD YAML files
- add wyckoff_spglib to MbdKey enum
- suggest similar labels in Model.from_label() if not found
- update scripts and tests to use new mace key

* feat: add ksrme files

* update DeepMD-DPA3 model metadata and geo_opt metrics

- rename DeepMD-DPA3 so YAML model_key matches YAML filename
- remove kappa_SRME JSON files (bigger than expected. will be uploaded to figshare later)
- rename WBM final energy CSV files to end with discovery
- update data.py to new YAML file names

* install pymatviz from main branch in CI

---------

Co-authored-by: Janosh Riebesell <[email protected]>
Co-authored-by: Rhys Goodall <[email protected]>
@CompRhys
Collaborator

CompRhys commented Feb 6, 2025

@anyangml in the description here you've listed training on both sAlex and Alex, which overlap. The dataset list in the readme suggests that you only trained on Alex2D, not Alex or sAlex. If the readme is complete, could we just add up the numbers listed and define an OpenLAM dataset now, to avoid confusion and double counting of Alexandria?

@anyangml
Contributor Author

anyangml commented Feb 6, 2025

For the DPA3-1-OpenLAM model, Alex2D is one of the pre-training datasets (along with all the other datasets listed in the table), and we used MPtrj + sAlex to fine-tune the model. The original Alex3D was not used. Yes, I intended to define an OpenLAM dataset. We are working on releasing the training datasets.

@anyangml
Contributor Author

Here is a link to the dataset card: https://aissquare.com/datasets/detail?pageType=datasets&name=LAMBench-TrainingSet-v1&id=308. We will release all the training data soon!


3 participants