Some questions about dpgen #1672

21al07se09t · 2024-11-16T05:27:16Z

Dear all,

Recently, I have been learning how to use dpgen by studying the input files from the PRL paper [Physical Review Letters 126, 236001 (2021)], which I downloaded from https://www.aissquare.com/models/detail?name=H2O-Phase-Diagram-model&id=25&pageType=models. I am confused about several points and would appreciate any advice.

How important is the dataset ("init_data_sys" in param.json) used for training the initial model?
I loaded the file data/data.init/init_system.000/set.000/energy.npy with np.load("energy.npy"). The array contains 500 identical values: 516.83954. I take this to mean that the 500 extracted configurations all have the same energy and that they serve as the dataset for training the initial model. Does this suggest that the initial dataset is not important, and that I only need a strict convergence criterion in the "fp" step?
For example, suppose the energy of my system converges at an ENCUT of 1000. Can I prepare the data for init_data_sys by extracting several snapshots from AIMD simulations run with an ENCUT of 400, and then set the ENCUT for the "fp" step to 1000?
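The energy check described above can be made a little more systematic. The following is a minimal sketch: the helper name `summarize_energies` is hypothetical, and a synthetic array stands in for the real `energy.npy` so the snippet runs without the dataset.

```python
import numpy as np

def summarize_energies(energies):
    """Return basic statistics for a 1-D array of per-frame energies."""
    energies = np.asarray(energies, dtype=float).ravel()
    return {
        "n_frames": int(energies.size),
        "n_unique": int(np.unique(energies).size),
        "spread": float(energies.max() - energies.min()),
    }

# Stand-in for:
#   np.load("data/data.init/init_system.000/set.000/energy.npy")
# Here: 500 frames that all carry the value quoted in the post.
energies = np.full(500, 516.83954)

stats = summarize_energies(energies)
print(stats)
```

A spread of zero confirms that the energy labels do not distinguish the frames. Applying the same kind of check to `force.npy` in the same set directory may show whether the force labels vary between frames.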
How should I select the initial configurations for model_devi ("sys_configs" in param.json)?
I checked the param.json provided with the PRL paper and found that only the configurations ice0X/0000[3-9] are included. The configurations appear to have been generated by dpgen init_bulk, so why are ice0X/0000[1-2] left out? Or did the authors simply choose configurations at random for each group of model_devi?
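To make the pattern in question concrete, here is a small sketch of how a shell-style pattern such as `0000[3-9]` selects directories. The directory names (`ice02/00001` to `ice02/00009`) are hypothetical placeholders, and the snippet uses Python's standard `fnmatch` module rather than dpgen itself, on the assumption that "sys_configs" entries follow the same wildcard rules.

```python
from fnmatch import fnmatchcase

# Hypothetical directory names; the real tree shipped with the paper may differ.
candidates = [f"ice02/0000{i}" for i in range(1, 10)]

# Character class [3-9] matches exactly one character between '3' and '9'.
pattern = "ice02/0000[3-9]"

selected = [name for name in candidates if fnmatchcase(name, pattern)]
skipped = [name for name in candidates if name not in selected]

print("selected:", selected)
print("skipped: ", skipped)
```

Under these assumptions the pattern picks up the seven directories ending in 3 to 9 and skips those ending in 1 and 2, which matches the observation in the question.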
Is there any size requirement for the "sys_configs" in param.json?
If I understand correctly, structures of the same size as those in "sys_configs" undergo a single-point energy calculation during the "fp" step, which may take a long time if I use a large cell. On the other hand, a small cell may introduce spurious self-interactions between an atom and its own periodic images in the DPMD simulations. Does this mean that I should use a sufficiently large cell to avoid self-interactions during model_devi?
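One common rule of thumb for the self-interaction concern is the minimum-image condition: every perpendicular width of the cell should be at least twice the model cutoff. The sketch below checks that condition with NumPy. The cutoff of 6.0 Å and both example cells are assumptions chosen for illustration, not values taken from the paper, and whether this condition is strictly required for DPMD is a question for the maintainers.

```python
import numpy as np

def perpendicular_widths(cell):
    """Distances between opposite faces of a cell given as three row vectors."""
    cell = np.asarray(cell, dtype=float)
    volume = abs(np.linalg.det(cell))
    a, b, c = cell
    # Each face is spanned by two lattice vectors; width = volume / face area.
    face_normals = [np.cross(b, c), np.cross(c, a), np.cross(a, b)]
    return np.array([volume / np.linalg.norm(n) for n in face_normals])

def satisfies_minimum_image(cell, rcut):
    """True if every perpendicular width is at least 2 * rcut."""
    return bool(np.all(perpendicular_widths(cell) >= 2.0 * rcut))

rcut = 6.0                            # assumed cutoff radius in Å
small = np.diag([9.0, 9.0, 9.0])      # hypothetical small cubic cell
large = np.diag([13.0, 13.0, 13.0])   # hypothetical larger cubic cell

print("small cell ok:", satisfies_minimum_image(small, rcut))
print("large cell ok:", satisfies_minimum_image(large, rcut))
```

Using perpendicular widths rather than lattice-vector lengths keeps the check valid for non-orthogonal cells, where a long lattice vector can still leave two opposite faces close together.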
How can I set different labeling criteria over the course of the training procedure?
In the supporting information of the PRL paper, the authors state: "The levels are set to 0.18 and 0.32 eV/Å in iterations 25 to 32, and to 0.20 and 0.35 eV/Å in iterations 33 to 36. Different convergence criteria are used at low (T ≤ 800 K) and high temperature (T > 800 K) to account for the different relative importance of thermal fluctuations." How is this implemented in the param.json file?
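One way the iteration-dependent levels quoted above might be written down is sketched here. It rests on an assumption that should be checked against the dpgen documentation for the installed version: that an entry of "model_devi_jobs" may carry its own "model_devi_f_trust_lo" and "model_devi_f_trust_hi" keys that override the global values. The helper names are hypothetical, the "iteration" key is only an annotation for this sketch, and the temperature-dependent split is left out because the quote does not say which levels belong to which temperature range.

```python
import json

def trust_levels_for(iteration):
    """Return (lo, hi) force trust levels in eV/Å for the quoted ranges."""
    if 25 <= iteration <= 32:
        return 0.18, 0.32
    if 33 <= iteration <= 36:
        return 0.20, 0.35
    raise ValueError(f"no trust levels quoted for iteration {iteration}")

def make_job(iteration):
    """Build one job entry carrying per-iteration trust levels."""
    lo, hi = trust_levels_for(iteration)
    return {
        "iteration": iteration,          # annotation for this sketch only
        # ASSUMPTION: per-job override keys; verify for your dpgen version.
        "model_devi_f_trust_lo": lo,
        "model_devi_f_trust_hi": hi,
    }

jobs = [make_job(i) for i in range(25, 37)]
print(json.dumps(jobs[0], indent=2))
print(json.dumps(jobs[-1], indent=2))
```

A real job entry would also need the usual exploration settings, such as the system indices, temperatures and number of steps, which are omitted here.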

Any advice would be greatly appreciated!
