Multihead replay finetuning converges more slowly than regular training #626

lucasdekam · 2024-10-08T10:09:23Z

lucasdekam
Oct 8, 2024

Hi, I'd like to share my experience so far with multihead finetuning and ask for ideas.

I'm training MACE on a dataset of ~80 platinum-water interface structures of ~400 atoms each; energies and forces are evaluated using VASP with the RPBE functional. I've tried two methods: multihead replay finetuning (starting from the mace-mp0b agnesi small model) and naive finetuning (starting from the standard small model). The training parameters can be found below. The multihead finetuning converges much more slowly, and the model struggles to bring the errors on the replayed data (pt_head) back down. I've not been able to get the force error much lower than 100 meV/A, although perhaps this could be achieved by longer training.

2024-10-06 18:06:17.060 INFO: Epoch 99: head: pt_head, loss=  0.0016, RMSE_E_per_atom=   639.6 meV, RMSE_F=   729.4 meV / A, RMSE_stress=    40.4 meV / A^3
2024-10-06 18:06:17.189 INFO: Epoch 99: head: default, loss=  0.0026, RMSE_E_per_atom=     1.2 meV, RMSE_F=   119.0 meV / A, RMSE_stress=    10.9 meV / A^3

Because I wanted to converge the forces faster, I increased the forces weight by a factor of 10, which gave the result above. With equal weights the forces converge even more slowly.

On the other hand, naive finetuning converges pretty fast to a rather low force RMSE. The resulting model also seems very stable to me, so there's no "catastrophic forgetting" (at least not that I've noticed).

2024-10-06 18:41:18.014 INFO: Epoch 49: head: default, loss=  0.0009, RMSE_E_per_atom=     0.3 meV, RMSE_F=    51.1 meV / A
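To compare the per-head errors between runs, the validation lines quoted above can be pulled out of a log with a few lines of Python. This is only a minimal sketch: it assumes the log lines keep exactly the format shown in this post, and `parse_eval_line` is a hypothetical helper, not part of MACE.

```python
import re

# Matches validation lines of the form quoted in this post, e.g.
# "... INFO: Epoch 99: head: pt_head, loss=  0.0016, RMSE_E_per_atom=   639.6 meV, ..."
LINE_RE = re.compile(r"Epoch (?P<epoch>\d+): head: (?P<head>\w+), (?P<metrics>.*)")
# Captures "name=  value" pairs; units such as "meV / A" are ignored.
METRIC_RE = re.compile(r"(\w+)=\s*([-+0-9.eE]+)")


def parse_eval_line(line):
    """Return (epoch, head, metrics) for a validation line, or None otherwise."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    metrics = {name: float(value) for name, value in METRIC_RE.findall(match["metrics"])}
    return int(match["epoch"]), match["head"], metrics


# The three log lines quoted in this post.
EXAMPLE_LINES = [
    "2024-10-06 18:06:17.060 INFO: Epoch 99: head: pt_head, loss=  0.0016, "
    "RMSE_E_per_atom=   639.6 meV, RMSE_F=   729.4 meV / A, RMSE_stress=    40.4 meV / A^3",
    "2024-10-06 18:06:17.189 INFO: Epoch 99: head: default, loss=  0.0026, "
    "RMSE_E_per_atom=     1.2 meV, RMSE_F=   119.0 meV / A, RMSE_stress=    10.9 meV / A^3",
    "2024-10-06 18:41:18.014 INFO: Epoch 49: head: default, loss=  0.0009, "
    "RMSE_E_per_atom=     0.3 meV, RMSE_F=    51.1 meV / A",
]

for line in EXAMPLE_LINES:
    parsed = parse_eval_line(line)
    if parsed is not None:
        epoch, head, metrics = parsed
        print(f"epoch {epoch:3d}  head {head:8s}  RMSE_F = {metrics['RMSE_F']} meV/A")
```

The stress metric is optional in the parsed dictionary, since the naive run does not report it.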

I'm still interested in using the multiheads training, as it might improve the generalizability of my model. My question: what could cause the slow convergence of multiheads training? Is this already known? What parameters can one tune to achieve better convergence (should I increase the forces weight even more, etc.)? I'd be happy to hear about any insights :)

Training parameters:

Multihead

mace_run_train \
    --name="MACE" \
    --foundation_model="../mace_agnesi_small.model" \
    --multiheads_finetuning=True \
    --train_file="../train.xyz" \
    --valid_fraction=0.05 \
    --test_file="../test.xyz" \
    --energy_weight=1.0 \
    --forces_weight=10.0 \
    --energy_key='DFT_energy' \
    --forces_key='DFT_forces' \
    --E0s="{1: -1.20502718, 8: -1.60386686, 78: -0.5578757}" \
    --lr=0.01 \
    --scaling="rms_forces_scaling" \
    --batch_size=3 \
    --max_num_epochs=100 \
    --ema \
    --ema_decay=0.99 \
    --amsgrad \
    --default_dtype="float64" \
    --device=cuda \
    --seed=3

Naive

mace_run_train \
    --name="MACE" \
    --foundation_model="small" \
    --multiheads_finetuning=False \
    --train_file="../train.xyz" \
    --valid_fraction=0.05 \
    --test_file="../test.xyz" \
    --energy_weight=1.0 \
    --forces_weight=1.0 \
    --energy_key='DFT_energy' \
    --forces_key='DFT_forces' \
    --E0s="{1: -1.20502718, 8: -1.60386686, 78: -0.5578757}" \
    --lr=0.01 \
    --scaling="rms_forces_scaling" \
    --batch_size=2 \
    --max_num_epochs=50 \
    --ema \
    --ema_decay=0.99 \
    --amsgrad \
    --default_dtype="float64" \
    --device=cuda \
    --seed=1

ilyes319 · 2024-10-08T15:43:07Z

ilyes319
Oct 8, 2024
Maintainer

Hello,

Can you please share the log files for the two training runs so I can help you? For example, I need to look at the initial loss to see whether there is a potential problem.
Multihead replay requires precise computation of the E0s, and that is usually the cause of problems.
How did you compute the E0s for your DFT? Did you make sure that the oxygen E0 is spin-polarized? This is very important.
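One rough way to inspect a set of E0s is to subtract them from a structure's total DFT energy and look at what remains per atom, since that remainder is what the interaction part of the model has to fit. The sketch below is only an illustration: the E0s are copied from the training command in this thread, while `interaction_energy_per_atom` is a hypothetical helper and the water-molecule total energy in the usage example is a made-up number, not a DFT result.

```python
from collections import Counter

# E0s copied from the training command in this thread (eV, keyed by atomic number).
E0S = {1: -1.20502718, 8: -1.60386686, 78: -0.5578757}


def interaction_energy_per_atom(total_energy, atomic_numbers, e0s=E0S):
    """Return (E_total - sum of E0s) / N_atoms for one structure, in eV/atom."""
    counts = Counter(atomic_numbers)
    missing = set(counts) - set(e0s)
    if missing:
        raise KeyError(f"no E0 given for atomic numbers {sorted(missing)}")
    baseline = sum(n * e0s[z] for z, n in counts.items())
    return (total_energy - baseline) / len(atomic_numbers)


# Hypothetical example: a single H2O molecule with a made-up total energy of -14.0 eV.
residual = interaction_energy_per_atom(-14.0, [8, 1, 1])
print(f"energy left after removing E0s: {residual:.4f} eV/atom")
```

If the E0s come from a different level of theory than the training data (for example a non-spin-polarized oxygen atom), this remainder shifts by a composition-dependent amount, which is the kind of mismatch the questions above are probing.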

13 replies

lucasdekam Oct 11, 2024
Author

Can do, any preference for the forces weight? Should I leave it at 10 or revert to 1?

ilyes319 Oct 11, 2024
Maintainer

I think you should use a weight of 10 for forces and 1 for energies; that matches what we use for MP.

lucasdekam Oct 15, 2024
Author

Here's the result, with this training command:

mace_run_train \
    --name="MACE" \
    --foundation_model="../mace_agnesi_small.model" \
    --multiheads_finetuning=True \
    --train_file="../train.xyz" \
    --valid_fraction=0.1 \
    --test_file="../test.xyz" \
    --energy_weight=1.0 \
    --forces_weight=10.0 \
    --stress_weight=0 \
    --virials_weight=0 \
    --energy_key='DFT_energy' \
    --forces_key='DFT_forces' \
    --stress_key=None \
    --virials_key=None \
    --compute_stress=False \
    --E0s="{1: -1.20502718, 8: -1.60386686, 78: -0.5578757}" \
    --lr=0.01 \
    --scaling="rms_forces_scaling" \
    --batch_size=3 \
    --max_num_epochs=50 \
    --ema \
    --ema_decay=0.99 \
    --amsgrad \
    --default_dtype="float64" \
    --device=cuda \
    --seed=3

where only the seed was changed.
MACE_run-2.log
MACE_run-3.log
MACE_run-1.log

ilyes319 Oct 15, 2024
Maintainer

Hey @lucasdekam, thank you for the logs. We do see a large variation with the seed. I will have a closer look at it. For a fairer comparison, it would be better to create a single validation file shared by all the seeds. Here, the variation could come from the pretraining head but also from the data in the validation set.
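A fixed validation file can be produced once, before any training, so that the training seed no longer affects which structures are held out. The following is a minimal sketch, assuming the data are in an extended XYZ file readable by ASE; `split_indices` and `write_split` are hypothetical helpers, and the default file names are only placeholders.

```python
import random


def split_indices(n_frames, valid_fraction=0.1, seed=0):
    """Return (train_idx, valid_idx); the split depends only on this seed."""
    if not 0.0 < valid_fraction < 1.0:
        raise ValueError("valid_fraction must be strictly between 0 and 1")
    indices = list(range(n_frames))
    random.Random(seed).shuffle(indices)  # local RNG, does not touch global state
    n_valid = max(1, round(n_frames * valid_fraction))
    return sorted(indices[n_valid:]), sorted(indices[:n_valid])


def write_split(source="train.xyz", train_out="train_multiheads.xyz",
                valid_out="valid_multiheads.xyz", valid_fraction=0.1, seed=0):
    """Write fixed train/validation files from one extended XYZ file."""
    # ASE is imported here so split_indices can be used without it installed.
    from ase.io import read, write

    frames = read(source, index=":", format="extxyz")
    train_idx, valid_idx = split_indices(len(frames), valid_fraction, seed)
    write(train_out, [frames[i] for i in train_idx], format="extxyz")
    write(valid_out, [frames[i] for i in valid_idx], format="extxyz")


# Example: indices for an 80-structure dataset with 10% held out.
train_idx, valid_idx = split_indices(80, valid_fraction=0.1, seed=0)
print(len(train_idx), len(valid_idx))
```

The two output files can then be passed to `--train_file` and `--valid_file` for every seed, in place of `--valid_fraction`.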

lucasdekam Oct 16, 2024
Author

multiheads-0_run-0.log
multiheads-1_run-1.log
multiheads-2_run-2.log

I've done a quick run with a fixed validation set file. Hope that helps. Let me know if you find anything.

NAME="multiheads-$SLURM_ARRAY_TASK_ID"

mace_run_train \
    --name=$NAME \
    --foundation_model="mace_agnesi_small.model" \
    --multiheads_finetuning=True \
    --train_file="train_multiheads.xyz" \
    --valid_file="valid_multiheads.xyz" \
    --test_file="test.xyz" \
    --energy_weight=1.0 \
    --forces_weight=10.0 \
    --stress_weight=0 \
    --energy_key='DFT_energy' \
    --forces_key='DFT_forces' \
    --E0s="{1: -1.20502718, 8: -1.60386686, 78: -0.5578757}" \
    --lr=0.01 \
    --scaling="rms_forces_scaling" \
    --batch_size=3 \
    --max_num_epochs=50 \
    --ema \
    --ema_decay=0.99 \
    --amsgrad \
    --default_dtype="float64" \
    --device=cuda \
    --seed=$SLURM_ARRAY_TASK_ID \
    --work_dir=$NAME
