Hi @beidouamg, glad to hear! Regarding multi-GPU training: this is not yet fully supported; you can follow the discussion of it in #210. The dataset preprocessing should, however, be parallelized over CPU cores: you can control this manually with the `NEQUIP_NUM_TASKS` environment variable (the one commented out in your jobscript). Regarding a very large dataset, you can pre-process your data on a CPU node first; the cached, processed dataset will then be reused when the GPU training job starts.
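A minimal sketch of what such a CPU-only pre-processing job could look like. This is an assumption on my part, not the exact command from the original reply (which was lost): I am guessing that simply launching `nequip-train` on a CPU node triggers the dataset processing and caching, and the SBATCH constraint/time values are placeholders for your cluster.

```shell
#!/bin/bash
# Hypothetical CPU-only pre-processing job -- adapt the SBATCH
# directives to your own cluster; these values are placeholders.
#SBATCH --constraint=cpu
#SBATCH --cpus-per-task=32
#SBATCH --time=01:00:00

# Parallelize dataset preprocessing over CPU cores.
export NEQUIP_NUM_TASKS=32

conda activate nequip

# Assumption: running the normal entry point processes and caches the
# dataset; a later GPU job with the same config reuses the cache
# instead of re-processing. You can stop this job once preprocessing
# is done.
nequip-train full.yaml
```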
Hi NequIP developers,
Thanks for developing this code; it helps a lot and produces better machine-learning potentials than other codes.
I have some trouble with training (maybe this will turn out to be a noob question). The essential problem is that nequip-train always uses only one of the GPUs on the requested GPU nodes (each GPU node has 4 GPUs). Some details are below:
```shell
#!/bin/bash
#SBATCH --constraint=gpu
#SBATCH --exclusive
#SBATCH --qos=debug
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=32
#SBATCH --gpus=8
#SBATCH --time=00:30:00

export OMP_PLACES=threads
export OMP_PROC_BIND=spread
export SLURM_CPU_BIND="cores"
#export NEQUIP_NUM_TASKS=8

module load cudatoolkit/11.7
conda activate nequip
nequip-train full.yaml
```
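Since nequip-train currently runs on a single GPU, one workaround sketch (this is standard CUDA environment handling, not something from the NequIP docs) is to request only one GPU per job and pin the process to it explicitly, so the allocation matches what the code can actually use:

```shell
# Pin the process to a single device before any CUDA context is
# created; nequip-train will then see exactly one GPU.
export CUDA_VISIBLE_DEVICES=0
echo "Visible GPUs: $CUDA_VISIBLE_DEVICES"
```

To use all four GPUs on a node you would instead launch four independent jobs (e.g. different configs or seeds), each pinned to a different device index.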
I have tried changing settings in the jobscript, but it doesn't work. It would be great if you could help figure out the cause, so I can make full use of the available CPUs/GPUs on the node. Please let me know if you need more info.
Thanks for your time on this problem!