Some errors when calling "dpgen run pa... ma..." #394
-
I am also having the same issue.
I would be extremely grateful if the developers could provide some help with this.
-
You need to go to the working directory, i.e. …
@AnguseZhang, DP-GEN should give a clearer message for this error.
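A minimal Python sketch of the kind of clearer message suggested above, assuming the failure comes from input files that cannot be found relative to the current directory. `check_inputs` is a hypothetical helper, not part of DP-GEN: it resolves both input files against the working directory and fails with an explicit hint instead of a generic error later in the run.

```python
from pathlib import Path
from typing import List, Optional


def check_inputs(param: str, machine: str, cwd: Optional[str] = None) -> List[Path]:
    """Resolve the parameter and machine files against the working directory.

    Hypothetical pre-flight check (not DP-GEN's API): raises FileNotFoundError
    with an explicit hint when a file is missing, instead of failing later.
    """
    base = Path(cwd) if cwd is not None else Path.cwd()
    resolved = []
    for label, name in (("parameter", param), ("machine", machine)):
        path = (base / name).resolve()
        if not path.is_file():
            raise FileNotFoundError(
                f"{label} file {name!r} not found from {base}; "
                "cd into the DP-GEN working directory before running dpgen"
            )
        resolved.append(path)
    return resolved
```

For example, a wrapper script could call `check_inputs(param_file, machine_file)` before launching `dpgen run`; both arguments are placeholders for the real file names.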
-
Dear developers,
When I run "dpgen run pa... ma...", something goes wrong when it calls "lmp_mpi". The error is as follows (the run includes 10 initial configurations):
*** Error in /public3/home/sc31148/a/lammps/src/lmp_mpi': double free or corruption (!prev): 0x00000000031eb0e0 ***
INFO:dpgen:job 6ec96f6e-e2f0-4832-8e5e-2d6db31ba8a0 terminated, submit again
INFO:dpgen:job e6f52053-7806-4467-808d-dd28b357af50 terminated, submit again
INFO:dpgen:job 5253a482-fc5a-434f-a0ab-132921c4f5c3 terminated, submit again
INFO:dpgen:job 8c3a9990-fb1e-473e-9851-7a07dc6e50e0 terminated, submit again
INFO:dpgen:job f18139e0-f8ca-4edd-8ce4-30b2b4b1c920 terminated, submit again
*** Error in /public3/home/sc31148/a/lammps/src/lmp_mpi': double free or corruption (fasttop): 0x00000000028f9870 ***
INFO:dpgen:job 2c277044-bc55-4ad5-abf3-e1a0475d15ea terminated, submit again
INFO:dpgen:job d55941a2-8ca3-46fd-b043-e53c6d4aafee terminated, submit again
INFO:dpgen:job e0bc2f24-0589-4350-b96e-c057e14161cb terminated, submit again
INFO:dpgen:job de00f61b-2311-4606-b5fa-cff57cd1e53d terminated, submit again
INFO:dpgen:job 6ec96f6e-e2f0-4832-8e5e-2d6db31ba8a0 terminated, submit again
INFO:dpgen:job e6f52053-7806-4467-808d-dd28b357af50 terminated, submit again
INFO:dpgen:job d4a20368-ffda-490c-ae14-d75122d50bcb terminated, submit again
*** Error in /public3/home/sc31148/a/lammps/src/lmp_mpi': double free or corruption (fasttop): 0x00000000027ae850 ***
INFO:dpgen:job f18139e0-f8ca-4edd-8ce4-30b2b4b1c920 terminated, submit again
INFO:dpgen:job 5253a482-fc5a-434f-a0ab-132921c4f5c3 terminated, submit again
INFO:dpgen:job 2c277044-bc55-4ad5-abf3-e1a0475d15ea terminated, submit again
INFO:dpgen:job 8c3a9990-fb1e-473e-9851-7a07dc6e50e0 terminated, submit again
INFO:dpgen:job de00f61b-2311-4606-b5fa-cff57cd1e53d terminated, submit again
INFO:dpgen:job 6ec96f6e-e2f0-4832-8e5e-2d6db31ba8a0 terminated, submit again
INFO:dpgen:job d55941a2-8ca3-46fd-b043-e53c6d4aafee terminated, submit again
INFO:dpgen:job e0bc2f24-0589-4350-b96e-c057e14161cb terminated, submit again
*** Error in /public3/home/sc31148/a/lammps/src/lmp_mpi': double free or corruption (!prev): 0x00000000021507f0 ***
*** Error in /public3/home/sc31148/a/lammps/src/lmp_mpi': double free or corruption (!prev): 0x00000000017fc7f0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x81299)[0x2b21ffbf1299]
/lib64/libc.so.6(+0x39d10)[0x2b21ffba9d10]
/lib64/libc.so.6(+0x39d37)[0x2b21ffba9d37]
/public1/soft/gcc/4.9.2//lib64/libgomp.so.1(+0x6dc6)[0x2b21fff44dc6]
/public1/soft/gcc/4.9.2//lib64/libgomp.so.1(+0xeaff)[0x2b21fff4caff]
/public1/soft/gcc/4.9.2//lib64/libgomp.so.1(GOMP_parallel+0x3a)[0x2b21fff478ba]
/public1/home/sc31148/a/deepmd_root/lib/libdeepmd_op.so(_ZN15ProdVirialSeAOp7ComputeEPN10tensorflow15OpKernelContextE+0x8ab)[0x2b21f145fcfb]
INFO:dpgen:job e6f52053-7806-4467-808d-dd28b357af50 terminated, submit again
INFO:dpgen:job f18139e0-f8ca-4edd-8ce4-30b2b4b1c920 terminated, submit again
INFO:dpgen:job 5253a482-fc5a-434f-a0ab-132921c4f5c3 terminated, submit again
INFO:dpgen:job d4a20368-ffda-490c-ae14-d75122d50bcb terminated, submit again
*** Error in /public3/home/sc31148/a/lammps/src/lmp_mpi': double free or corruption (!prev): 0x00000000020fd4c0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x81299)[0x2ac6adac4299]
/lib64/libc.so.6(+0x39ce9)[0x2ac6ada7cce9]
/lib64/libc.so.6(+0x39d37)[0x2ac6ada7cd37]
/public1/soft/gcc/4.9.2//lib64/libgomp.so.1(+0x6dc6)[0x2ac6ade17dc6]
/public1/soft/gcc/4.9.2//lib64/libgomp.so.1(+0xeaff)[0x2ac6ade1faff]
/public1/soft/gcc/4.9.2//lib64/libgomp.so.1(GOMP_parallel+0x3a)[0x2ac6ade1a8ba]
/public1/home/sc31148/a/deepmd_root/lib/libdeepmd_op.so(_ZN15ProdVirialSeAOp7ComputeEPN10tensorflow15OpKernelContextE+0x8ab)[0x2ac69f332cfb]
INFO:dpgen:job de00f61b-2311-4606-b5fa-cff57cd1e53d terminated, submit again
Traceback (most recent call last):
  File "/public3/home/sc31148/.local/bin/dpgen", line 11, in <module>
    load_entry_point('dpgen==0.8.0', 'console_scripts', 'dpgen')()
  File "/public3/home/sc31148/.local/lib/python3.7/site-packages/dpgen/main.py", line 182, in main
    args.func(args)
  File "/public3/home/sc31148/.local/lib/python3.7/site-packages/dpgen/generator/run.py", line 2309, in gen_run
    run_iter (args.PARAM, args.MACHINE)
  File "/public3/home/sc31148/.local/lib/python3.7/site-packages/dpgen/generator/run.py", line 2284, in run_iter
    run_model_devi (ii, jdata, mdata)
  File "/public3/home/sc31148/.local/lib/python3.7/site-packages/dpgen/generator/run.py", line 1010, in run_model_devi
    errlog = 'model_devi.log')
  File "/public3/home/sc31148/.local/lib/python3.7/site-packages/dpgen/dispatcher/Dispatcher.py", line 91, in run_jobs
    while not self.all_finished(job_handler, mark_failure) :
  File "/public3/home/sc31148/.local/lib/python3.7/site-packages/dpgen/dispatcher/Dispatcher.py", line 215, in all_finished
    raise RuntimeError('Job %s failed for more than 3 times' % job_uuid)
RuntimeError: Job 6ec96f6e-e2f0-4832-8e5e-2d6db31ba8a0 failed for more than 3 times
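The traceback shows that dpgen resubmits each terminated job and `Dispatcher.all_finished` raises `RuntimeError` once a job has failed more than 3 times. To see which jobs are approaching that limit, a minimal Python sketch can tally the "terminated, submit again" messages per job UUID. `count_resubmits` and `SAMPLE_LOG` are hypothetical names, not part of dpgen, and the sample holds only a few lines copied from the log above. In the excerpt above, job 6ec96f6e-e2f0-4832-8e5e-2d6db31ba8a0 is reported as terminated three times before the `RuntimeError` names it.

```python
import re
from collections import Counter

# Hypothetical helper, not part of dpgen: the pattern matches the
# "INFO:dpgen:job <uuid> terminated, submit again" lines from the log above.
RESUBMIT = re.compile(r"INFO:dpgen:job ([0-9a-f-]{36}) terminated, submit again")

# A few lines copied from the log above (not the full log).
SAMPLE_LOG = """\
INFO:dpgen:job 6ec96f6e-e2f0-4832-8e5e-2d6db31ba8a0 terminated, submit again
INFO:dpgen:job e6f52053-7806-4467-808d-dd28b357af50 terminated, submit again
INFO:dpgen:job 6ec96f6e-e2f0-4832-8e5e-2d6db31ba8a0 terminated, submit again
*** Error in /public3/home/sc31148/a/lammps/src/lmp_mpi': double free or corruption (fasttop): 0x00000000027ae850 ***
INFO:dpgen:job 6ec96f6e-e2f0-4832-8e5e-2d6db31ba8a0 terminated, submit again
"""


def count_resubmits(log_text: str) -> Counter:
    """Return how many times each job UUID was reported as terminated and resubmitted."""
    return Counter(RESUBMIT.findall(log_text))


if __name__ == "__main__":
    # Print jobs from most to least resubmitted.
    for uuid, n in count_resubmits(SAMPLE_LOG).most_common():
        print(f"{uuid}: resubmitted {n} time(s)")
```

Passing the full dpgen output instead of `SAMPLE_LOG` would list every job that the dispatcher had to resubmit.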
So, could you tell me what I should do next?