slurm error #1504

Open
jwli-code opened this issue Oct 16, 2024 · 2 comments

@jwli-code

cat Bnanew_cactusv2.9.0_69.err
[2024-10-16T21:35:18+0800] [MainThread] [I] [toil.statsAndLogging] Enabling realtime logging in Toil
[2024-10-16T21:35:18+0800] [MainThread] [W] [toil.lib.humanize] Deprecated toil method. Please use "toil.lib.conversions.human2bytes()" instead."
[2024-10-16T21:35:18+0800] [MainThread] [W] [toil.lib.humanize] Deprecated toil method. Please use "toil.lib.conversions.human2bytes()" instead."
[2024-10-16T21:35:18+0800] [MainThread] [I] [toil.statsAndLogging] Cactus Command: /public/home/jwli1/software/cactus-bin-v2.9.2/cactus_env/bin/cactus-pangenome /public/home/jwli1/data1/pangene_2024/01.cactus/slurm/Bna16new_js /public/home/jwli1/data1/pangene_2024/01.cactus/slurm/Bna_16.seq --outDir /public/home/jwli1/data1/pangene_2024/01.cactus/slurm/Bnanew_16.pg --outName Bna_16-pg --reference BnaZS11 --vcf --giraffe --gfa --gbz --maxCores 30 --mapCores 30 --consCores 30 --indexCores 30 --mgCores 30 --permissiveContigFilter --batchSystem slurm --doubleMem true --workDir /public/home/jwli1/data1/pangene_2024/01.cactus/slurm/wk --coordinationDir /public/home/jwli1/data1/pangene_2024/01.cactus/slurm/tmp --batchLogsDir /public/home/jwli1/data1/pangene_2024/01.cactus/slurm/batch-logs
[2024-10-16T21:35:18+0800] [MainThread] [I] [toil.statsAndLogging] Cactus Commit: 8b4e8a9
[2024-10-16T21:35:18+0800] [MainThread] [I] [toil.statsAndLogging] Importing file:///public/home/jwli1/data1/pangene_2024/01.cactus/00.data/Bna_Darmor_v10.0.fasta
[2024-10-16T21:35:23+0800] [MainThread] [I] [toil.statsAndLogging] Importing file:///public/home/jwli1/data1/pangene_2024/01.cactus/00.data/Bna_Express617_v1.0.fasta
[2024-10-16T21:35:23+0800] [MainThread] [I] [toil.statsAndLogging] Importing file:///public/home/jwli1/data1/pangene_2024/01.cactus/00.data/Bna_GanganF73_v0.0.fasta
[2024-10-16T21:35:23+0800] [MainThread] [I] [toil.statsAndLogging] Importing file:///public/home/jwli1/data1/pangene_2024/01.cactus/00.data/Bna_GH06_v1.0.fasta
[2024-10-16T21:35:23+0800] [MainThread] [I] [toil.statsAndLogging] Importing file:///public/home/jwli1/data1/pangene_2024/01.cactus/00.data/Bna_Ningyou7_v2.0.fasta
[2024-10-16T21:35:23+0800] [MainThread] [I] [toil.statsAndLogging] Importing file:///public/home/jwli1/data1/pangene_2024/01.cactus/00.data/Bna_No2127_v0.0.fasta
[2024-10-16T21:35:23+0800] [MainThread] [I] [toil.statsAndLogging] Importing file:///public/home/jwli1/data1/pangene_2024/01.cactus/00.data/Bna_QuintaA_v0.0.fasta
[2024-10-16T21:35:23+0800] [MainThread] [I] [toil.statsAndLogging] Importing file:///public/home/jwli1/data1/pangene_2024/01.cactus/00.data/Bna_Shengli3_v0.0.fasta
[2024-10-16T21:35:23+0800] [MainThread] [I] [toil.statsAndLogging] Importing file:///public/home/jwli1/data1/pangene_2024/01.cactus/00.data/Bna_Tapidor3_v0.0.fasta
[2024-10-16T21:35:23+0800] [MainThread] [I] [toil.statsAndLogging] Importing file:///public/home/jwli1/data1/pangene_2024/01.cactus/00.data/Bna_Westar_v0.0.fasta
[2024-10-16T21:35:23+0800] [MainThread] [I] [toil.statsAndLogging] Importing file:///public/home/jwli1/data1/pangene_2024/01.cactus/00.data/Bna_Xiaoyun.fasta
[2024-10-16T21:35:23+0800] [MainThread] [I] [toil.statsAndLogging] Importing file:///public/home/jwli1/data1/pangene_2024/01.cactus/00.data/Bna_Zheyou73_v0.0.fasta
[2024-10-16T21:35:23+0800] [MainThread] [I] [toil.statsAndLogging] Importing file:///public/home/jwli1/data1/pangene_2024/01.cactus/00.data/Bna_ZS11_v0.0.fasta
[2024-10-16T21:35:23+0800] [MainThread] [I] [toil.statsAndLogging] Importing file:///public/home/jwli1/data1/pangene_2024/01.cactus/00.data/Bna_ZY821_v1.0.fasta
[2024-10-16T21:35:23+0800] [MainThread] [I] [toil.statsAndLogging] Importing file:///public/home/jwli1/data1/pangene_2024/01.cactus/00.data/Bna_DC1.fasta
[2024-10-16T21:35:23+0800] [MainThread] [I] [toil.statsAndLogging] Importing file:///public/home/jwli1/data1/pangene_2024/01.cactus/00.data/Bna_Da-Ae.fasta
[2024-10-16T21:35:23+0800] [MainThread] [W] [toil.common] Batch system does not support auto-deployment. The user script ModuleDescriptor(dirPath='/public/home/jwli1/software/cactus-bin-v2.9.2/cactus_env/lib/python3.8/site-packages', name='cactus.refmap.cactus_pangenome', fromVirtualEnv=True) will have to be present at the same location on every worker.
[2024-10-16T21:35:23+0800] [MainThread] [I] [toil] Running Toil version 7.0.0-d569ea5711eb310ffd5703803f7250ebf7c19576 on host b202r4n2.
[2024-10-16T21:35:23+0800] [MainThread] [I] [toil.realtimeLogger] Starting real-time logging.
[2024-10-16T21:35:23+0800] [MainThread] [I] [toil.leader] Issued job 'pangenome_end_to_end_workflow' kind-pangenome_end_to_end_workflow/instance-ir8bs_5r v1 with job batch system ID: 1 and disk: 2.0 Gi, memory: 2.0 Gi, cores: 1, accelerators: [], preemptible: False
[2024-10-16T21:35:25+0800] [MainThread] [I] [toil.leader] 0 jobs are running, 1 jobs are issued and waiting to run
[2024-10-16T21:36:28+0800] [MainThread] [I] [toil.leader] Issued job 'sanitize_fasta_header' kind-sanitize_fasta_header/instance-m8uamhba v1 with job batch system ID: 2 and disk: 6.6 Gi, memory: 2.0 Gi, cores: 1, accelerators: [], preemptible: False
[2024-10-16T21:36:28+0800] [MainThread] [I] [toil.leader] Issued job 'sanitize_fasta_header' kind-sanitize_fasta_header/instance-8eshjz2f v1 with job batch system ID: 3 and disk: 6.6 Gi, memory: 2.0 Gi, cores: 1, accelerators: [], preemptible: False
[2024-10-16T21:36:28+0800] [MainThread] [I] [toil.leader] Issued job 'sanitize_fasta_header' kind-sanitize_fasta_header/instance-uoft6ap9 v1 with job batch system ID: 4 and disk: 5.6 Gi, memory: 2.0 Gi, cores: 1, accelerators: [], preemptible: False
[2024-10-16T21:36:28+0800] [MainThread] [I] [toil.leader] Issued job 'sanitize_fasta_header' kind-sanitize_fasta_header/instance-g85qc1_1 v1 with job batch system ID: 5 and disk: 5.7 Gi, memory: 2.0 Gi, cores: 1, accelerators: [], preemptible: False
[2024-10-16T21:36:28+0800] [MainThread] [I] [toil.leader] Issued job 'sanitize_fasta_header' kind-sanitize_fasta_header/instance-gh9lgts7 v1 with job batch system ID: 6 and disk: 6.6 Gi, memory: 2.0 Gi, cores: 1, accelerators: [], preemptible: False
[2024-10-16T21:36:28+0800] [MainThread] [I] [toil.leader] Issued job 'sanitize_fasta_header' kind-sanitize_fasta_header/instance-7yaok7_m v1 with job batch system ID: 7 and disk: 6.6 Gi, memory: 2.0 Gi, cores: 1, accelerators: [], preemptible: False
[2024-10-16T21:36:28+0800] [MainThread] [I] [toil.leader] Issued job 'sanitize_fasta_header' kind-sanitize_fasta_header/instance-lx4qdhti v1 with job batch system ID: 8 and disk: 6.6 Gi, memory: 2.0 Gi, cores: 1, accelerators: [], preemptible: False
[2024-10-16T21:36:28+0800] [MainThread] [I] [toil.leader] Issued job 'sanitize_fasta_header' kind-sanitize_fasta_header/instance-mz12jjfw v1 with job batch system ID: 9 and disk: 6.5 Gi, memory: 2.0 Gi, cores: 1, accelerators: [], preemptible: False
[2024-10-16T21:36:28+0800] [MainThread] [I] [toil.leader] Issued job 'sanitize_fasta_header' kind-sanitize_fasta_header/instance-ofn6sbaw v1 with job batch system ID: 10 and disk: 6.0 Gi, memory: 2.0 Gi, cores: 1, accelerators: [], preemptible: False
[2024-10-16T21:36:28+0800] [MainThread] [I] [toil.leader] Issued job 'sanitize_fasta_header' kind-sanitize_fasta_header/instance-bzzm4676 v1 with job batch system ID: 11 and disk: 6.1 Gi, memory: 2.0 Gi, cores: 1, accelerators: [], preemptible: False
[2024-10-16T21:36:28+0800] [MainThread] [I] [toil.leader] Issued job 'sanitize_fasta_header' kind-sanitize_fasta_header/instance-_wzcg8vp v1 with job batch system ID: 12 and disk: 6.3 Gi, memory: 2.0 Gi, cores: 1, accelerators: [], preemptible: False
[2024-10-16T21:36:28+0800] [MainThread] [I] [toil.leader] Issued job 'sanitize_fasta_header' kind-sanitize_fasta_header/instance-cs9vkyzt v1 with job batch system ID: 13 and disk: 6.5 Gi, memory: 2.0 Gi, cores: 1, accelerators: [], preemptible: False
[2024-10-16T21:36:28+0800] [MainThread] [I] [toil.leader] Issued job 'sanitize_fasta_header' kind-sanitize_fasta_header/instance-juckbuzr v1 with job batch system ID: 14 and disk: 6.5 Gi, memory: 2.0 Gi, cores: 1, accelerators: [], preemptible: False
[2024-10-16T21:36:28+0800] [MainThread] [I] [toil.leader] Issued job 'sanitize_fasta_header' kind-sanitize_fasta_header/instance-nhd77u54 v1 with job batch system ID: 15 and disk: 6.7 Gi, memory: 2.0 Gi, cores: 1, accelerators: [], preemptible: False
[2024-10-16T21:36:28+0800] [MainThread] [I] [toil.leader] Issued job 'sanitize_fasta_header' kind-sanitize_fasta_header/instance-v_4wchhl v1 with job batch system ID: 16 and disk: 6.8 Gi, memory: 2.0 Gi, cores: 1, accelerators: [], preemptible: False
[2024-10-16T21:36:28+0800] [MainThread] [I] [toil.leader] Issued job 'sanitize_fasta_header' kind-sanitize_fasta_header/instance-c02qgk65 v1 with job batch system ID: 17 and disk: 6.6 Gi, memory: 2.0 Gi, cores: 1, accelerators: [], preemptible: False
[2024-10-16T21:37:20+0800] [MainThread] [I] [toil-rt] 2024-10-16 21:37:20.288784: Running the command: "cactus_sanitizeFastaHeaders /public/home/jwli1/data1/pangene_2024/01.cactus/slurm/wk/toilwf-79343289208b59e1b3fdb85f2bd7e7e0/c487/job/tmpa7ai2jzy/BnaZS11.fa BnaZS11 -p"
[2024-10-16T21:37:22+0800] [MainThread] [I] [toil-rt] 2024-10-16 21:37:22.195529: Running the command: "cactus_sanitizeFastaHeaders /public/home/jwli1/data1/pangene_2024/01.cactus/slurm/wk/toilwf-79343289208b59e1b3fdb85f2bd7e7e0/5df7/job/tmp51t5_umm/BnaWestar.fa BnaWestar -p"
[2024-10-16T21:37:22+0800] [MainThread] [I] [toil-rt] 2024-10-16 21:37:22.314356: Running the command: "cactus_sanitizeFastaHeaders /public/home/jwli1/data1/pangene_2024/01.cactus/slurm/wk/toilwf-79343289208b59e1b3fdb85f2bd7e7e0/031a/job/tmpul07irlq/BnaDC1.fa BnaDC1 -p"
[2024-10-16T21:37:22+0800] [MainThread] [I] [toil-rt] 2024-10-16 21:37:22.315652: Running the command: "cactus_sanitizeFastaHeaders /public/home/jwli1/data1/pangene_2024/01.cactus/slurm/wk/toilwf-79343289208b59e1b3fdb85f2bd7e7e0/fec9/job/tmphq1x_8ly/BnaExpress617.fa BnaExpress617 -p"
[2024-10-16T21:37:22+0800] [MainThread] [I] [toil-rt] 2024-10-16 21:37:22.384000: Running the command: "cactus_sanitizeFastaHeaders /public/home/jwli1/data1/pangene_2024/01.cactus/slurm/wk/toilwf-79343289208b59e1b3fdb85f2bd7e7e0/a08b/job/tmpfba1v8y0/BnaQuintaA.fa BnaQuintaA -p"
[2024-10-16T21:37:22+0800] [MainThread] [I] [toil-rt] 2024-10-16 21:37:22.460364: Running the command: "cactus_sanitizeFastaHeaders /public/home/jwli1/data1/pangene_2024/01.cactus/slurm/wk/toilwf-79343289208b59e1b3fdb85f2bd7e7e0/69eb/job/tmpnbt6reu9/BnaNingyou7.fa BnaNingyou7 -p"
[2024-10-16T21:37:22+0800] [MainThread] [I] [toil-rt] 2024-10-16 21:37:22.483570: Running the command: "cactus_sanitizeFastaHeaders /public/home/jwli1/data1/pangene_2024/01.cactus/slurm/wk/toilwf-79343289208b59e1b3fdb85f2bd7e7e0/3a3e/job/tmplotlce6e/BnaShengli3.fa BnaShengli3 -p"
[2024-10-16T21:37:23+0800] [MainThread] [W] [toil.leader] Job failed with exit value 1: 'sanitize_fasta_header' kind-sanitize_fasta_header/instance-uoft6ap9 v1
Exit reason: FAILED
[2024-10-16T21:37:23+0800] [MainThread] [W] [toil.leader] Job 'sanitize_fasta_header' kind-sanitize_fasta_header/instance-uoft6ap9 v1 has no new version available immediately. The batch system may have killed (or never started) the Toil worker.
[2024-10-16T21:37:23+0800] [MainThread] [W] [toil.leader] No log file is present, despite job failing: 'sanitize_fasta_header' kind-sanitize_fasta_header/instance-uoft6ap9 v1
[2024-10-16T21:37:23+0800] [MainThread] [W] [toil.leader] The batch system left an empty file /public/home/jwli1/data1/pangene_2024/01.cactus/slurm/batch-logs/toil_cbc5caf1-add6-4995-934f-c6deb16c5ad6.4.17321287.out.log
[2024-10-16T21:37:23+0800] [MainThread] [W] [toil.leader] The batch system left a non-empty file /public/home/jwli1/data1/pangene_2024/01.cactus/slurm/batch-logs/toil_cbc5caf1-add6-4995-934f-c6deb16c5ad6.4.17321287.err.log:
[2024-10-16T21:37:23+0800] [MainThread] [W] [toil.leader] Log from job "kind-sanitize_fasta_header/instance-uoft6ap9" follows:
=========>
Traceback (most recent call last):
  File "/public/home/jwli1/software/cactus-bin-v2.9.2/cactus_env/bin/_toil_worker", line 8, in <module>
    sys.exit(main())
  File "/public/home/jwli1/software/cactus-bin-v2.9.2/cactus_env/lib/python3.8/site-packages/toil/worker.py", line 765, in main
    with in_contexts(options.context):
  File "/public/home/jwli1/micromamba/envs/cactus_new/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/public/home/jwli1/software/cactus-bin-v2.9.2/cactus_env/lib/python3.8/site-packages/toil/worker.py", line 745, in in_contexts
    with manager:
  File "/public/home/jwli1/software/cactus-bin-v2.9.2/cactus_env/lib/python3.8/site-packages/toil/batchSystems/cleanup_support.py", line 80, in __enter__
    self.arena.enter()
  File "/public/home/jwli1/software/cactus-bin-v2.9.2/cactus_env/lib/python3.8/site-packages/toil/lib/threading.py", line 534, in enter
    with global_mutex(self.base_dir, self.mutex):
  File "/public/home/jwli1/micromamba/envs/cactus_new/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/public/home/jwli1/software/cactus-bin-v2.9.2/cactus_env/lib/python3.8/site-packages/toil/lib/threading.py", line 394, in global_mutex
    fd_stats = os.fstat(fd)
FileNotFoundError: [Errno 2] No such file or directory
<=========
[2024-10-16T21:37:23+0800] [MainThread] [W] [toil.job] Due to failure we are reducing the remaining try count of job 'sanitize_fasta_header' kind-sanitize_fasta_header/instance-uoft6ap9 v1 with ID kind-sanitize_fasta_header/instance-uoft6ap9 to 1
[2024-10-16T21:37:23+0800] [MainThread] [I] [toil.leader] Issued job 'sanitize_fasta_header' kind-sanitize_fasta_header/instance-uoft6ap9 v2 with job batch system ID: 18 and disk: 5.6 Gi, memory: 2.0 Gi, cores: 1, accelerators: [], preemptible: False
[2024-10-16T21:37:23+0800] [MainThread] [W] [toil.leader] Job failed with exit value 1: 'sanitize_fasta_header' kind-sanitize_fasta_header/instance-g85qc1_1 v1
Exit reason: FAILED
[2024-10-16T21:37:23+0800] [MainThread] [W] [toil.leader] Job 'sanitize_fasta_header' kind-sanitize_fasta_header/instance-g85qc1_1 v1 has no new version available immediately. The batch system may have killed (or never started) the Toil worker.
[2024-10-16T21:37:23+0800] [MainThread] [W] [toil.leader] No log file is present, despite job failing: 'sanitize_fasta_header' kind-sanitize_fasta_header/instance-g85qc1_1 v1
[2024-10-16T21:37:23+0800] [MainThread] [W] [toil.leader] The batch system left an empty file /public/home/jwli1/data1/pangene_2024/01.cactus/slurm/batch-logs/toil_cbc5caf1-add6-4995-934f-c6deb16c5ad6.5.17321288.out.log
[2024-10-16T21:37:23+0800] [MainThread] [W] [toil.leader] The batch system left a non-empty file /public/home/jwli1/data1/pangene_2024/01.cactus/slurm/batch-logs/toil_cbc5caf1-add6-4995-934f-c6deb16c5ad6.5.17321288.err.log:
[2024-10-16T21:37:23+0800] [MainThread] [W] [toil.leader] Log from job "kind-sanitize_fasta_header/instance-g85qc1_1" follows:
=========>
Traceback (most recent call last):
  File "/public/home/jwli1/software/cactus-bin-v2.9.2/cactus_env/bin/_toil_worker", line 8, in <module>
    sys.exit(main())
  File "/public/home/jwli1/software/cactus-bin-v2.9.2/cactus_env/lib/python3.8/site-packages/toil/worker.py", line 765, in main
    with in_contexts(options.context):
  File "/public/home/jwli1/micromamba/envs/cactus_new/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/public/home/jwli1/software/cactus-bin-v2.9.2/cactus_env/lib/python3.8/site-packages/toil/worker.py", line 745, in in_contexts
    with manager:
  File "/public/home/jwli1/software/cactus-bin-v2.9.2/cactus_env/lib/python3.8/site-packages/toil/batchSystems/cleanup_support.py", line 80, in __enter__
    self.arena.enter()
  File "/public/home/jwli1/software/cactus-bin-v2.9.2/cactus_env/lib/python3.8/site-packages/toil/lib/threading.py", line 534, in enter
    with global_mutex(self.base_dir, self.mutex):
  File "/public/home/jwli1/micromamba/envs/cactus_new/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/public/home/jwli1/software/cactus-bin-v2.9.2/cactus_env/lib/python3.8/site-packages/toil/lib/threading.py", line 394, in global_mutex
    fd_stats = os.fstat(fd)
FileNotFoundError: [Errno 2] No such file or directory
<=========
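
For readers unfamiliar with the code path in the tracebacks: both workers fail at os.fstat(fd) inside a file-based mutex. The sketch below is not Toil's implementation. It is a minimal illustration of the general lock-file pattern (open, flock, then verify with fstat that the file was not deleted in the meantime), and the names lock_file_mutex and demo.lock are hypothetical. On a local POSIX filesystem, fstat on an open descriptor normally succeeds even after the file is unlinked. That this error comes from the coordination directory living on a shared or network filesystem is an assumption, not something confirmed in this thread.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def lock_file_mutex(base_dir, name):
    """Hold an exclusive lock on base_dir/name; remove the file on release.

    Illustrative only: a generic lock-file mutex, not Toil's global_mutex.
    """
    path = os.path.join(base_dir, name)
    while True:
        fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o600)
        fcntl.flock(fd, fcntl.LOCK_EX)
        try:
            # The tracebacks above fail at the equivalent of this fstat call.
            fd_stats = os.fstat(fd)
            path_stats = os.stat(path)
        except FileNotFoundError:
            # The file vanished between open and stat: drop it and retry.
            fcntl.flock(fd, fcntl.LOCK_UN)
            os.close(fd)
            continue
        if fd_stats.st_nlink == 0 or fd_stats.st_ino != path_stats.st_ino:
            # We locked a file that a previous holder already deleted or
            # replaced, so the lock protects nothing. Retry on the new file.
            fcntl.flock(fd, fcntl.LOCK_UN)
            os.close(fd)
            continue
        break
    try:
        yield path
    finally:
        try:
            os.unlink(path)  # delete while still holding the lock
        except FileNotFoundError:
            pass
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


if __name__ == "__main__":
    demo_dir = tempfile.mkdtemp()
    with lock_file_mutex(demo_dir, "demo.lock") as held:
        print("holding", held)
```

A pattern like this relies on the filesystem honouring flock and keeping an unlinked file reachable through its open descriptor; filesystems that do not can surface errors at exactly the fstat step.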

@glennhickey
Collaborator

This seems like Toil not working with your cluster. Perhaps @adamnovak has a more specific idea about what's going on (thanks again, Adam!)

@jwli-code
Author

Here is my command. I'm not sure whether there is anything wrong with it:

seq=Bna_16.seq
mkdir wk tmp batch-logs
#module purge
module load compiler/intel/2017.5.239 mpi/hpcx/2.4.1/intel-2017.5.239
module load apps/abinit/8.10.3/hpcx-2.4.1-intel2017
source /public/home/jwli1/software/cactus-bin-v2.9.2/cactus_env/bin/activate
export TOIL_SLURM_ARGS="-p hebhcnormal01 -n 30 -N 1"
sbatch --job-name="Bnaslurm" -o Bnanew_cactusv2.9.0_69.log -e Bnanew_cactusv2.9.0_69.err -N 1 -n 30 -p hebhcnormal01 --wrap="\
source ~/micromamba/etc/profile.d/micromamba.sh && micromamba activate cactus_new && source /public/home/jwli1/software/cactus-bin-v2.9.2/cactus_env/bin/activate && which python && cactus-pangenome ${PWD}/Bna16new_js ${PWD}/$seq --outDir ${PWD}/Bnanew_16.pg --outName Bna_16-pg --reference BnaZS11 --vcf --giraffe --gfa --gbz --maxCores 30 --mapCores 30 --consCores 30 --indexCores 30 --mgCores 30 --permissiveContigFilter --batchSystem slurm --doubleMem true --workDir ${PWD}/wk --coordinationDir ${PWD}/tmp --batchLogsDir ${PWD}/batch-logs
"
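
The command above places --workDir and --coordinationDir under ${PWD}, which is below /public. One thing that may be worth checking is what kind of filesystem those directories sit on. The sketch below is a hypothetical diagnostic helper (fs_type_for and the NETWORK_FS set are illustrative names, and the set is not exhaustive); it parses text in the format of /proc/mounts and reports the mount point and filesystem type that contain a given path. Whether /public is a network filesystem on this cluster is not stated in the thread.

```python
import os

# Illustrative, non-exhaustive set of filesystem types usually shared over a network.
NETWORK_FS = {"nfs", "nfs4", "lustre", "gpfs", "ceph", "beegfs", "cifs"}


def fs_type_for(path, mounts_text):
    """Return (mount_point, fs_type) for the longest mount point containing path.

    mounts_text is expected in /proc/mounts format:
    "<device> <mount_point> <fs_type> <options> ...", one mount per line.
    """
    path = os.path.abspath(path)
    best = ("", "unknown")
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        # Compare with trailing slashes so "/public" does not match "/publicx".
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix) and len(mount_point) > len(best[0]):
            best = (mount_point, fs_type)
    return best


def is_network_fs(path, mounts_text):
    """True if path lives on one of the filesystem types listed in NETWORK_FS."""
    return fs_type_for(path, mounts_text)[1] in NETWORK_FS


if __name__ == "__main__":
    # On Linux, read the live mount table and check the current directory.
    with open("/proc/mounts") as handle:
        table = handle.read()
    print(fs_type_for(os.getcwd(), table), is_network_fs(os.getcwd(), table))
```

Running it on a compute node from the directory used for --coordinationDir would show whether that directory is node-local or shared.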
