
Question about setting own tmp dir #23

Closed
xxYaaoo opened this issue Apr 20, 2024 · 13 comments

xxYaaoo commented Apr 20, 2024

Hi~

Recently I've been struggling with having to set my own tmp directory while running GraffiTE, because of limited access permissions on our group server. I used 'export NXF_TEMP=' in my SLURM script to set the tmp dir. However, squeue showed my job in a normal running state, but the output dir contained nothing. I also tried revising nextflow.config the way you mentioned in the 'important note', but the SLURM task showed an error the moment I sbatched the job. Any idea how I could figure this problem out?

Thank you so much!

xxYaaoo (Author) commented Apr 21, 2024

Dear Professor Cristian,

An update on my error: the output dir finally contains two folders (SV_search and Repeat_Filting), but my task still failed.
[screenshots of the error attached]
Any suggestions for solving this problem?

Thank you very much!

cgroza (Owner) commented Apr 25, 2024

Hi

May I see your nextflow.config?

cgroza (Owner) commented Apr 25, 2024

Also, I pushed a commit that may fix the error in the tsd_report step.
Can you please pull the latest version and try again?

xxYaaoo (Author) commented Apr 25, 2024

Hi~

This is my nextflow.config file. My latest run did not change the contents of the config file and hit the error I showed above.
[screenshot of nextflow.config attached]
Sure! I will pull the latest version and try again!

Thank you so much!

cgroza (Owner) commented Apr 25, 2024

I see you were on an older version of the config.

Also try this nextflow.config:

manifest.defaultBranch = 'main'
singularity.enabled = true
singularity.autoMounts = true
singularity.runOptions = '--contain --bind $(pwd):/tmp'

profiles {
    standard {
        process.executor = 'local'
        process.container = 'library://cgroza/collection/graffite:latest'
    }

    cluster {
        process.executor = 'slurm'
        process.container = 'library://cgroza/collection/graffite:latest'
        process.scratch = '$SLURM_TMPDIR'
    }

    cloud {
        process.executor = 'aws'
        process.container = 'library://cgroza/collection/graffite:latest'
    }

}
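For reference, a minimal sketch of a SLURM submission script that launches GraffiTE with the cluster profile from a config like this one; the repository reference, input file names, and resource requests below are placeholders, not values taken from this thread:

#!/bin/bash
#SBATCH --job-name=graffite
#SBATCH --time=48:00:00
#SBATCH --cpus-per-task=2
#SBATCH --mem=8G

# Assumes nextflow and singularity/apptainer are available on the node, and that
# nextflow.config sits in the launch directory (or is passed explicitly with -c).
nextflow run cgroza/GraffiTE \
    -profile cluster \
    --assemblies assemblies.csv \
    --TE_library TE_library.fa \
    --reference reference.fa \
    --out results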

xxYaaoo (Author) commented Apr 25, 2024

OK, thank you for your help!!

Do I need to add 'export NXF_TEMP=' in my SLURM script when using this new nextflow.config?

cgroza (Owner) commented Apr 25, 2024

I don't touch that variable when I run nextflow on my cluster.
However, it may be different for you. Try without it first.
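If setting it does turn out to be necessary on a given cluster, a minimal sketch of where it would go in the submission script (the path is just a placeholder for any writable directory):

# Point Nextflow's own temporary files at a writable location,
# then launch with the cluster profile as before.
export NXF_TEMP=/path/to/writable/tmp
nextflow run main.nf -profile cluster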

xxYaaoo (Author) commented Apr 25, 2024

OK, I really appreciate your help!

clemgoub (Collaborator) commented

Hi @xxYaaoo, are you still having problems with this issue? Let us know if you need further assistance!

amnghn commented Jul 11, 2024

Hi @cgroza and @clemgoub,
I've been struggling with the same tmp dir problem. I checked issues #8, #12, #31, #24 and the "important-note" but couldn't figure out how to solve it. Here is the command I'm using to run GraffiTE on our SLURM cluster.

nextflow run /lisc/scratch/botany/amin/te_detection/pME/GraffiTE/main.nf \
    --vcf /lisc/scratch/botany/amin/te_detection/pME/test_run/results/1_SV_search/svim-asm_variants.vcf \
    --reference input/vieillardii1167c.asm.bp.p_ctg.fa \
    --TE_library input/vieillardii.fasta.mod.EDTA.TElib.fa \
    --out results \
    --genotype false \
    -profile cluster \
    -with-report reports/report_${SLURM_JOB_ID}.html \
    -resume

I used --vcf instead of --assemblies as @clemgoub explained here.

And here is the nextflow.config file

manifest.defaultBranch = 'main'
singularity.enabled = true
singularity.autoMounts = true
singularity.runOptions = '--contain --bind /lisc/scratch/botany/amin/te_detection/pME/test_run/temp_dir:/tmp'

profiles {
    standard {
        process.executor = 'local'
        process.container = '/lisc/scratch/botany/amin/te_detection/pME/graffite_latest.sif'
    }

    cluster {
        process.executor = 'slurm'
        process.container = '/lisc/scratch/botany/amin/te_detection/pME/graffite_latest.sif'
        process.scratch = '$SLURM_TMPDIR'
    }

    cloud {
        process.executor = 'aws'
        process.container = '/lisc/scratch/botany/amin/te_detection/pME/graffite_latest.sif'
    }

}

temp_dir is writable and is used by the repeatmask_VCF process (I checked while the job was running; there were a lot of tmp files in it). temp_dir currently has three empty subdirectories: nxf.j8zh7vIZHc, slurm-2228294 and slurm-2297514. The last one is the one the repeatmask_VCF process used.

I ran the pipeline after changing process.scratch = '$SLURM_TMPDIR' to process.scratch = '/lisc/scratch/botany/amin/te_detection/pME/test_run/temp_dir' in the nextflow.config file, but I got the exact same error. singularity.runOptions = '--contain --bind $(pwd):/tmp' did not help either.

The pipeline stops running about half an hour after submitting the tsd_prep process and doesn't generate the 3_TSD_search directory.

These are the last lines in the .nextflow.log file

~> TaskHandler[jobId: 2297514; id: 1; name: repeatmask_VCF (1); status: RUNNING; exit: -; error: -; workDir: /lisc/scratch/botany/amin/te_detection/pME/test_run/work/50/98452e410be717b0d27a72b3705134 started: 1720609826603; exited: -; ]
Jul-10 21:17:29.919 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[jobId: 2297514; id: 1; name: repeatmask_VCF (1); status: COMPLETED; exit: 0; error: -; workDir: /lisc/scratch/botany/amin/te_detection/pME/test_run/work/50/98452e410be717b0d27a72b3705134 started: 1720609826603; exited: 2024-07-10T19:17:28Z; ]
Jul-10 21:17:29.927 [Task monitor] DEBUG nextflow.util.ThreadPoolBuilder - Creating thread pool 'TaskFinalizer' minSize=10; maxSize=10; workQueue=LinkedBlockingQueue[10000]; allowCoreThreadTimeout=false
Jul-10 21:17:30.511 [TaskFinalizer-1] DEBUG nextflow.util.ThreadPoolBuilder - Creating thread pool 'PublishDir' minSize=10; maxSize=10; workQueue=LinkedBlockingQueue[10000]; allowCoreThreadTimeout=false
Jul-10 21:17:31.302 [Task submitter] DEBUG nextflow.executor.GridTaskHandler - [SLURM] submitted process tsd_prep (1) > jobId: 2299973; workDir: /lisc/scratch/botany/amin/te_detection/pME/test_run/work/3d/2b7f19fd8b29d774c934fbaa358251
Jul-10 21:17:31.304 [Task submitter] INFO  nextflow.Session - [3d/2b7f19] Submitted process > tsd_prep (1)
Jul-10 21:18:04.894 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[jobId: 2299973; id: 2; name: tsd_prep (1); status: COMPLETED; exit: 0; error: -; workDir: /lisc/scratch/botany/amin/te_detection/pME/test_run/work/3d/2b7f19fd8b29d774c934fbaa358251 started: 1720639059895; exited: 2024-07-10T19:18:01Z; ]
Jul-10 21:18:05.002 [main] DEBUG nextflow.Session - Session await > all processes finished
Jul-10 21:18:09.889 [Task monitor] DEBUG n.processor.TaskPollingMonitor - <<< barrier arrives (monitor: slurm) - terminating tasks monitor poll loop
Jul-10 21:18:09.891 [main] DEBUG nextflow.Session - Session await > all barriers passed
Jul-10 21:18:09.908 [main] DEBUG nextflow.util.ThreadPoolManager - Thread pool 'TaskFinalizer' shutdown completed (hard=false)
Jul-10 21:18:09.925 [main] DEBUG nextflow.util.ThreadPoolManager - Thread pool 'PublishDir' shutdown completed (hard=false)
Jul-10 21:18:09.977 [main] DEBUG n.trace.WorkflowStatsObserver - Workflow completed > WorkflowStats[succeededCount=2; failedCount=0; ignoredCount=0; cachedCount=0; pendingCount=0; submittedCount=0; runningCount=0; retriesCount=0; abortedCount=0; succeedDuration=5d 9h 51m 38s; failedDuration=0ms; cachedDuration=0ms;loadCpus=0; loadMemory=0; peakRunning=1; peakCpus=16; peakMemory=20 GB; ]
Jul-10 21:18:09.979 [main] DEBUG nextflow.trace.ReportObserver - Workflow completed -- rendering execution report
Jul-10 21:18:19.223 [main] DEBUG nextflow.cache.CacheDB - Closing CacheDB done
Jul-10 21:18:19.489 [main] DEBUG nextflow.util.ThreadPoolManager - Thread pool 'FileTransfer' shutdown completed (hard=false)
Jul-10 21:18:19.510 [main] DEBUG nextflow.script.ScriptRunner - > Execution complete -- Goodbye

Here is the .command.log from the work/3d task directory:

##LiSC job info: the temporary directory of your job is also available read-only until 3 days after job end on the login nodes (login01/login02) under this path: /lisc/slurm/node-b07/tmp/slurm-2299973
##LiSC job info: Temporary folders of finished jobs are offline when their compute node went into power-saving sleep. For access to these folders, please contact the helpdesk.
INFO:    Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
INFO:    Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
INFO:    Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
INFO:    gocryptfs not found, will not be able to use gocryptfs
extracting flanking...
sort: cannot create temporary file in '/tmp/slurm-2299973': No such file or directory
index file vieillardii1167c.asm.bp.p_ctg.fa.fai not found, generating...
extracting SVs' 5' and 3' ends...
sort: cannot create temporary file in '/tmp/slurm-2299973': No such file or directory

I would be grateful if you could help me fix this issue.
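One way to reproduce this mismatch outside the pipeline, as a rough check, is to enter the container with the same --contain/--bind options and look at what /tmp and TMPDIR resolve to inside it (image and bind paths taken from the config above; run it on a compute node so TMPDIR is set the way it would be for a task):

singularity exec --contain \
    --bind /lisc/scratch/botany/amin/te_detection/pME/test_run/temp_dir:/tmp \
    /lisc/scratch/botany/amin/te_detection/pME/graffite_latest.sif \
    sh -c 'echo "TMPDIR inside container: ${TMPDIR:-unset}"; ls -ld /tmp "${TMPDIR:-/tmp}"'

If the second ls fails the same way sort does, the bound /tmp simply does not contain the per-job directory that TMPDIR points to.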

clemgoub (Collaborator) commented

Hello @amnghn! I'm really sorry you are stuck with this mktemp error.

I'm looking forward to hearing @cgroza's opinion. Do you have an empty VCF after the RepeatMasker process? Often mktemp will fail at this stage, but the pipeline keeps going until the TSD process and then crashes.

Could you send us the complete .command.log and .command.err for the RepeatMasker and TSD processes?

Meanwhile, have you tried running with the standard Nextflow profile? Since the main task of your job is RepeatMasker, this shouldn't affect the speed much.

Also, if you haven't, I'd check with your system admins whether the process.scratch variable carries over to the node the process is dispatched to. Perhaps it is interpreted on the shell/node where you run the main command, but not on the shell/node that runs the process.
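(As a quick sketch of that check, one could print the relevant variables directly on a compute node; partition and account options omitted:

# If SLURM_TMPDIR comes back unset while TMPDIR is defined, process.scratch
# needs to reference the variable the cluster actually provides.
srun --pty bash -c 'hostname; echo "SLURM_TMPDIR=${SLURM_TMPDIR:-unset} TMPDIR=${TMPDIR:-unset}"'
)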

Thanks,

Clément

amnghn commented Jul 12, 2024

Hi @clemgoub,
Thanks a lot for your reply. I finally managed to fix this issue by changing process.scratch = '$SLURM_TMPDIR' to process.scratch = '$TMPDIR'. On our cluster, the SLURM_ prefix has to be omitted. I'm very glad that I got the final GraffiTE.merged.genotypes.vcf.gz and all the individual VCF files.

The VCF file generated by RepeatMasker was not empty even when I had issues with the TSD processes.

Thanks a lot for developing this great pipeline. This was a test run (3 species, 48 samples); I'm planning to run it on 370 individuals of ca. 30 species.
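For other users who hit the same "cannot create temporary file" error, the cluster profile that ended up working here would look roughly like this (container path as used earlier in this thread; substitute your own):

cluster {
    process.executor = 'slurm'
    process.container = '/lisc/scratch/botany/amin/te_detection/pME/graffite_latest.sif'
    // use the tmp variable your cluster actually defines on its compute nodes
    process.scratch = '$TMPDIR'
}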

clemgoub (Collaborator) commented

Amazing! Thanks a lot for your kind words and for sharing your solution! I'm sure it'll help other users as well!

Cheers,

Clément
