Pipeline gets stuck with weird nextflow.log #1299

Closed
me-orlov opened this issue May 13, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@me-orlov

Description of the bug

I am running the pipeline on 27 samples. For a few hours it runs fine, and then it gets stuck and makes zero progress after that (on some runs I've waited as long as two days). When I look in the log, after a while it just repeats the same message. I'm not sure whether this is normal or how long I should wait. Resuming does not improve the situation: it very quickly gets stuck again. I'm including the log of a terminated run. I would be grateful if someone more experienced could shed light on the situation.

Command used and terminal output

nextflow run nf-core/rnaseq -profile singularity -params-file nf-params.star.json

Last line right before the repeats:

May-12 18:00:49.026 [Task submitter] INFO  nextflow.Session - [e6/5d7588] Submitted process > NFCORE_RNASEQ:RNASEQ:BAM_MARKDUPLICATES_PICARD:SAMTOOLS_INDEX (MS7)

Here is the message that ends up repeating in the log:

May-13 15:52:30.869 [Task submitter] DEBUG n.processor.TaskPollingMonitor - %% executor local > tasks in the submission queue: 44 -- tasks to be submitted are shown below
~> TaskHandler[id: 161; name: NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN (MS11); status: NEW; exit: -; error: -; workDir: /media/user/B0028A5B028A2706/Users/m_user/Documents/SRA/sra/PRJNA727413/work/f2/263dce08def8c0854859ae5235ed12]
~> TaskHandler[id: 162; name: NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN (MS3); status: NEW; exit: -; error: -; workDir: /media/user/B0028A5B028A2706/Users/m_user/Documents/SRA/sra/PRJNA727413/work/5b/ae705da837e1a4e5e57287542eef9d]
~> TaskHandler[id: 163; name: NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN (MS20); status: NEW; exit: -; error: -; workDir: /media/user/B0028A5B028A2706/Users/m_user/Documents/SRA/sra/PRJNA727413/work/a4/870d0b58d903387228380d8e4e610b]
~> TaskHandler[id: 164; name: NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN (MS13); status: NEW; exit: -; error: -; workDir: /media/user/B0028A5B028A2706/Users/m_user/Documents/SRA/sra/PRJNA727413/work/be/9516f087bfa3075ecf0d5e6411a241]
~> TaskHandler[id: 165; name: NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN (Control1); status: NEW; exit: -; error: -; workDir: /media/user/B0028A5B028A2706/Users/m_user/Documents/SRA/sra/PRJNA727413/work/39/0400f7b959b89af5ec3dcbc1f632a4]
~> TaskHandler[id: 166; name: NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN (MS2); status: NEW; exit: -; error: -; workDir: /media/user/B0028A5B028A2706/Users/m_user/Documents/SRA/sra/PRJNA727413/work/a9/10acc3f3b6ee5bf4f26c2322801d0c]
~> TaskHandler[id: 167; name: NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN (Control8); status: NEW; exit: -; error: -; workDir: /media/user/B0028A5B028A2706/Users/m_user/Documents/SRA/sra/PRJNA727413/work/7d/3a67fb632c34303fd0f1f163ee5dfe]
~> TaskHandler[id: 168; name: NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN (MS9); status: NEW; exit: -; error: -; workDir: /media/user/B0028A5B028A2706/Users/m_user/Documents/SRA/sra/PRJNA727413/work/fd/ce7541457cdf2e1ab457d20fb2dbb0]
~> TaskHandler[id: 169; name: NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN (MS5); status: NEW; exit: -; error: -; workDir: /media/user/B0028A5B028A2706/Users/m_user/Documents/SRA/sra/PRJNA727413/work/89/49dd038671bec7ffb627597350a299]
~> TaskHandler[id: 170; name: NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN (MS22); status: NEW; exit: -; error: -; workDir: /media/user/B0028A5B028A2706/Users/m_user/Documents/SRA/sra/PRJNA727413/work/83/ffe4029c3f4d9926d6c6f4a83c1918]
.. remaining tasks omitted.
May-13 15:57:30.910 [Task submitter] DEBUG n.processor.TaskPollingMonitor - %% executor local > tasks in the submission queue: 44 -- tasks to be submitted are shown below
~> TaskHandler[id: 161; name: NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN (MS11); status: NEW; exit: -; error: -; workDir: /media/user/B0028A5B028A2706/Users/m_user/Documents/SRA/sra/PRJNA727413/work/f2/263dce08def8c0854859ae5235ed12]
~> TaskHandler[id: 162; name: NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN (MS3); status: NEW; exit: -; error: -; workDir: /media/user/B0028A5B028A2706/Users/m_user/Documents/SRA/sra/PRJNA727413/work/5b/ae705da837e1a4e5e57287542eef9d]
~> TaskHandler[id: 163; name: NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN (MS20); status: NEW; exit: -; error: -; workDir: /media/user/B0028A5B028A2706/Users/m_user/Documents/SRA/sra/PRJNA727413/work/a4/870d0b58d903387228380d8e4e610b]
~> TaskHandler[id: 164; name: NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN (MS13); status: NEW; exit: -; error: -; workDir: /media/user/B0028A5B028A2706/Users/m_user/Documents/SRA/sra/PRJNA727413/work/be/9516f087bfa3075ecf0d5e6411a241]
~> TaskHandler[id: 165; name: NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN (Control1); status: NEW; exit: -; error: -; workDir: /media/user/B0028A5B028A2706/Users/m_user/Documents/SRA/sra/PRJNA727413/work/39/0400f7b959b89af5ec3dcbc1f632a4]
~> TaskHandler[id: 166; name: NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN (MS2); status: NEW; exit: -; error: -; workDir: /media/user/B0028A5B028A2706/Users/m_user/Documents/SRA/sra/PRJNA727413/work/a9/10acc3f3b6ee5bf4f26c2322801d0c]
~> TaskHandler[id: 167; name: NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN (Control8); status: NEW; exit: -; error: -; workDir: /media/user/B0028A5B028A2706/Users/m_user/Documents/SRA/sra/PRJNA727413/work/7d/3a67fb632c34303fd0f1f163ee5dfe]
~> TaskHandler[id: 168; name: NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN (MS9); status: NEW; exit: -; error: -; workDir: /media/user/B0028A5B028A2706/Users/m_user/Documents/SRA/sra/PRJNA727413/work/fd/ce7541457cdf2e1ab457d20fb2dbb0]
~> TaskHandler[id: 169; name: NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN (MS5); status: NEW; exit: -; error: -; workDir: /media/user/B0028A5B028A2706/Users/m_user/Documents/SRA/sra/PRJNA727413/work/89/49dd038671bec7ffb627597350a299]
~> TaskHandler[id: 170; name: NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN (MS22); status: NEW; exit: -; error: -; workDir: /media/user/B0028A5B028A2706/Users/m_user/Documents/SRA/sra/PRJNA727413/work/83/ffe4029c3f4d9926d6c6f4a83c1918]
.. remaining tasks omitted.

Relevant files

nextflow.log

System information

Latest version of Nextflow and the pipeline. Ubuntu. Singularity. Running on a standalone desktop.

@me-orlov me-orlov added the bug Something isn't working label May 13, 2024
@pinin4fjords
Member

Just to check, does your local machine have sufficient resource? STAR_ALIGN is labelled process_high, which means it needs 72GB RAM.
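A quick way to sanity-check this on a standalone machine is to compare the host's CPUs and physical memory with the label's request. This is a sketch under assumptions: the 72 GB figure comes from the comment above, while 12 CPUs is my assumption about nf-core's default process_high label and may differ between pipeline versions or custom configs. It also ignores any limits set through pipeline parameters or executor settings.

```python
import os

# Resource request ASSUMED for the nf-core `process_high` label. The 72 GB figure is
# quoted in this thread; 12 CPUs is an assumption and may differ between versions.
PROCESS_HIGH = {"cpus": 12, "memory_gb": 72}


def host_resources():
    """Return (cpus, physical_memory_gib) for this machine. POSIX-only (uses sysconf)."""
    total_bytes = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
    return os.cpu_count() or 1, total_bytes / 1024**3


def fits(request, cpus, memory_gb):
    """True if a single task with this request could ever be scheduled on the host."""
    return request["cpus"] <= cpus and request["memory_gb"] <= memory_gb


def max_concurrent(request, cpus, memory_gb):
    """Upper bound on how many such tasks can run at once, ignoring all other work."""
    return int(min(cpus // request["cpus"], memory_gb // request["memory_gb"]))
```

Under these assumptions, a 128 GB machine can hold only one such task at a time (128 // 72 = 1) regardless of CPU count, so a long queue of STAR_ALIGN tasks would run one after another rather than in parallel.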

@me-orlov
Author

me-orlov commented May 14, 2024

Yes, I have 128 GB of RAM! STAR_ALIGN runs fine.

I have also narrowed down the problem and can provide more context. On lines 2340 and 2344 of the log, two processes are listed as submitted, but they never show up as RUNNING. After the second of these submissions, the log just contains the same messages on a loop (as posted above). Forgive my ignorance on the subject, but why don't those processes ever show up as running?

Furthermore, on line 2316, a STAR_ALIGN (Control4) process is reported as running. But after that the pipeline stops reporting it, or anything else, as running, and it never marks it as complete either. When I look in the relevant work directory, the log file for it says that it ran successfully. Is it possible that it's not registering as complete?

If I check with top, I do not see any active pipeline-related processes when the pipeline gets stuck, only a sleeping java process that seems relevant. So it looks to me like the pipeline stops and gets stuck?
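To list tasks that were submitted but never reported as finished, one can cross-reference the "Submitted process" lines with completion records in nextflow.log. This is a minimal sketch under stated assumptions: the submission format is the one quoted earlier in this issue, but the completion format ("Task completed > TaskHandler[...]") is my assumption about Nextflow's DEBUG output and should be checked against the actual log. Tasks are keyed by the short hash (e.g. e6/5d7588), which matches the start of the task's work directory path; unfinished_tasks and short_hash are hypothetical helper names.

```python
import re

# Submission lines, as quoted in this issue:
# "... [e6/5d7588] Submitted process > NFCORE_RNASEQ:...:SAMTOOLS_INDEX (MS7)"
SUBMIT_RE = re.compile(
    r"\[(?P<hash>[0-9a-f]{2}/[0-9a-f]{6})\] Submitted process > (?P<name>.+)$"
)

# Completion lines: ASSUMED format of Nextflow's DEBUG output; verify against your log.
# "... Task completed > TaskHandler[id: 161; ...; workDir: /.../work/f2/263dce...]"
DONE_RE = re.compile(r"Task completed > TaskHandler\[.*?workDir: (?P<workdir>[^\]]+)\]")


def short_hash(workdir):
    """Turn '/.../work/f2/263dce08...' into the 'f2/263dce' form used in submission lines."""
    parts = workdir.rstrip("/").split("/")
    return f"{parts[-2]}/{parts[-1][:6]}"


def unfinished_tasks(lines):
    """Return {short_hash: task_name} for tasks submitted but never seen completing."""
    pending = {}
    for line in lines:
        line = line.rstrip("\n")
        m = SUBMIT_RE.search(line)
        if m:
            pending[m.group("hash")] = m.group("name").strip()
            continue
        m = DONE_RE.search(line)
        if m:
            # Drop the task once a matching completion record is seen.
            pending.pop(short_hash(m.group("workdir")), None)
    return pending
```

Any task left in the returned dictionary is a candidate "orphan" whose work directory is worth inspecting.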

@pinin4fjords
Member

People tend not to run these workflows on one big local machine, so I don't have much experience to draw on. But I have occasionally seen this sort of thing happen on institutional clusters, where bad nodes caused jobs to fail during scheduling, causing Nextflow to lose track of them. I suspect something analogous may be happening here.
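Why a single lost task could freeze the whole run on one machine can be illustrated with a toy model. This is explicitly a simplification and not Nextflow's code: it assumes the local executor only submits a task when enough free CPUs and memory remain, and only gets those resources back when it observes the task finishing. If a running task (such as the STAR_ALIGN (Control4) task described above) is never observed to finish, its share is never returned and everything behind it waits indefinitely, which would look like the repeated "tasks in the submission queue" messages. Task and ToyLocalExecutor are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    cpus: int
    memory_gb: int


@dataclass
class ToyLocalExecutor:
    """Toy model (an assumption, not Nextflow's implementation) of a resource-gated
    local executor: free CPUs/memory shrink on submit and grow back on completion."""
    cpus: int
    memory_gb: int
    queue: list = field(default_factory=list)
    running: list = field(default_factory=list)

    def submit_ready(self):
        """Start every queued task that fits in the currently free resources."""
        still_waiting = []
        for task in self.queue:
            if task.cpus <= self.cpus and task.memory_gb <= self.memory_gb:
                self.cpus -= task.cpus
                self.memory_gb -= task.memory_gb
                self.running.append(task)
            else:
                still_waiting.append(task)
        self.queue = still_waiting

    def observe_finished(self, task):
        """Release a task's resources; never called for a task the executor lost track of."""
        self.running.remove(task)
        self.cpus += task.cpus
        self.memory_gb += task.memory_gb
```

With 16 CPUs and 128 GB and three 12-CPU/72 GB alignment tasks queued, one task starts; calling submit_ready again and again leaves the other two queued forever unless observe_finished is called for the running task.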

Is there anything in the log files for those orphan processes that might tell you what went wrong (e.g. workdir/7f/907e33...)?
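A small helper can summarise what the task wrapper left behind in one of these orphan work directories. This is a sketch that assumes the usual files a Nextflow task wrapper writes (.command.begin when the task starts, .exitcode when it ends, .command.err for stderr); inspect_workdir is a hypothetical name. A directory with .command.begin but no .exitcode suggests the wrapper never recorded an exit status, whereas an .exitcode of 0 would suggest the task itself finished and the problem lies in how its completion was picked up.

```python
from pathlib import Path


def inspect_workdir(workdir):
    """Summarise the bookkeeping files a Nextflow task wrapper leaves in its work directory."""
    wd = Path(workdir)
    exitcode_file = wd / ".exitcode"
    report = {
        # Present once the wrapper has started the task (assumed convention).
        "started": (wd / ".command.begin").exists(),
        # Exit status written by the wrapper when the task ends; None if never written.
        "exitcode": exitcode_file.read_text().strip() if exitcode_file.exists() else None,
        "stderr_tail": [],
    }
    err = wd / ".command.err"
    if err.exists():
        # Keep only the last few stderr lines, tolerating non-UTF-8 output.
        report["stderr_tail"] = err.read_text(errors="replace").splitlines()[-10:]
    return report
```

Pass it the full work directory path of an orphan task as printed in nextflow.log.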

Failing that, I would direct you to the more general Nextflow help channels on Slack, since I don't think this is something specific to this workflow.

@pinin4fjords
Member

Closing ticket since this has been quiet for a couple of weeks. Please reopen if you feel there is an issue specific to this pipeline, or ask in #rnaseq on the nf-core Slack.

@Mateopazcabezas

I have the same issue (also running on a single Ubuntu computer): after 2 days the pipeline gets stuck without prompting an error, but it does not perform any further steps. (I'm trying with HISAT2 now in case a lack of memory (256 GB) was the source of the error, but that doesn't seem to be the case.)

I hope the author of the issue can follow up and clarify how they fixed it.

In my case, this is the message that repeats in the log:

Aug-25 09:51:49.876 [Task submitter] DEBUG n.processor.TaskPollingMonitor - %% executor local > tasks in the submission queue: 213 -- tasks to be submitted are shown below
~> TaskHandler[id: 817; name: NFCORE_RNASEQ:RNASEQ:FASTQ_ALIGN_HISAT2:BAM_SORT_STATS_SAMTOOLS:SAMTOOLS_SORT (HUMT8_POSTm); status: NEW; exit: -; error: -; workDir: /media/bioinfo/DATAII/Conchi_Nunez/X204SC23063344-Z01-F001/work/cf/c9c26f02e8fe83465323811d66f60e]
~> TaskHandler[id: 818; name: NFCORE_RNASEQ:RNASEQ:BEDGRAPH_BEDCLIP_BEDGRAPHTOBIGWIG_REVERSE:UCSC_BEDCLIP (A193_POSTm); status: NEW; exit: -; error: -; workDir: /media/bioinfo/DATAII/Conchi_Nunez/X204SC23063344-Z01-F001/work/11/321462635ee7caae90a20e02fde6ee]
~> TaskHandler[id: 819; name: NFCORE_RNASEQ:RNASEQ:BEDGRAPH_BEDCLIP_BEDGRAPHTOBIGWIG_FORWARD:UCSC_BEDCLIP (A193_POSTm); status: NEW; exit: -; error: -; workDir: /media/bioinfo/DATAII/Conchi_Nunez/X204SC23063344-Z01-F001/work/f3/364c6956791cd99e923265d18c3383]
~> TaskHandler[id: 820; name: NFCORE_RNASEQ:RNASEQ:FASTQ_ALIGN_HISAT2:BAM_SORT_STATS_SAMTOOLS:SAMTOOLS_SORT (HUMT16_POSTm); status: NEW; exit: -; error: -; workDir: /media/bioinfo/DATAII/Conchi_Nunez/X204SC23063344-Z01-F001/work/f8/af91c6ba37038a25804b1fafd65922]
~> TaskHandler[id: 821; name: NFCORE_RNASEQ:RNASEQ:FASTQ_ALIGN_HISAT2:BAM_SORT_STATS_SAMTOOLS:SAMTOOLS_SORT (HUMTm_POSTm); status: NEW; exit: -; error: -; workDir: /media/bioinfo/DATAII/Conchi_Nunez/X204SC23063344-Z01-F001/work/23/1be151f93486336368f3c50b80d3a6]
~> TaskHandler[id: 822; name: NFCORE_RNASEQ:RNASEQ:FASTQ_ALIGN_HISAT2:BAM_SORT_STATS_SAMTOOLS:SAMTOOLS_SORT (HUMT2_PRE); status: NEW; exit: -; error: -; workDir: /media/bioinfo/DATAII/Conchi_Nunez/X204SC23063344-Z01-F001/work/0c/fbe5c5197067183fa09e50c5abcdfd]
~> TaskHandler[id: 823; name: NFCORE_RNASEQ:RNASEQ:FASTQ_ALIGN_HISAT2:BAM_SORT_STATS_SAMTOOLS:SAMTOOLS_SORT (HFJD1_POSTm); status: NEW; exit: -; error: -; workDir: /media/bioinfo/DATAII/Conchi_Nunez/X204SC23063344-Z01-F001/work/f7/4aa2c8939c3c5e97ec311fef57b26e]
~> TaskHandler[id: 824; name: NFCORE_RNASEQ:RNASEQ:FASTQ_ALIGN_HISAT2:BAM_SORT_STATS_SAMTOOLS:SAMTOOLS_SORT (HURC2_POST); status: NEW; exit: -; error: -; workDir: /media/bioinfo/DATAII/Conchi_Nunez/X204SC23063344-Z01-F001/work/66/9b039c3c6bf7d5d760fd3684dbd4c0]
~> TaskHandler[id: 825; name: NFCORE_RNASEQ:RNASEQ:FASTQ_ALIGN_HISAT2:BAM_SORT_STATS_SAMTOOLS:SAMTOOLS_SORT (HFJD6_PREm); status: NEW; exit: -; error: -; workDir: /media/bioinfo/DATAII/Conchi_Nunez/X204SC23063344-Z01-F001/work/91/912af20e0cdf48bc9b05b5835524f6]
~> TaskHandler[id: 826; name: NFCORE_RNASEQ:RNASEQ:FASTQ_ALIGN_HISAT2:BAM_SORT_STATS_SAMTOOLS:SAMTOOLS_SORT (HUMT9_PREm); status: NEW; exit: -; error: -; workDir: /media/bioinfo/DATAII/Conchi_Nunez/X204SC23063344-Z01-F001/work/97/cf108fb6cd65318a1cd0bff5b7d3c7]
.. remaining tasks omitted.
