Everything looks ok, but spades stuck #1444

Open
1 task done
xzhbio opened this issue Feb 7, 2025 · 10 comments

Comments

xzhbio commented Feb 7, 2025

Description of bug

Everything appears to be running normally—logs show no errors or warnings, and system outputs look as expected. However, the program has been stuck for over 30 hours without making any progress.

I allocated 16 threads, but at the point where the process stalled, the CPU usage was only about 30% on a single thread, and it remained stuck indefinitely. Additionally, when I allocate too many threads, the program terminates due to insufficient memory allocation (OS return-value: 12). I have previously run tests on SPAdes, and it executed normally.

I would appreciate any insights or suggestions on what might be causing the stall. Thank you!

spades.log

params.txt

SPAdes version

SPAdes version: 4.0.0

Operating System

OS: Linux-3.10.0-1127.el7.x86_64-x86_64-with-glibc2.17

Python Version

Python version: 3.13.1

Method of SPAdes installation

conda

No errors reported in spades.log

  • Yes

Jose-LSP commented Feb 7, 2025

I have more or less the same problem, but it gets stuck in read error correction for up to a week, although the log shows only about 1 hour of run time.

Member

asl commented Feb 7, 2025

The problem is likely I/O on your server if it is stuck at this point. Try moving the temporary directory from network shared storage to local / scratch storage.

Author

xzhbio commented Feb 8, 2025

> The problem is likely I/O on your server if it is stuck at this point. Try moving the temporary directory from network shared storage to local / scratch storage.

Thanks for the suggestion. My temporary directory is already located on local storage. Additionally, I noticed that when I downsample the data to 1GB (from the original 40GB), the assembly completes without any issues. However, when I downsample to 4GB, the k-mer counting step gets stuck and the process halts.

Author

xzhbio commented Feb 8, 2025

> I have more or less the same problem, but it gets stuck in read error correction for up to a week, although the log shows only about 1 hour of run time.

Hi, I noticed that even though the process has been stuck for a week, you haven't terminated it. Does this indicate that it might still be running, albeit very slowly? I'm curious if you've managed to resolve this issue or if you have any further insights to share.

Member

asl commented Feb 8, 2025

> Thanks for the suggestion. My temporary directory is already located on local storage.

It doesn't seem so. Your spades.log reads:

Other parameters:
  Dir for temp files: /public/home/xzh/south/11-JRT-2/assembly_0.1/tmp

And indeed, there is no --tmp-dir option used.
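
One way to check where a temporary directory really lives is to look up its mount point and filesystem type. Below is a minimal Python sketch, not part of SPAdes. It assumes a Linux host where `/proc/mounts` is available; the helper name `fs_type_for`, the `NETWORK_FS` set, and the sample mount table are all illustrative assumptions, and a vendor-specific distributed filesystem may report a type name that is not in the set.

```python
import os

# Assumption: type names that commonly indicate network or distributed
# storage. Illustrative only, not exhaustive.
NETWORK_FS = {"nfs", "nfs4", "cifs", "smb3", "lustre", "gpfs", "beegfs", "ceph"}


def fs_type_for(path, mounts_text):
    """Return (mount_point, fs_type) for the deepest mount containing path.

    mounts_text has the /proc/mounts layout: one mount per line with
    whitespace-separated fields "device mount_point fs_type options ...".
    """
    path = os.path.abspath(path)
    best = ("", "")
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # ">=" lets a later entry for the same mount point win.
            if len(mount_point) >= len(best[0]):
                best = (mount_point, fs_type)
    return best


# Hypothetical mount table, for illustration only.
sample_mounts = (
    "/dev/sda1 / xfs rw 0 0\n"
    "server:/export /public nfs4 rw 0 0\n"
)

mount_point, fs_type = fs_type_for("/public/home/user/tmp", sample_mounts)
kind = "network/distributed" if fs_type in NETWORK_FS else "possibly local"
print(f"mounted at {mount_point} ({fs_type}): {kind}")
```

On a real host, the second argument would be the contents of `/proc/mounts`, and the first the directory reported under "Dir for temp files" in spades.log.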

Author

xzhbio commented Feb 14, 2025

> And indeed, there is no --tmp-dir option used.

The default tmp-dir is already set to the output directory. I have attempted to use the --tmp-dir option to specify a different temporary directory, but it still doesn't work as expected.

Additionally, the process stopped with the message "finished abnormally, OS return value: 12," despite having at least 1600GB of free memory available.

spades.log

params.txt

Member

asl commented Feb 14, 2025

> The default tmp-dir is already set to the output directory.

Right. And if it is on some kind of NFS shared storage, that could easily cause problems, as these systems were not designed to handle heavy I/O.

> Additionally, the process stopped with the message "finished abnormally, OS return value: 12," despite having at least 1600GB of free memory available.

It doesn't seem so:

  Memory limit (in Gb): 250

So, the hard memory limit was set to 250 Gb (default) and you have not overridden it. As a result, when more RAM was required, you received an out-of-memory error, per the log:

  3:20:16.175    82G / 96G   ERROR   General                 (mmapped_reader.hpp        :  52)   mmap(2) failed. Reason: Cannot allocate memory. Error code: 12
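
The `Error code: 12` in the mmap log line above can be looked up with Python's standard library. This is only a quick sketch for decoding the number; the symbolic name and message text come from the platform, so the result is what a typical Linux system reports rather than anything specific to SPAdes.

```python
import errno
import os

code = 12  # value reported in the mmap(2) log line

# errno.errorcode maps numeric codes to symbolic names on this platform.
name = errno.errorcode.get(code, "unknown")

# os.strerror returns the platform's human-readable message for the code.
print(f"errno {code}: {name} ({os.strerror(code)})")
```

On Linux the symbolic name for 12 is `ENOMEM`, which matches the "Cannot allocate memory" reason given in the log.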

Author

xzhbio commented Feb 14, 2025

By the way, I was wondering if downsampling the data would improve the assembly results or make them worse.

Author

xzhbio commented Feb 14, 2025

> Right. And if it is on some kind of NFS shared storage, that could easily cause problems, as these systems were not designed to handle heavy I/O.

The file system in use is ParaStor, a distributed file system. Is that OK?

> So, the hard memory limit was set to 250 Gb (default) and you have not overridden it. As a result, when more RAM was required, you received an out-of-memory error, per the log:

I reset the memory limit but still got the same issue.

spades.log

params.txt

Member

asl commented Feb 14, 2025

> The file system in use is ParaStor, a distributed file system. Is that OK?

You'd better ask your system administrator. We cannot know the specifics of every NAS solution and its issues.

> I reset the memory limit but still got the same issue.

You didn't:

  Threads: 1600
  Memory limit (in Gb): 250

Next time, please double-check the options and the log before submitting an issue. Please refer to the SPAdes manual for information about command-line options: https://ablab.github.io/spades/running.html#advanced-options
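
Checking the log before reporting can be partly automated. The sketch below is a hedged illustration, not a SPAdes tool: it assumes the log header uses the "Threads", "Memory limit (in Gb)" and "Dir for temp files" labels exactly as quoted in this thread, and the "intended" values passed to `mismatches` are hypothetical numbers chosen for the example.

```python
import re

# Assumption: labels copied from the spades.log lines quoted in this thread.
SETTING_PATTERN = re.compile(
    r"^\s*(Threads|Memory limit \(in Gb\)|Dir for temp files):\s*(.+?)\s*$"
)


def effective_settings(log_text):
    """Collect the settings reported in the log header as {label: value}."""
    settings = {}
    for line in log_text.splitlines():
        match = SETTING_PATTERN.match(line)
        if match:
            # Keep the first occurrence of each label.
            settings.setdefault(match.group(1), match.group(2))
    return settings


def mismatches(settings, expected):
    """Return {label: (expected, found)} for every setting that differs."""
    return {
        label: (want, settings.get(label))
        for label, want in expected.items()
        if settings.get(label) != want
    }


# The two lines quoted from the log in the comment above.
sample = """\
  Threads: 1600
  Memory limit (in Gb): 250
"""

found = effective_settings(sample)
# Hypothetical intent: 16 threads and a 1500 Gb memory limit.
print(mismatches(found, {"Threads": "16", "Memory limit (in Gb)": "1500"}))
```

A non-empty result means the run did not use the intended settings. In the SPAdes manual, threads are set with `-t` / `--threads` and the memory limit with `-m` / `--memory`, so a swapped value shows up here as a mismatch on both labels.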

3 participants