bonito duplex runs out of memory #390

Closed
sathish-t opened this issue May 29, 2024 · 2 comments

Comments

@sathish-t

Hello,

I use bonito 0.7.2, and the following `bonito duplex` command sometimes runs out of memory:

```
bonito duplex --threads 30 $bam_file $pair_file > $duplex_basecalls_fq
```

While processing roughly 2.5 million read pairs (2,475,888 in the log below) in a Slurm job, I got this error:

```
> outputting unaligned fastq
> calling:   7%|##4                                   | 162658/2475888 [02:54<41:25, 930.64 pairs/s]
slurmstepd: error: Job 60123323 exceeded memory limit (144467984 > 125829120), being killed
slurmstepd: error: Exceeded job memory limit
slurmstepd: error: *** JOB 60123323 ON t1024n1 CANCELLED AT 2024-05-22T20:27:36 ***
```
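To put the numbers in that error in perspective, the small Python sketch below converts the two figures from the `exceeded memory limit (USED > LIMIT)` line to GiB. It assumes both values are in KiB, which is consistent with the limit being exactly 120 GiB (120 × 1024 × 1024 = 125829120); treat that unit as an assumption and check it against your own Slurm version. The helper name `parse_slurm_mem_error` is hypothetical.

```python
# Hypothetical helper: convert the figures in a slurmstepd
# "exceeded memory limit (USED > LIMIT)" message to GiB.
# ASSUMPTION: both values are reported in KiB (consistent with a 120G --mem request).
import re

KIB_PER_GIB = 1024 * 1024


def parse_slurm_mem_error(line: str) -> tuple[float, float]:
    """Return (used_gib, limit_gib) parsed from a slurmstepd memory error line."""
    match = re.search(r"exceeded memory limit \((\d+) > (\d+)\)", line)
    if match is None:
        raise ValueError("no memory-limit figures found in line")
    used_kib, limit_kib = (int(g) for g in match.groups())
    return used_kib / KIB_PER_GIB, limit_kib / KIB_PER_GIB


line = "Job 60123323 exceeded memory limit (144467984 > 125829120), being killed"
used, limit = parse_slurm_mem_error(line)
print(f"used {used:.1f} GiB, limit {limit:.1f} GiB")
```

Under that assumption, the job had reached about 137.8 GiB against a 120 GiB limit only 7% of the way through the run.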

However, Bonito ran without problems on a different job with a similar command, processing about 2.9 million reads:

```
Performing Bonito base-space duplex basecalling
> outputting unaligned fastq
> completed reads: 2905869
> duration: 0:06:46
> bases per second 8.7E+05
> done
```

Any help is appreciated!

Regards
Sathish

@sathish-t
Author

Hi,

I was able to work around the problem by using fewer threads (`--threads 3`). The performance hit is acceptable for us, since other steps in our pipeline are much slower. Do you have an inkling of why more threads cause problems?

Regards
Sathish

@davidnewman02
Collaborator

Duplex basecalling is memory-intensive because it requires aligning the two signals of a pair against one another and processing them together. Bonito does this across multiple threads to improve performance, and each thread can consume a large amount of memory, so total usage grows with `--threads`. Unfortunately, the only solutions are to reduce `--threads`, as you did, or to run on a machine with more available RAM.
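One rough way to choose a `--threads` value that fits inside a job's memory allocation is to divide the memory budget by an estimated per-thread footprint. The sketch below is hypothetical: Bonito does not publish a per-thread figure, so `per_thread_gib` and `base_gib` must be measured on your own data, and the linear model (baseline plus a fixed cost per thread) is an assumption that real usage may not follow.

```python
# Hypothetical sizing helper for `bonito duplex --threads`.
# ASSUMPTION: peak memory ~= base_gib + threads * per_thread_gib.
# Neither number is published by Bonito; measure them on your own data.
import math


def max_threads(mem_limit_gib: float, per_thread_gib: float,
                base_gib: float = 0.0, headroom: float = 0.8) -> int:
    """Largest thread count whose estimated peak stays under headroom * limit.

    Never returns less than 1, but a result of 1 may still not fit if the
    baseline alone exceeds the usable budget.
    """
    if per_thread_gib <= 0:
        raise ValueError("per_thread_gib must be positive")
    usable = mem_limit_gib * headroom - base_gib
    return max(1, math.floor(usable / per_thread_gib))


# Example with made-up numbers: 120 GiB job, 5 GiB per thread, 4 GiB baseline.
print(max_threads(120, per_thread_gib=5, base_gib=4))
```

The `headroom` factor leaves a safety margin because memory use can climb as a run progresses. Under the same linear assumption, the failed run above (30 threads exceeding a 120 GiB limit) would imply more than 4 GiB per thread, but that is an inference rather than a measured figure.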

If you're interested in duplex basecalling, I would encourage you to try dorado (https://github.com/nanoporetech/dorado). It offers higher performance and uses improved algorithms that should give better results. As a production tool, it also has better documentation and official support: https://dorado-docs.readthedocs.io/en/latest/basecaller/duplex/
