Set BWA and bwamem2 index memory dynamically #6628

Open · wants to merge 3 commits into master
Conversation

@edmundmiller (Contributor, Author) commented Sep 11, 2024

Kept having bwamem2 index tasks that ran forever and then failed.
Updated bwamem2 index to request 28 bytes (28.B) of memory per byte of fasta. Issue for reference: bwa-mem2/bwa-mem2#9

Also tracked down the required memory for bwa index while I was at it. It doesn't seem to fail in practice, because most genomes' requirements fall under the memory the process is already allocated.

Not the first place people have run into this: #6628
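
A minimal sketch of what the dynamic directive could look like at the module level (hypothetical and simplified; not the exact diff from this PR):

process BWAMEM2_INDEX {
    // Hypothetical sketch: request ~28 bytes of RAM per byte of the
    // reference fasta, per bwa-mem2/bwa-mem2#9.
    memory { 28.B * fasta.size() }

    input:
    tuple val(meta), path(fasta)

    output:
    tuple val(meta), path("bwamem2"), emit: index

    script:
    """
    mkdir bwamem2
    bwa-mem2 index -p bwamem2/${fasta.baseName} $fasta
    """
}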

@matthdsm (Contributor) commented
I like it 👍 it’ll also play nice with the new resourceLimits directive

@ewels (Member) commented Sep 11, 2024

I was talking to @drpatelh about this earlier this week. Sounds good. Very neat if it scales in such a linear way.

Should we add a baseline of additional memory?
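
For illustration, a baseline could be a constant term in the same directive (a sketch with hypothetical values, not taken from the PR):

// Hypothetical: a fixed 1.GB baseline plus the linear per-byte term.
memory { 1.GB + 28.B * fasta.size() }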

@edmundmiller (Contributor, Author) commented
Closes nf-core/sarek#1377

@maxulysse (Member) commented
What can we do for bwa mem and bwamem2 mem?

@edmundmiller (Contributor, Author) commented

> What can we do for bwa mem and bwamem2 mem?

What do you mean?

@muffato (Member) commented Oct 16, 2024

Is it right to have these settings hardcoded in the module? How does it interact with a pipeline-level config file doing

withName: 'BWAMEM2_INDEX' {
    memory = { ... }
}

Which one takes precedence?

@matthdsm (Contributor) commented
AFAIK the pipeline config takes precedence over the directives hardcoded in the module.
If you're worried about requesting too many resources, the resourceLimits directive should take care of that nicely.
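
A sketch of that capping behaviour in a pipeline or institutional config (placeholder limits; resourceLimits is available since Nextflow 24.04):

process {
    // Requests above these limits are automatically reduced to fit,
    // rather than failing at submission time.
    resourceLimits = [ cpus: 16, memory: 128.GB, time: 48.h ]
}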

@muffato (Member) commented Oct 16, 2024

I was more worried that 28 GB/Gbp is still too high. I use 24 GB/Gbp in my pipelines and wouldn't want nf-core to force me to waste RAM ;)
Also, your memory definition doesn't consider task.attempt. Are you absolutely certain that 28 GB/Gbp will work for every genome? nf-core resource definitions usually factor in task.attempt.

I wasn't worried about the missing check_max, since nf-core is about to mandate a recent Nextflow version that supports resourceLimits.
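
A sketch of folding task.attempt into the scaling (hypothetical factor and retry settings):

// Hypothetical: grow the request proportionally on each retry after an
// out-of-memory failure.
memory { 24.B * fasta.size() * task.attempt }
errorStrategy 'retry'
maxRetries 2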

@muffato (Member) commented Oct 16, 2024

FYI, I've just checked our LSF logs and there have been zero memory failures over the 1,698 BWAMEM2_INDEX processes that we ran in 2024 with 24 GB/Gbp.
The median memory efficiency is ~76% and goes up to 95%, meaning that 23 GB/Gbp might still work for all genomes (it's right at the limit), but 22 GB/Gbp would certainly yield some memory errors.

Regardless of the scaling factor you use, I'd still keep task.attempt just in case (I'm overcautious!).
