Alignment performance decreases with higher -p option value #478

Open
LawrenceLiu023 opened this issue May 23, 2024 · 6 comments

@LawrenceLiu023

I am using Bismark to align methylation NGS data, with bowtie2 as the alignment engine. Bismark has two options related to multi-core processing: -parallel determines how many bowtie2 instances are launched simultaneously, and -p is the same as the -p option of Bowtie2.

I randomly sampled 100,000 and 1,000,000 pairs of reads from my fastq.gz file, and tried different settings of -parallel and -p options. The results seem to suggest that when -p is higher than 3, the higher -p is, the more time will be spent on alignment. I wonder if there is an optimal -p option setting, and whether this is a normal phenomenon. The test results are as follows:

[two images: test results for different -parallel and -p settings]

CPU: Intel(R) Xeon(R) Silver 4214R CPU @ 2.40GHz, 24 cores
Memory: 32GB
System: Linux 91d674d7009b 6.5.0-28-generic #29~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Apr 4 14:39:20 UTC 2 x86_64 GNU/Linux
Bowtie2 version: bowtie2-align-s version 2.4.4 64-bit
Bismark version: v0.24.2

@sfiligoi
Contributor

@LawrenceLiu023 That's not my experience.
Can you check that your process indeed has access to all the cores?
E.g. by using

taskset -pc <PID>
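
As a rough sketch of that check, the snippet below compares the CPUs the current shell may run on with the CPUs installed in the machine, then prints the affinity list for a given PID. The function name `check_affinity` is made up for illustration; `nproc` comes from GNU coreutils and `taskset` from util-linux. Note that this only inspects the affinity mask, so other limits (for example a container CPU quota) would not show up here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report how many CPUs a process is allowed to use.
# Usage: check_affinity [PID]   (defaults to the current shell)
check_affinity() {
    local pid="${1:-$$}"
    # nproc honours the affinity mask of the calling process;
    # nproc --all reports every installed processor.
    echo "usable by this shell: $(nproc)"
    echo "installed:            $(nproc --all)"
    # taskset -pc prints the CPU list the given PID is pinned to.
    if command -v taskset >/dev/null 2>&1; then
        taskset -pc "$pid" || echo "taskset could not query PID $pid"
    fi
}

check_affinity
```

If the first number is smaller than the second, the shell (and anything it launches) is restricted to a subset of the cores.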

@LawrenceLiu023
Author

Thank you for your response. I have carefully observed the CPU usage during my testing and found that when Bismark calls the bowtie2 genome alignment engine for methylation sequencing data alignment, it uses 2 bowtie2 instances simultaneously. When I set the -p parameter of bowtie2 to a number n, I can observe that the number of bowtie2 processes is indeed 2n, and the CPU utilization is also 2n*100%.

However, in my tests on both a 24-core server and a 32-core server, when the -p parameter is set to n>4, the speed does not increase significantly, and even starts to decrease.

I would like to ask, what -p parameter do you typically use for bowtie2? Have you also observed a clear performance improvement as you increase the -p parameter?

I also found a post that mentioned a similar performance issue, where the -p parameter seems to not bring much performance improvement after a certain value. The link is: https://www.biostars.org/p/92366/.

@sfiligoi
Contributor

sfiligoi commented Jun 25, 2024

I saw good scalability to -p 16.
Note your CPU has only 12 CPU cores (x2 HT), so that could explain why the time grows from that point on.
Bowtie2 is also (mostly) memory bound, so scalability is known to be limited more by memory bandwidth than compute core TOPS.
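
One way to see the difference between physical cores and hyper-threads on a Linux box is sketched below. It assumes `lscpu` (util-linux) is available and reports core IDs; the function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Count physical cores: each unique (socket, core) pair is one core,
# so hyper-threads that share a core collapse into a single line.
physical_cores() {
    lscpu -p=SOCKET,CORE | grep -v '^#' | sort -u | wc -l
}

# Count logical CPUs (hardware threads) installed in the machine.
logical_cpus() {
    nproc --all
}

if command -v lscpu >/dev/null 2>&1; then
    echo "physical cores: $(physical_cores)"
    echo "logical CPUs:   $(logical_cpus)"
fi
```

When the second number is twice the first, hyper-threading is on, and thread counts above the first number share cores.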

That said, I see you are using an ancient version of bowtie2 (2.4.4).
There have been significant memory access improvements in 2.5.0, which should help in your case.
I would recommend you try the latest version.

PS: I will try to run a few benchmarks on my system and post the detailed results.

@LawrenceLiu023
Author

Thank you for the additional information. I suspect that hyper-threading could indeed be a factor contributing to the performance plateau.

I installed bowtie2 using apt-get install bowtie2, so the version I have installed is the outdated 2.4.4 release. I will install the latest version and run some more benchmarks.

@sfiligoi
Contributor

Here are a few data points for my 5M reads run using WoLr1 as the reference database:

NTHREADS Runtime
 2      13:25 mins
 4       6:47 mins
 8       3:29 mins
12       2:23 mins
16       2:08 mins

My CPU is
AMD EPYC 7302 16-Core Processor

The scaling does slow down close to the max, but it is almost linear up to -p 12.
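
To put numbers on "almost linear", here is a small sketch that turns the runtimes from the table above into speedup and parallel efficiency, taking the 2-thread run as the baseline. The function name `scaling_table` is made up for illustration; the input values are copied from the table.

```shell
#!/usr/bin/env bash
# Read "threads mm:ss" pairs and print speedup and efficiency
# relative to the first row (here the 2-thread run).
scaling_table() {
    awk '{
        split($2, t, ":")            # mm:ss -> seconds
        secs = t[1] * 60 + t[2]
        if (NR == 1) { base_n = $1; base_t = secs }
        speedup = base_t / secs      # how much faster than the baseline
        ideal   = $1 / base_n        # speedup under perfect scaling
        printf "%d threads: %.2fx speedup, %.0f%% efficiency\n",
               $1, speedup, 100 * speedup / ideal
    }'
}

printf '%s\n' "2 13:25" "4 6:47" "8 3:29" "12 2:23" "16 2:08" | scaling_table
```

With these inputs the efficiency stays above 90% through 12 threads and drops to about 79% at 16.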

For completeness, the bowtie2 version is 2.5 and the command used is

$ /bin/time taskset -c 0-15 ./bowtie2 --no-exact-upfront --no-1mm-upfront -p${NTHREADS} -x /scratch/qp-woltka/WoLr1/WoLr1 -q ${INFILE} -S ${OUTFILE} --seed 42 --very-sensitive -k 16 --np 1 --mp "1,1" --rdg "0,1" --rfg "0,1" --score-min  "L,0,-0.05" --no-head --no-unal

4949790 reads; of these:
  4949790 (100.00%) were unpaired; of these:
    1493707 (30.18%) aligned 0 times
    1852338 (37.42%) aligned exactly 1 time
    1603745 (32.40%) aligned >1 times
69.82% overall alignment rate

@LawrenceLiu023
Author

I ran the same test with Bowtie2 v2.5.5. The results show a significant increase in speed and a decrease in memory usage, as shown in the following plot. Considering that Bismark starts two Bowtie2 instances simultaneously when aligning methylation sequencing data, it is understandable that the speed stops increasing when -p is over 6. I believe the best approach is to run more tests on normal sequencing data with Bowtie2 alone rather than on methylation sequencing data. After completing more tests, I will provide an update here.
[plot: test results with Bowtie2 v2.5.5]
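
The arithmetic behind the -p 6 plateau can be sketched as follows. It assumes the 12 physical cores mentioned earlier in this thread, two Bowtie2 instances running at the same time, and one thread per physical core; the function names `total_threads` and `max_p` are made up for illustration.

```shell
#!/usr/bin/env bash
# Threads in flight when <instances> Bowtie2 processes each run with -p <p>.
total_threads() {
    local p="$1" instances="${2:-2}"
    echo $(( p * instances ))
}

# Largest -p that keeps the total thread count within <cores> physical cores.
max_p() {
    local cores="$1" instances="${2:-2}"
    echo $(( cores / instances ))
}

echo "threads at -p 6 with 2 instances: $(total_threads 6 2)"
echo "largest -p on 12 physical cores:  $(max_p 12 2)"
```

Under these assumptions, -p 6 already puts 12 threads on 12 physical cores, so any higher value has to share cores through hyper-threading.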
