Alignment performance decreases with higher `-p` option value #478

LawrenceLiu023 · 2024-05-23T01:52:23Z

I am using Bismark to align methylation NGS data, with bowtie2 as the alignment engine. Bismark has two options related to multi-core processing: --parallel determines how many bowtie2 instances are launched simultaneously, and -p is the same as the -p option of Bowtie2.

I randomly sampled 100,000 and 1,000,000 pairs of reads from my fastq.gz file and tried different settings of the --parallel and -p options. The results suggest that once -p is higher than 3, alignment takes longer as -p increases. Is this expected behaviour, and is there an optimal -p setting? The test results are as follows:

CPU: Intel(R) Xeon(R) Silver 4214R CPU @ 2.40GHz, 24 cores
Memory: 32GB
System: Linux 91d674d7009b 6.5.0-28-generic #29~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Apr 4 14:39:20 UTC 2 x86_64 GNU/Linux
Bowtie2 version: bowtie2-align-s version 2.4.4 64-bit
Bismark version: v0.24.2


sfiligoi · 2024-06-24T17:20:19Z

@LawrenceLiu023 That's not my experience.
Can you check that your process indeed has access to all the cores?
E.g. by using

taskset -pc <PID>
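
The same kind of check can be made from inside a running Python process. This is a minimal sketch, assuming Linux; the helper name `cpu_visibility` is made up for illustration and is not part of Bismark or bowtie2.

```python
import os


def cpu_visibility():
    """Return (usable, total) logical CPU counts for the current process."""
    total = os.cpu_count() or 1
    # os.sched_getaffinity is only available on some platforms (e.g. Linux);
    # fall back to the total count elsewhere.
    if hasattr(os, "sched_getaffinity"):
        usable = len(os.sched_getaffinity(0))
    else:
        usable = total
    return usable, total


usable, total = cpu_visibility()
print(f"this process may run on {usable} of {total} logical CPUs")
```

If the first number is smaller than expected (for example inside a container or under a batch scheduler), threads beyond that count will compete for the same cores.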

LawrenceLiu023 · 2024-06-25T02:06:09Z

Thank you for your response. I have carefully observed the CPU usage during my testing and found that when Bismark calls the bowtie2 genome alignment engine for methylation sequencing data alignment, it uses 2 bowtie2 instances simultaneously. When I set the -p parameter of bowtie2 to a number n, I can observe that the number of bowtie2 processes is indeed 2n, and the CPU utilization is also 2n*100%.
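
The thread arithmetic described above can be sketched as follows. The function name is hypothetical, and multiplying by the --parallel value is an assumption about how the two options combine, not something measured in this thread.

```python
def total_bowtie2_threads(p, instances=2, parallel=1):
    """Estimate how many bowtie2 worker threads run at once.

    p         -- value passed as -p to each bowtie2 instance
    instances -- bowtie2 instances per Bismark job (2 were observed here)
    parallel  -- assumed multiplier for Bismark's --parallel option
    """
    return p * instances * parallel


for p in (2, 4, 8, 12):
    print(f"-p {p}: about {total_bowtie2_threads(p)} busy threads")
```

With two instances, -p 12 already asks for about 24 threads, which is every logical CPU on the 24-thread test machine.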

However, in my tests on both a 24-core server and a 32-core server, when the -p parameter is set to n>4, the speed does not increase significantly, and even starts to decrease.

I would like to ask, what -p parameter do you typically use for bowtie2? Have you also observed a clear performance improvement as you increase the -p parameter?

I also found a post that mentioned a similar performance issue, where the -p parameter seems to not bring much performance improvement after a certain value. The link is: https://www.biostars.org/p/92366/.

sfiligoi · 2024-06-25T02:34:12Z

I saw good scalability to -p 16.
Note your CPU has only 12 CPU cores (x2 HT), so that could explain why the time grows from that point on.
Bowtie2 is also (mostly) memory bound, so scalability is known to be limited more by memory bandwidth than compute core TOPS.

That said, I see you are using an ancient version of bowtie2 (2.4.4).
There have been significant memory access improvements in 2.5.0, which should help in your case.
I would recommend you try the latest version.

PS: I will try to run a few benchmarks on my system and post the detailed results.

LawrenceLiu023 · 2024-06-25T05:32:17Z

Thank you for the additional information. I suspect that hyper-threading could indeed be a factor contributing to the performance plateau.

I installed bowtie2 using apt-get install bowtie2, so the version I have installed is the outdated 2.4.4 release. I will install the latest version and run some more benchmarks.

sfiligoi · 2024-06-25T17:59:02Z

Here are a few data points for my 5M reads run using WoLr1 as the reference database:

NTHREADS Runtime
 2      13:25 mins
 4       6:47 mins
 8       3:29 mins
12       2:23 mins
16       2:08 mins

My CPU is
AMD EPYC 7302 16-Core Processor

The scaling does slow down close to the max, but it is almost linear up to -p 12.
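
To put numbers on "almost linear", speedup and parallel efficiency can be computed from the runtimes in the table above, taking the 2-thread run as the baseline. This is only a sketch; the helper names are illustrative.

```python
# Runtimes (mm:ss) copied from the table above, keyed by thread count.
runs = {2: "13:25", 4: "6:47", 8: "3:29", 12: "2:23", 16: "2:08"}


def to_seconds(mmss):
    """Convert an 'mm:ss' string to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def scaling(runs):
    """Return (threads, seconds, speedup, efficiency) rows.

    Speedup is relative to the run with the fewest threads; efficiency is
    the speedup divided by the ideal speedup for that thread count.
    """
    base_threads = min(runs)
    base_time = to_seconds(runs[base_threads])
    rows = []
    for threads in sorted(runs):
        elapsed = to_seconds(runs[threads])
        speedup = base_time / elapsed
        efficiency = speedup / (threads / base_threads)
        rows.append((threads, elapsed, speedup, efficiency))
    return rows


for threads, elapsed, speedup, efficiency in scaling(runs):
    print(f"-p {threads:2d}: {elapsed:4d} s  speedup {speedup:.2f}  efficiency {efficiency:.2f}")
```

Relative to the 2-thread run, efficiency stays above 0.9 through -p 12 and falls to about 0.79 at -p 16.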

For completeness, this was run with bowtie2 v2.5, and the command used was:

$ /bin/time taskset -c 0-15 ./bowtie2 --no-exact-upfront --no-1mm-upfront -p${NTHREADS} -x /scratch/qp-woltka/WoLr1/WoLr1 -q ${INFILE} -S ${OUTFILE} --seed 42 --very-sensitive -k 16 --np 1 --mp "1,1" --rdg "0,1" --rfg "0,1" --score-min  "L,0,-0.05" --no-head --no-unal

4949790 reads; of these:
  4949790 (100.00%) were unpaired; of these:
    1493707 (30.18%) aligned 0 times
    1852338 (37.42%) aligned exactly 1 time
    1603745 (32.40%) aligned >1 times
69.82% overall alignment rate

LawrenceLiu023 · 2024-07-03T05:44:03Z

I ran the same test with Bowtie2 v2.5.5. The results show a significant increase in speed and a decrease in memory usage, as shown in the following plot. Because Bismark starts two Bowtie2 alignment instances simultaneously when aligning methylation sequencing data, it is understandable that the speed stops increasing once -p is over 6. I believe the best next step is to run more tests with Bowtie2 alone on normal sequencing data rather than methylation sequencing data. After completing more tests, I will provide an update here.
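
The plateau at -p 6 is consistent with a simple rule of thumb: divide the physical cores among the concurrent Bowtie2 instances. A hedged sketch, where the function name is hypothetical and the rule ignores memory-bandwidth limits and any extra helper processes:

```python
def max_p_per_instance(physical_cores, instances=2):
    """Largest -p that keeps instances * p within the physical core count."""
    return max(1, physical_cores // instances)


# 12 physical cores (Xeon Silver 4214R) shared by 2 Bowtie2 instances.
print(max_p_per_instance(12))
```

For the 12-core machine in this thread the result is 6, matching the point where the speed stopped improving.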
