
Performance issue converting fast5 -> pod5 with multiple threads #146

Open
arturotorreso opened this issue Oct 7, 2024 · 5 comments
@arturotorreso

I am running pod5 convert fast5 on a sample with about 5000 fast5 files (from the same sample, ~4000 reads each), writing to a single pod5 per sample.

  • When I run it on the 5000 files with -t 1, I get a performance of ~800 reads/s.
  • When I run it on the 5000 files with -t 2, performance drops dramatically to ~200-300 reads/s, and the jobs keep getting into D state. Increasing the thread count does not improve performance, and it sometimes drops to 50 reads/s.

So I made subsets of the reads and compared the performance:

  • When I run it on 200 files with -t 1, I get a performance of ~800 reads/s. If I increase the threads, the reads/s keep increasing as expected. As I increase the number of files, the gains from multithreading decrease. For my system I found the sweet spot at 300 files.

I thought this could be a bottleneck from writing to the same file, but if I run two samples in the background simultaneously (thus writing to two different pod5 files) I run into the same decreasing performance (similar to using multiple threads on lots of files), and the jobs keep getting sent to state D. My system should have enough memory to handle the job, though.

For now I'm thinking of processing the files in batches and merging the final pod5 files, but I was curious to know whether this is a known issue and what you recommend to improve performance when running multiple samples at the same time or with multiple threads.

@0x55555555
Collaborator

Hi @arturotorreso,

Just to confirm: you tested writing two files simultaneously, and confirmed it wasn't a bottleneck of writing to one file, but you did see increased performance when running the conversion on batches of smaller files (300 being optimal)?

Can you confirm that the only difference between the two tests where performance differed was the number of files input to the conversion script?

Can you also let me know the approximate length of the reads in the files?

Can you provide an example command line snippet you are using to trigger the conversion?

Thanks,

  • George

@arturotorreso
Author

arturotorreso commented Oct 8, 2024

Thank you for your quick response!

Just to confirm - you tested writing two files simultaneously, and confirmed it wasn't a bottleneck of writing to one file.

Yes, there was also decreased performance when running multiple samples simultaneously and writing to separate files. This was also dependent on the number of input files in each sample. If I run both samples with 300 input fast5 files each, the performance drop wasn't too bad (2000-3000 reads/s each with -t 4, versus 7000 reads/s if run separately). But if each sample was run with 5000 files, performance decreased to 40-50 reads/s. This does point to a memory issue, but in theory I should have enough CPUs and memory to handle it.

but you did see increased performance when running the conversion on batches of smaller files (300 being optimal)?

Yes

Can you confirm the only difference between the two tests where performance was different was number of files input to the conversion script?

Yes

Can you also let me know the approximate length of the reads in the files?

We are working mostly with cell-free DNA (~200 bp), but we also find larger DNA fragments (>10 kb). The read length distribution centres around 216 bp (157-776), but the range goes up to 37 kb.

Can you provide an example command line snippet you are using to trigger the conversion?

I'm running it straight from the command line:

pod5 convert fast5 -f -t 4 -o output.pod5 input_folder/

And for subsets:

pod5 convert fast5 -f -t 4 -o output.pod5 $(ls input_folder/*.fast5 | head -n200)

Let me know if you need anything else!

@arturotorreso
Author

arturotorreso commented Oct 8, 2024

Similarly, when I run the command ls *fast5 | xargs -n200 pod5 convert fast5 -f -t 20 -o pod5_out/$RANDOM.pod5 I get a decrease in performance in the second batch, as shown in the picture. Could it be an issue of Python multiprocessing not cleaning up after finishing?

[Screenshot 2024-10-08 125357: conversion rate dropping during the second batch]
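A side note on the command above (an editor's sketch, not part of the original report): $RANDOM is expanded by the shell once, before xargs ever runs, so every 200-file batch writes to the same output path, with -f forcing each batch to overwrite the previous one. Letting each batch's own shell pick the name avoids that; here $$ (the spawned shell's PID, distinct per batch) stands in for a unique id, and pod5_out/ is an illustrative path:

```shell
# Each xargs invocation spawns a fresh sh, so $$ differs per batch,
# giving every 200-file batch its own output pod5 instead of one
# shared (and repeatedly overwritten) $RANDOM.pod5.
ls input_folder/*.fast5 | xargs -n 200 sh -c \
  'pod5 convert fast5 -f -t 20 -o "pod5_out/batch_$$.pod5" "$@"' _
```

The trailing `_` fills `$0` of the spawned shell so that the batched file names land in `"$@"`.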

@0x55555555 0x55555555 changed the title Performance issue with multiple threads Performance issue converting fast5 -> pod5 with multiple threads Oct 9, 2024
@0x55555555
Collaborator

Do you see the decrease in performance if you run the commands manually one after another, or with a small gap between them?

If you restart the terminal session and re-run the experiment, is it faster? Or if you wait for a period after the run?

I'll attempt to reproduce your results here.

  • George

@arturotorreso
Author

If I run them manually, there's no decrease in performance. With gaps, yes: I put a sleep of 1 minute between runs and still saw the performance decrease.

I don't need to restart the terminal; as soon as I kill the job and restart it, it goes faster, until performance eventually decreases again.

Right now I'm running each file separately in a loop with -t 1, and merging afterwards, and it performs well.
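That per-file workaround could be sketched roughly as below (an editor's sketch: input_folder/, pod5_parts/, and sample.pod5 are illustrative names, and pod5 merge is the subcommand for combining pod5 files):

```shell
# Convert each fast5 on its own with a single thread...
mkdir -p pod5_parts
for f in input_folder/*.fast5; do
    base=$(basename "$f" .fast5)
    pod5 convert fast5 -t 1 -o "pod5_parts/${base}.pod5" "$f"
done
# ...then merge the per-file outputs into one pod5 for the sample.
pod5 merge pod5_parts/*.pod5 -o sample.pod5
```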
