Java heap space #14

Open
karlkashofer opened this issue Feb 8, 2022 · 4 comments

@karlkashofer

Running UMICollapse on 200 million paired-end reads (400 reads total) runs out of Java heap space even with -Xmx96G.
Is that normal?

@Daniel-Liu-c0deb0t
Owner

It should not fail with only 400 reads. Have you tried setting -Xms to a larger value? That is the initial heap size. What is the exact command you are running? Paired-end mode takes up more memory, but it shouldn't run out of memory for only 400 reads.
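For example, something along these lines (just a sketch — adjust the mode, file names, and heap sizes to your actual setup):

```bash
# Set both the initial (-Xms) and maximum (-Xmx) heap size up front.
# input.bam / dedup.bam are placeholders for your files.
java -Xms32G -Xmx96G -jar umicollapse.jar bam \
    -i input.bam -o dedup.bam --paired
```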

@karlkashofer
Author

Sorry, I meant 200 million paired-end reads, which is 400 million reads total.

@Daniel-Liu-c0deb0t
Owner

If you are using paired-end mode (--paired), it takes a lot of memory, because it has to make sure pairs of reads stay together during deduplication, which means storing many reads in memory at once. Potential workarounds are splitting the 200 million paired-end reads into smaller files and deduplicating those separately, or not using paired-end mode (but then there might exist pairs of reads where only one read of the pair is removed).
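For the splitting workaround, a rough sketch with samtools (assuming a coordinate-sorted, indexed BAM; the chromosome list and file names are placeholders):

```bash
# Deduplicate one chromosome at a time, then merge the results.
for chr in chr1 chr2 chr3; do   # extend to all chromosomes in your reference
    samtools view -b input.bam "$chr" > "$chr.bam"
    java -Xmx32G -jar umicollapse.jar bam --paired -i "$chr.bam" -o "$chr.dedup.bam"
done
samtools merge dedup.merged.bam chr*.dedup.bam
```

Note that splitting by chromosome only keeps a pair together when both mates map to the same chromosome, so discordant pairs would still be affected.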

@karlkashofer
Author

karlkashofer commented Apr 5, 2022

Yes, I use --paired, as this is Illumina NovaSeq data from Agilent XT libraries (dual index and dual UMI).
I don't really understand why --paired needs so much memory. In your paper you state that "the reads at each unique alignment location are independently deduplicated based on the UMI sequences", so my understanding is that it only needs to keep the reads at a single position in memory. I deduplicate WGS data, where there is hardly a position with more than 100 reads, so I really don't understand why it would require > 80 GB of memory.
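For reference, a quick way to sanity-check the per-position depth (a sketch; input.bam is a placeholder):

```bash
# Print the maximum per-position coverage observed in the BAM.
samtools depth input.bam | awk '$3 > max { max = $3 } END { print "max depth:", max }'
```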

Thanks for your work, btw! :)
