
Error while running umicollapse #29

Open
ancient-learner opened this issue Aug 7, 2024 · 4 comments

Comments

@ancient-learner

Hi,
Thank you for this tool. I have a file with ~80M single-end reads, and I have plenty of RAM and CPUs available. But every time I run it, I get the error below. I'm not sure what I am doing wrong:

umicollapse bam -i ../SRR27822176_STAR_Aligned.sortedByCoord.out.bam -o ../dedup1_example.bam -Xms16G -Xmx1000G -Xss8M
Arguments [bam, -i, ../SRR27822176_STAR_Aligned.sortedByCoord.out.bam, -o, ../dedup1_example.bam, -Xss8M]
Done reading input file into memory!
Exception in thread "main" java.lang.StackOverflowError
    at java.base/java.util.HashMap.putVal(HashMap.java:627)
    at java.base/java.util.HashMap.put(HashMap.java:610)
    at java.base/java.util.HashSet.add(HashSet.java:221)
    at umicollapse.data.NgramBKTree.recursiveRemoveNearBKTree(NgramBKTree.java:70)
    at umicollapse.data.NgramBKTree.recursiveRemoveNearBKTree(NgramBKTree.java:85)
    at umicollapse.data.NgramBKTree.recursiveRemoveNearBKTree(NgramBKTree.java:85)
    at umicollapse.data.NgramBKTree.recursiveRemoveNearBKTree(NgramBKTree.java:85)
    at umicollapse.data.NgramBKTree.recursiveRemoveNearBKTree(NgramBKTree.java:85)
    at umicollapse.data.NgramBKTree.recursiveRemoveNearBKTree(NgramBKTree.java:85)
    at umicollapse.data.NgramBKTree.recursiveRemoveNearBKTree(NgramBKTree.java:85)
    at umicollapse.data.NgramBKTree.removeNear(NgramBKTree.java:45)
    at umicollapse.algo.Directional.visitAndRemove(Directional.java:46)
    at umicollapse.algo.Directional.visitAndRemove(Directional.java:53)
    at umicollapse.algo.Directional.visitAndRemove(Directional.java:53)
    at umicollapse.algo.Directional.visitAndRemove(Directional.java:53)
    at umicollapse.algo.Directional.visitAndRemove(Directional.java:53)
    at umicollapse.algo.Directional.visitAndRemove(Directional.java:53)

@Daniel-Liu-c0deb0t
Owner

Try running it with a larger stack size! Edit the umicollapse script and use -Xss1G. You can also try increasing the heap sizes to be safe. More info in the readme: https://github.com/Daniel-Liu-c0deb0t/UMICollapse?tab=readme-ov-file#java-virtual-machine-memory
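Why a bigger stack helps: the trace above shows the clustering code recursing deeply (Directional.visitAndRemove and NgramBKTree.recursiveRemoveNearBKTree each call themselves), and every nested call consumes a frame on the thread's stack, whose size is what -Xss controls. The sketch below is a standalone illustration, not UMICollapse code: it uses the standard Thread(ThreadGroup, Runnable, String, long stackSize) constructor to run the same unbounded recursion on threads with different requested stack sizes and reports how many frames fit before StackOverflowError. The class and method names are made up for this example, and the Thread Javadoc notes that the stackSize value is only a hint that some platforms may ignore, so exact counts will vary by JVM and OS.

```java
import java.util.concurrent.atomic.AtomicLong;

public class StackDepthDemo {

    // Unbounded recursion: each call adds one stack frame and bumps the counter.
    private static void recurse(AtomicLong depth) {
        depth.incrementAndGet();
        recurse(depth);
    }

    /**
     * Runs the recursion on a fresh thread whose stack size is requested
     * explicitly (a per-thread analogue of -Xss) and returns how many
     * frames were reached before the stack overflowed.
     */
    public static long maxDepth(long stackSizeBytes) {
        AtomicLong depth = new AtomicLong();
        Thread worker = new Thread(null, () -> {
            try {
                recurse(depth);
            } catch (StackOverflowError expected) {
                // The stack is exhausted; depth now holds the frame count reached.
            }
        }, "deep-recursion", stackSizeBytes);
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return depth.get();
    }

    public static void main(String[] args) {
        long small = maxDepth(512L * 1024);         // 512 KiB stack
        long large = maxDepth(64L * 1024 * 1024);   // 64 MiB stack
        System.out.println("512 KiB stack: about " + small + " frames");
        System.out.println("64 MiB stack:  about " + large + " frames");
    }
}
```

On a typical HotSpot JVM the 64 MiB thread recurses far deeper than the 512 KiB one; raising -Xss is meant to give the main thread the same kind of headroom for the deep clustering recursion.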

@ancient-learner
Author

I am getting the same error even when I use:
umicollapse bam -i ../SRR27822176_STAR_Aligned.sortedByCoord.out.bam -o ../dedup1_example.bam -Xms100G -Xmx1200G -Xss20G --two-pass
I used UMI-tools to extract the UMIs and STAR to align and coordinate-sort the BAM file.

@ancient-learner
Author

Sorry for bombarding you with so many questions. I first extracted the UMIs from the FASTQ file with umi_tools, then trimmed the adapters with cutadapt. I used this UMI-extracted, adapter-trimmed FASTQ file as the input for deduplication (note: previously I was using the BAM file after alignment). Here is what I got:
java -Xms16G -Xmx400G -Xss1G -jar umicollapse.jar fastq -i ../SRR27822176_trim.umi.R2.fastq.gz -o SRR27822176_trim --umi-sep _ --two-pass
Arguments [fastq, -i, ../SRR27822176_trim.umi.R2.fastq.gz, -o, SRR27822176_trim, --umi-sep, _, --two-pass]
Done reading input file into memory!
Number of input reads 50962854
Number of unique reads 7257732
Number of reads after deduplicating 6372515
UMI collapsing finished in 130.721 seconds!
My question: would the output differ if I had used the aligned BAM file? If so, to what extent?
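A quick back-of-the-envelope reading of the FASTQ-mode counts in the log (this treats the log's "unique reads" as distinct reads before UMI clustering, which is an assumption about what the counter reports):

```math
\frac{6\,372\,515}{50\,962\,854} \approx 0.125
\qquad\text{and}\qquad
\frac{7\,257\,732 - 6\,372\,515}{7\,257\,732} = \frac{885\,217}{7\,257\,732} \approx 0.122
```

So about 12.5% of the input reads survive deduplication, and the clustering step removes a further 885,217 reads, roughly 12.2% of the unique reads.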

@Daniel-Liu-c0deb0t
Owner

Daniel-Liu-c0deb0t commented Aug 9, 2024

The difference is that, for the BAM file, you need to run the tool the java ... -jar way (as in your FASTQ run), or else modify the umicollapse script directly. JVM options such as -Xss, -Xms and -Xmx only take effect when they come before -jar; appended to the end of the umicollapse command, the stack and heap sizes are not being passed in the right location.
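To check where an option actually ended up, a tiny diagnostic program can print both lists. This is a hypothetical helper written for illustration, not part of UMICollapse: it uses the standard RuntimeMXBean.getInputArguments() (options consumed by the JVM, excluding the arguments passed to main) and Runtime.maxMemory() (the heap ceiling the JVM is actually using).

```java
import java.lang.management.ManagementFactory;
import java.util.Arrays;
import java.util.List;

public class ShowJvmArgs {

    /** Builds a short report separating JVM options from program arguments. */
    public static String report(String[] programArgs) {
        // Options given to the JVM itself, i.e. those placed before -jar / the main class.
        List<String> jvmOptions = ManagementFactory.getRuntimeMXBean().getInputArguments();
        // Heap ceiling of the running JVM; it reflects -Xmx only if -Xmx reached the JVM.
        long maxHeapMiB = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        return "JVM options:       " + jvmOptions + System.lineSeparator()
             + "Program arguments: " + Arrays.toString(programArgs) + System.lineSeparator()
             + "Max heap (MiB):    " + maxHeapMiB;
    }

    public static void main(String[] args) {
        // Everything after the JAR path or main class arrives here as ordinary arguments.
        System.out.println(report(args));
    }
}
```

For example, with Java 11+ (which can launch a single source file directly), java -Xss1G -Xmx2G ShowJvmArgs.java bam -i in.bam lists -Xss1G and -Xmx2G among the JVM options, whereas java ShowJvmArgs.java bam -i in.bam -Xss1G reports -Xss1G as a program argument, much like the -Xss8M that shows up in the Arguments [...] line of the first log.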

The output will differ between bam and fastq modes because fastq mode tolerates mismatches only in the sequences and their UMIs when clustering, while bam mode uses the alignment coordinates (so this can tolerate indels in alignment and untrimmed adapter bases, etc.).
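A deliberately simplified model of that difference, not UMICollapse's actual implementation: the Read record, its fields, the threshold k, and the choice of chromosome + position + strand as the alignment key are all hypothetical, and the real tool's clustering (for example the directional algorithm seen in the trace) is not modelled. The sequence-based check compares UMI plus read sequence by mismatches only, so a read that kept one extra untrimmed base no longer matches its duplicate; the alignment-based check compares coordinates and then UMIs, so the same pair still groups together. Records require Java 16+.

```java
public class DedupKeyDemo {

    // Hypothetical, simplified read: UMI, read sequence, and alignment coordinate.
    public record Read(String umi, String seq, String chrom, int pos, boolean reverse) {}

    // Mismatch count between equal-length strings; a pure mismatch model cannot
    // compare strings of different lengths, so those are treated as "too far".
    public static int hamming(String a, String b) {
        if (a.length() != b.length()) {
            return Integer.MAX_VALUE;
        }
        int d = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                d++;
            }
        }
        return d;
    }

    // Sequence-based (FASTQ-style) model: UMI + read sequence within k mismatches.
    public static boolean sameGroupBySequence(Read a, Read b, int k) {
        return hamming(a.umi() + a.seq(), b.umi() + b.seq()) <= k;
    }

    // Alignment-based (BAM-style) model: same coordinate and strand, then UMIs only.
    public static boolean sameGroupByAlignment(Read a, Read b, int k) {
        return a.chrom().equals(b.chrom())
            && a.pos() == b.pos()
            && a.reverse() == b.reverse()
            && hamming(a.umi(), b.umi()) <= k;
    }

    public static void main(String[] args) {
        // Same molecule; the second read kept one untrimmed adapter base.
        Read r1 = new Read("ACGTAC", "TTGACCGTAA", "chr1", 1000, false);
        Read r2 = new Read("ACGTAC", "TTGACCGTAAA", "chr1", 1000, false);
        System.out.println("sequence model groups them:  " + sameGroupBySequence(r1, r2, 1));   // false
        System.out.println("alignment model groups them: " + sameGroupByAlignment(r1, r2, 1));  // true
    }
}
```

An indel inside the read has the same effect in this toy model: it shifts or changes the sequence so the mismatch-only comparison fails, while the alignment coordinate can stay the same.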
