Low sum in ragoo.fasta #24
Comments
Hi Lauren, Is the missing sequence present in unplaced contigs? They should be concatenated at the end of the
Hello, The sums above include all of the entries in the
In terms of file size, the original scaffolds file is 3.0 GB, but the
Hmm, so something definitely seems off. The first thing to check would be to ensure that every contig has been placed in an orderings file. Can you do Then, Those two numbers should be the same, meaning that every contig in the broken assembly has been placed in an orderings file.
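The exact commands suggested in this comment did not survive in the thread, so the following is only a minimal Python sketch of the same check. It assumes the orderings files are plain-text `*.txt` files in one directory with one contig per line; the function names and paths are hypothetical.

```python
from pathlib import Path


def count_fasta_records(fasta_path):
    """Count header lines (those starting with '>') in a FASTA file."""
    with open(fasta_path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def count_ordering_entries(orderings_dir):
    """Count non-empty lines across every *.txt file in a directory.

    Assumption: each orderings file lists exactly one contig per line.
    """
    total = 0
    for path in sorted(Path(orderings_dir).glob("*.txt")):
        with open(path) as handle:
            total += sum(1 for line in handle if line.strip())
    return total
```

If `count_fasta_records("broken_assembly.fa")` and `count_ordering_entries("orderings")` (both paths are placeholders) return different numbers, some contigs were never written to an orderings file.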
Looks like those numbers don't quite match:
Oh wow, that is a ton of contigs! How many contigs are in your original assembly? And can you tell me what species you are assembling?
This is a human assembly, and there are 1,432,518 contigs originally. Perhaps Ragoo is better suited to work with assemblies with fewer pieces?
Perhaps an easy work-around for me would be to just see what contigs are in the chimera_break fasta and NOT in an orderings txt file, and add those guys into my
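One way this workaround could look in Python is sketched here. It assumes the contig name is the first whitespace-separated field on each line of an orderings file and the first token of each FASTA header; all function names and paths are hypothetical, and this is not RaGOO's own code.

```python
from pathlib import Path


def placed_contig_names(orderings_dir):
    """Collect contig names from the first column (assumed) of each *.txt file."""
    names = set()
    for path in Path(orderings_dir).glob("*.txt"):
        with open(path) as handle:
            for line in handle:
                fields = line.split()
                if fields:
                    names.add(fields[0])
    return names


def write_unlisted_contigs(fasta_path, placed, out_path):
    """Copy FASTA records whose name is not in `placed`; return how many were copied."""
    copied = 0
    keep = False
    with open(fasta_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                header = line[1:].split()
                name = header[0] if header else ""
                keep = name not in placed
                if keep:
                    copied += 1
            if keep:
                dst.write(line)
    return copied
```

The records written to `out_path` can then be appended to the final FASTA by hand, which matches the manual fix described in this comment.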
Well, in theory, it should work even for an assembly with so many contigs. As a test, perhaps you can run without Really, a bigger concern is that I assume you have a bunch of really small contigs which may not get placed. In fact, any contigs under 10k won't even be considered and will automatically be unplaced.
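To gauge how much sequence falls under that cutoff, a rough tally like the following may help. It reads "10k" as 10,000 bp, which is an assumption, and the helper names are made up for illustration.

```python
def iter_fasta_lengths(fasta_path):
    """Yield (name, length) for each record in a FASTA file."""
    name, length = None, 0
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, length
                parts = line[1:].split()
                name = parts[0] if parts else ""
                length = 0
            elif name is not None:
                length += len(line)
    if name is not None:
        yield name, length


def summarize_short_contigs(fasta_path, min_len=10_000):
    """Return (short_count, short_bases, total_bases) for contigs shorter than min_len."""
    short_count = short_bases = total_bases = 0
    for _, length in iter_fasta_lengths(fasta_path):
        total_bases += length
        if length < min_len:
            short_count += 1
            short_bases += length
    return short_count, short_bases, total_bases
```

A large `short_bases` relative to `total_bases` would suggest that much of the assembly can never be considered for placement under this cutoff.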
Can I ask what your N50 is?
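For anyone following along, N50 can be computed from a list of contig lengths with a few lines of Python. This is a generic sketch of the usual definition (the length of the contig at which the running total, taken from longest to shortest, first reaches half of the assembly size), not something taken from RaGOO.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0
```

For example, `n50([2, 3, 4, 5, 6, 7, 8, 9, 10])` returns 8: the total is 54, and the three longest contigs (10 + 9 + 8 = 27) are the first to reach half of it.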
OK, good to know about the unplaced sequence.
OK, that is good to know. Well, if you are willing to share the data, then I can probably debug pretty fast. Otherwise, I will have to think of some other tests to run.
Unfortunately I'm working with confidential data, so I can't share the assemblies with you. I think I have a decent workaround for now: if I add in all the sequences that were not listed in an ordering file, then the sum is closer to the expected total. Thank you for your help!
Ok sounds good. Really, the way it is designed, the
When you add in the contigs manually, what percentage of sequence is localized to chromosomes? And which reference are you using?
My 'reference' is another human assembly using a different assembler. The original
Hi there,
After testing the code with your data, I believe I understand the problem. When Indeed, if one does not use In future versions of RaGOO, the intermediate output will be restricted to exactly 2 files regardless of the
Additionally, it is true that RaGOO was not designed for more fragmented assemblies of larger genomes. To address this, future versions of RaGOO will allow the user to lower the minimum alignment length, thus allowing for more contigs to be placed. I will test out your data again when these features are implemented.
Thanks
Hello,
I am attempting to run Ragoo using a long-read assembly as the 'reference'.
After running Ragoo with the following command:
My output ragoo.fasta file seems to be missing a lot of bases. The original assembly is ~2.7 Gb, but the output fasta file has only ~736 Mb. Any idea about what is happening to the outstanding sequences, or is this expected behaviour? The chimera.broken.fa file is the correct size, so it seems that things are being lost after that stage somewhere.
Thanks!
Lauren
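The base totals compared in this report can be reproduced with a small Python helper along these lines. It is a sketch that simply counts every non-header character in the file, and the function name is hypothetical.

```python
def total_bases(fasta_path):
    """Sum the lengths of all sequence lines (everything except '>' headers)."""
    total = 0
    with open(fasta_path) as handle:
        for line in handle:
            if not line.startswith(">"):
                total += len(line.strip())
    return total
```

Running it on the original assembly, on chimera.broken.fa, and on ragoo.fasta should show at which stage the total drops. Note that padding characters between contigs in the scaffolded output, if any, would also be counted.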