Issue with incorrect virus classification using metabuli binning2report function and suggestion for ICTV taxonomy update #96
Comments
Thank you so much for the ICTV resources! They sound very useful.
Let me check the BIOM format and see if I can make a module for the conversion you want.
Thank you so much for the tips! Please use this Metabuli version.
Thanks for the effort. Out of curiosity, I just checked the latest ICTV taxonomy. They now put SARS-CoV-1 and SARS-CoV-2 together under this weird species name, Betacoronavirus pandemicum! I hope you guys don't follow the ICTV taxonomy too soon, as it is so confusing right now...
Thank you very much, Jaebeom. I hope you're having an excellent start to the week, and I appreciate you taking the time to read my comments and share your database. I apologize for the delayed response.
On another note, I have downloaded the ICTV VMR39.2 database that you shared. Currently, I am using Metabuli version 1.0.8.
I have a question: is the version I downloaded different from the one available in this repository? https://github.com/jaebeom-kim/Metabuli
Another question: should I download the nodes.dmp and names.dmp files from https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/new_taxdump/ or do I need to build my own taxonomy files using TaxonKit and create an ICTV taxdump file?
Thank you again for your help, and I wish you an excellent day!
Best regards
Hi! Your comment was very helpful!
Yes, please use
You don't need to download any dmp files to try the ICTV database. Thanks again!
Hi Metabuli team,
First of all, I would like to thank you for such an excellent tool, I really enjoy using it. I’m currently using Metabuli to classify my viral metagenome sequences, and I have been using the viral database provided by Metabuli for my analyses. After running the classification, I obtain a report file that looks like this:
98.6127 39657793 39657793 no rank 0 unclassified
1.3873 557902 166 no rank 1 root
1.3787 554456 812 superkingdom 10239 Viruses
1.2988 522339 0 clade 2731341 Duplodnaviria
1.2988 522339 31 kingdom 2731360 Heunggongvirae
1.2938 520316 0 phylum 2731618 Uroviricota
1.2938 520316 14652 class 2731619 Caudoviricetes
0.5720 230022 229876 genus 2843396 Jouyvirus
0.0003 105 0 species 2844245 Jouyvirus ev017
0.0003 105 105 no rank 2847060 Escherichia phage ev017
After generating this report, I attempt to convert it into a Kraken-style format using the metabuli binning2report function. However, during this conversion, I encounter an issue where the output file no longer includes virus classifications but instead focuses solely on bacteria. The output looks like this:
30.75 1051 1051 no rank 0 unclassified
54.13 1850 485 no rank 1 root
39.88 1363 0 no rank 131567 cellular organisms
39.47 1349 230 superkingdom 2 Bacteria
17.50 598 0 phylum 1224 Pseudomonadota
7.72 264 0 class 28211 Alphaproteobacteria
5.47 187 0 order 356 Hyphomicrobiales
4.27 146 0 family 335928 Xanthobacteraceae
4.04 138 67 genus 6 Azorhizobium
2.08 71 71 species 7 Azorhizobium caulinodans
Why does this issue occur? My goal is to convert my report to Kraken format so that I can eventually transform the files into a BIOM (Biological Observation Matrix) format. This would allow me to combine all my reports into a single file, and then I could use the R phyloseq package to generate various statistics from my samples.
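To make the end goal concrete, here is a minimal Python sketch of how several Kraken-style reports could be combined into one taxon-by-sample count table. It assumes each report is tab-separated with six columns in the order shown above (percentage, clade reads, directly assigned reads, rank, taxid, name). The function names `parse_report`, `merge_reports`, and `to_tsv` are hypothetical helpers written for this illustration, not part of Metabuli or Kraken.

```python
from collections import defaultdict


def parse_report(text):
    """Parse one Kraken-style report (assumed tab-separated, six columns).

    Returns {taxid: (name, rank, directly_assigned_reads)}.
    """
    taxa = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        direct, rank, taxid, name = fields[2], fields[3], fields[4], fields[5]
        # Names are often indented to show tree depth, so strip whitespace.
        taxa[int(taxid)] = (name.strip(), rank.strip(), int(direct))
    return taxa


def merge_reports(reports):
    """Merge {sample_name: report_text} into a taxon-by-sample table.

    Returns (samples, counts, names), where counts[taxid] is a list of
    directly assigned read counts in the same order as samples.
    """
    samples = sorted(reports)
    names = {}
    counts = defaultdict(lambda: [0] * len(samples))
    for col, sample in enumerate(samples):
        for taxid, (name, _rank, direct) in parse_report(reports[sample]).items():
            names[taxid] = name
            counts[taxid][col] = direct
    return samples, dict(counts), names


def to_tsv(samples, counts, names):
    """Render the merged table as tab-separated text, one row per taxid."""
    lines = ["\t".join(["taxid", "name"] + samples)]
    for taxid in sorted(counts):
        row = [str(taxid), names[taxid]] + [str(c) for c in counts[taxid]]
        lines.append("\t".join(row))
    return "\n".join(lines)
```

The sketch uses the directly assigned read column (third column) rather than the clade total, so that counts are not double-counted across ranks when the table is summed. A table like this could then be handed to a BIOM converter or read into R for use with phyloseq.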
Additionally, I noticed there have been previous requests regarding updating the taxonomy to align with ICTV. I came across two resources that might be helpful:
This one explains how to construct NCBI-style taxdump files for the International Committee on Taxonomy of Viruses (ICTV):
https://github.com/shenwei356/ictv-taxdump
This other resource provides a tutorial on how to build a protein FASTA database for ICTV (though it is adapted for MMseqs2, it might help in building an ICTV viral database for Metabuli):
https://github.com/apcamargo/ictv-mmseqs2-protein-database/blob/master/README.md
Thank you for your time and attention. I hope you have a great start to the week!
Best regards,