Hi Mike,
Recently, I ran Cenote-Taker 2 on contigs assembled by Megahit, and also searched the same contigs with blastn against the nt database and diamond against the nr database. About 10,000 sequences were classified as viruses by Cenote-Taker 2, while only about 1,000 were identified by blast. I am confused about why blast finds ten times fewer viral sequences than Cenote-Taker 2.
As you pointed out, "Many virus genomes are integrated into host chromosomes" and "viral genes and genomes are often misidentified as host sequences" (Tisza M J, Belford A K, Dominguez-Huerta G, et al. Cenote-Taker 2 democratizes virus discovery and sequence annotation[J]. Virus evolution, 2021, 7(1): veaa100.). Thus, blast may produce some false negatives. So, is there a threshold for classifying sequences as viral or non-viral using both tools (e.g. blast p-value, percent identity, or alignment length)?
Wishing you a merry Christmas in advance!
Nailou Zhang
Thanks for your comment. It's a bit complicated to assess this without more information about how Cenote-Taker 2 was run and what settings you used with blast and diamond.
Using blastn against nt could be a great way to look for viruses present in this database and their close relatives. However, the vast majority of the viruses that exist on earth are not catalogued in nt. Recent estimates suggest that there are around 1 billion virus species on earth, while the number of virus species in nt is in the tens of thousands.
Of course, as a general statement, Cenote-Taker 2 will return false positives at some unknown rate. If you are querying contigs assembled from WGS reads and you use -db virion --lin_minimum_hallmark_genes 2 --circ_minimum_hallmark_genes 2, I would estimate the false positive rate at about 1%, maybe less. It's hard to measure this meaningfully, in my opinion.
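To make the comparison between the two sets of calls concrete, here is a minimal Python sketch. It assumes the blast or diamond results were written in the standard 12-column tabular format (-outfmt 6) and that the table has already been restricted to hits against viral subjects. The cutoff values (E-value 1e-5, 80% identity, 200 aligned positions) and the function names are illustrative assumptions, not recommended thresholds and not part of Cenote-Taker 2.

```python
import csv
import io

# Standard column order of BLAST/DIAMOND tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(handle, max_evalue=1e-5, min_pident=80.0, min_length=200):
    """Return the set of query IDs with at least one hit passing all cutoffs.

    The default cutoffs are placeholders (assumptions); tune them for your data.
    """
    passing = set()
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(COLUMNS, row))
        if (float(hit["evalue"]) <= max_evalue
                and float(hit["pident"]) >= min_pident
                and int(hit["length"]) >= min_length):
            passing.add(hit["qseqid"])
    return passing


def compare_calls(blast_ids, cenote_ids):
    """Split contig IDs into those called by both tools or by only one."""
    return {
        "both": blast_ids & cenote_ids,
        "blast_only": blast_ids - cenote_ids,
        "cenote_only": cenote_ids - blast_ids,
    }
```

Calling filter_hits on an open results file and passing the result to compare_calls together with the set of contig names reported by Cenote-Taker 2 shows which contigs are supported by both approaches. Contigs in the "cenote_only" group are not necessarily false positives; they may simply be viruses without close relatives in nt or nr.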