You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hello! I have created a custom reference database for my ecosystem (18S V4) to assign taxonomic annotations using assignTaxonomy().
I just used the database on a small test dataset and the output looks good, but I noticed something unexpected. The most abundant ASV is assigned to the correct genus, but a different species that I would expect. I BLASTed the ASV sequence and get a 100% match to Chloromonas brevispina, which happens to be the first sequence in the custom database that is formatted like this (first two sequences below):
However, the annotation of the top ASV is to Chloromonas hindakii, with a confidence of .63. It's BLAST hit is only 99.4% similar to the ASV.
I don't understand why a 100% match would be passed over during the assignTaxonomy() step. Any thoughts?
The Chloromonas hindakii is a longer reference sequence than Chloromonas brevispina by maybe 1000 bps. Do I have to trim my database to the region of my amplicon? There are only about 5000 sequences in my custom reference database. Thanks!
The text was updated successfully, but these errors were encountered:
For short-read species assignment, using assignTaxonomy is not what we recommend. The RDP classifier algorithm that assignTaxonomy implements was never really intended to perfectly discriminate between reference sequences differering by just a single-nucleotide. There is an alternative function called assignSpecies which uses exact unambiguous matching to assign at the species level that presumably would work correctly here. Also, the default confidence threshold of 0.5 is tuned for genus-level assignment on 16S data -- for species-level with such closely related references a higher threshold may be more appropriate.
Hello! I have created a custom reference database for my ecosystem (18S V4) to assign taxonomic annotations using assignTaxonomy().
I just used the database on a small test dataset and the output looks good, but I noticed something unexpected. The most abundant ASV is assigned to the correct genus, but a different species that I would expect. I BLASTed the ASV sequence and get a 100% match to Chloromonas brevispina, which happens to be the first sequence in the custom database that is formatted like this (first two sequences below):
However, the annotation of the top ASV is to Chloromonas hindakii, with a confidence of .63. It's BLAST hit is only 99.4% similar to the ASV.
I don't understand why a 100% match would be passed over during the assignTaxonomy() step. Any thoughts?
The Chloromonas hindakii is a longer reference sequence than Chloromonas brevispina by maybe 1000 bps. Do I have to trim my database to the region of my amplicon? There are only about 5000 sequences in my custom reference database. Thanks!
The text was updated successfully, but these errors were encountered: