misclassification of Melissococcus plutonius #2

duancopeland · 2022-09-12T20:46:36Z

Hello and thanks for this tool. Recently I applied the BEExact database to a dataset from an EFB outbreak. The number one OTU came up as:

Bacteria(100);Firmicutes(100);Bacilli(100);Lactobacillales(100);Lactobacillales_unclassified(100);Lactobacillales_unclassified(100);Lactobacillales_unclassified(100);

When I BLAST this sequence in the database it indeed comes up as Melissococcus plutonius

Not sure why this hit would come up under Bacilli, but I hope this helps to improve this awesome tool.

bdaisley · 2022-09-13T02:05:42Z

Hi @wapoga

Thanks for bringing this up. Melissococcus plutonius is a member of Bacilli at the 'class' rank, and also Lactobacillales at the 'order' rank, so this would appear correct for the higher rank levels of the original taxonomy string you got. But I agree, it seems quite odd that it came back as unclassified at the family level and lower ranks.

I did a quick validation check with the 'IDTAXA___BEEx-V3V4-TS.RData' classifier database (v2021.0.2) using a 16S rRNA gene sequence from a confirmed Melissococcus plutonius isolate (NCBI ref NR_074098.1 - https://www.ncbi.nlm.nih.gov/nuccore/NR_074098.1) and was able to get the correct species-level identification as follows:

Bacteria(100);Firmicutes(100);Bacilli(100);Lactobacillales(100);Enterococcaceae(98.90);Melissococcus(98.90);Melissococcus_plutonius(98.90)

If you don't mind, could you please provide the sequence you were trying to classify (copy and paste here or email is fine) as well as the the following details so that I may reproduce the result you got and address this issue more thoroughly:

Which BEExact database did you use? (e.g. V4, V3V4, V5V6, etc)
Which classifier software did you use and were any of the settings different than default? (e.g. IDTAXA, QIIME2, others)
Did you use clustering methods (97-99% OTUs) or denoising algorithms (100% ASVs/SVs) to determine your sequence reads?

duancopeland · 2022-09-14T18:27:19Z

i'll get these to you in the next day

duancopeland · 2022-09-15T17:44:02Z

Maybe there's an explanation for this in our protocol:
We used mothur for the whole analysis
using mothur I took the dadaBEEx-FL-TS.fa and trimmed it to v3v4 region, then aligned our sequences to this
Then I classified the seqs with the BEEx_FL_bxid_refs_sequences.fasta reference
In mothur I used the make.shared command to get unique seqs (ASVs)

Here is the representative sequence for the Melissococcus that is coming up as:
Bacteria(100);Firmicutes(100);Bacilli(100);Lactobacillales(100);Lactobacillales_unclassified(100);Lactobacillales_unclassified(100);Lactobacillales_unclassified(100);

AGGGAATCTTCGGCAATGGACGAAAGTCTGACCGAGCAACGCCGCGTGAGTGAAGAAGGTTTTCGGATCGTAAAACTCTGTTGTTAGAGAAGAATAGGGGAAAGAGTAACTGTTTTCCTCGTGACGGTATCTAACCAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGCGAGCGCAGGCGGTTTTTTAAGTCTGATGTGAAAGCCCCCGGCTCAACCGGGGAGGGTCATTGGAAACTGGAAGACTTGAGTGCAGAAGAGGAGAGTGGAATTCCATGTGTAGCGGTGAAATGCGTAGATATATGGAGGAACACCAGTGGCGAAGGCGGCTCTCTGGTCTGTAACTGACGCTGAGGCTCGAAAGCGTGGGGAGCAAACA

bdaisley · 2022-09-16T22:03:58Z

Okay, so I believe I figured out what was going wrong here. Replicating the steps you indicated, I ran the align.seqs command in mothur and found that the misclassified sequence aligned with 100% identity to BX004500;Bacteria;Firmicutes;Bacilli;Lactobacillales;Enterococcaceae;Melissococcus;Melissococcus_plutonius.

QueryName	QueryLength	TemplateName	TemplateLength	SearchMethod	SearchScore	AlignmentMethod	QueryStart	QueryEnd	TemplateStart	TemplateEnd	PairwiseAlignmentLength	GapsInQuery	GapsInTemplate	LongestInsert	SimBtwnQuery&Template
misclassified_seq	426	BX004500	253	kmer	58.23	needleman	176	426	1	251	251	0	0	0	100.00

However, when I ran the classify.seqs command in mothur it was misclassified the same as you mentioned:

>misclassified_seq
AGGGAATCTTCGGCAATGGACGAAAGTCTGACCGAGCAACGCCGCGTGAGTGAAGAAGGTTTTCGGATCGTAAAACTCTGTTGTTAGAGAAGAATAGGGGAAAGAGTAACTGTTTTCCTCGTGACGGTATCTAACCAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGCGAGCGCAGGCGGTTTTTTAAGTCTGATGTGAAAGCCCCCGGCTCAACCGGGGAGGGTCATTGGAAACTGGAAGACTTGAGTGCAGAAGAGGAGAGTGGAATTCCATGTGTAGCGGTGAAATGCGTAGATATATGGAGGAACACCAGTGGCGAAGGCGGCTCTCTGGTCTGTAACTGACGCTGAGGCTCGAAAGCGTGGGGAGCAAACA

mothur > classify.seqs(fasta=misclassified_seq.fasta, reference=BEEx-FL-bxid-refs_sequences.fa, taxonomy=BEEx-FL-bxid-refs_taxonomy.txt)

Result:
misclassified_seq=Bacteria(100);Firmicutes(100);Bacilli(100);Lactobacillales(100);Lactobacillales_unclassified(100);Lactobacillales_unclassified(100);Lactobacillales_unclassified(100);

To see if this could be a formating issue with the RDP classifier implemented in mothur, I tested the misclassified_seq using another RDP classifier algorithm implemented in the dada2 R package. The sequence was classified correctly:

library(dada2)
library(data.table)

query <- readDNAStringSet("misclassified_seq.fasta", format="fasta")
database <- "dada2___BEEx-FL-TS.fa.gz"

dada2::assignTaxonomy(query, database, multithread=FALSE, outputBootstraps=TRUE)
Result:
misclassified_seq=Bacteria(100);Firmicutes(100);Bacilli(100);Lactobacillales(100);Enterococcaceae(81);Melissococcus(72);Melissococcus_plutonius(72);

Based on this, I constructed a single seed subset reference databse from the full-length BEExact database for use with mothur. The respective files are available here:
BEEx-FL-seeds_sequences.fasta
BEEx-FL-seeds_taxonomy.txt

To format the taxa headers and make a mothur-specific classification databse, I ran the following:

awk 'FNR==NR{
  a[">"$1]=$2;next
}
$1 in a{
  sub(/>/, $1 "\t" "Root;" a[$1])
}1' BEEx-FL-seeds_taxonomy.txt BEEx-FL-seeds_sequences.fasta > mothur___BEEx-FL-seeds_sequences.fasta

I've uploaded this formatted database to the classifier databases directory, available here:
mothur___BEEx-FL-seeds_sequences.fasta

To confirm these changes solved the issue, I ran the following in mothur:


mothur > classify.seqs(fasta=misclassified_seq.fasta, reference=mothur___BEEx-FL-seeds_sequences.fasta, taxonomy=BEEx-FL-seeds_taxonomy.txt)

Result:
misclassified_seq=Bacteria(100);Firmicutes(100);Bacilli(100);Lactobacillales(100);Enterococcaceae(100);Melissococcus(98);Melissococcus_plutonius(98);

Everything looks to be in order now. I would suggest re-classifying the rest of your sequences using the mothur-formatted database, as it is likely other taxa were also incorrectly classified.

Let me know if this solves the problem on your end.

***Additional info: For clarity to others who may read this, before running align.seqs, I first aligned the full-length BEEx-FL-TS.fa database to the 50,000-character global SILVA alignment (via SINA v1.2.11), and then trimmed the alignment file to the the V3-V4 region (i.e. positions 13862 to 23444) using the pcr.seqs command in mothur. I've uploaded the 50K alginment file to the BEExact seeds directory available here in case it may be helpful to others.

duancopeland · 2022-09-16T22:33:48Z

Thanks Brendan, you are awesome!!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

misclassification of Melissococcus plutonius #2

misclassification of Melissococcus plutonius #2

duancopeland commented Sep 12, 2022

bdaisley commented Sep 13, 2022

duancopeland commented Sep 14, 2022

duancopeland commented Sep 15, 2022

bdaisley commented Sep 16, 2022

duancopeland commented Sep 16, 2022

misclassification of Melissococcus plutonius #2

misclassification of Melissococcus plutonius #2

Comments

duancopeland commented Sep 12, 2022

bdaisley commented Sep 13, 2022

duancopeland commented Sep 14, 2022

duancopeland commented Sep 15, 2022

bdaisley commented Sep 16, 2022

duancopeland commented Sep 16, 2022