Comparison of Metaxa2 results using blastn and megablast

Metaxa2 v2.1.1 is a bioinformatic tool designed to assign 16S rRNA sequences from a metagenomic dataset to an archaeal, bacterial, nuclear eukaryotic, mitochondrial or chloroplast origin (1). Metaxa2 detects candidate 16S rRNA sequences among the full set of reads using hidden Markov models built from the SILVA database. The ribosomal sequences are then compared against the Metaxa2 database with BLAST, and the method uses the five best hits to assign a taxonomic identity to each sequence. In case of ambiguity (reliability score < 80), the algorithm aligns the five best hits using MAFFT and recalculates the reliability score for the next taxonomic level in the lineage, repeating until the score is > 80. The method uses blastn by default and has the option of running megablast instead. We tested the method with the V3V4 lib1 dataset, available in the datasets_16SrRNA directory of this repo, using both options.
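The fallback along the lineage described above can be illustrated with a minimal Python sketch. This is not Metaxa2's code: the function name `assign_taxonomy` and the `score_at` callback are hypothetical, and the callback stands in for the reliability score that Metaxa2 derives from the five best hits (its actual calculation is not reproduced here). The sketch also assumes that a score equal to the cutoff counts as unambiguous.

```python
from typing import Callable, Optional, Sequence, Tuple


def assign_taxonomy(
    lineage: Sequence[str],
    score_at: Callable[[int], float],
    cutoff: float = 80.0,
) -> Tuple[Optional[str], Optional[float]]:
    """Return the most specific taxon whose reliability score passes the cutoff.

    lineage  -- taxon names ordered from broadest to most specific
    score_at -- hypothetical callback giving the reliability score for the
                assignment at index i of the lineage
    cutoff   -- scores below this value are treated as ambiguous
    """
    # Start at the most specific level and move one level up while ambiguous.
    for i in range(len(lineage) - 1, -1, -1):
        score = score_at(i)
        if score >= cutoff:
            return lineage[i], score
    # No level reached the cutoff: leave the sequence unassigned.
    return None, None


# Illustrative values only, not taken from the Metaxa2 output.
example_lineage = ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacterales"]
example_scores = [99.0, 95.0, 85.0, 62.0]
print(assign_taxonomy(example_lineage, lambda i: example_scores[i]))
```

With these made-up scores the order-level assignment is ambiguous (62 < 80), so the sketch falls back to the class level.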

	Metaxa2-mtx blastn				Metaxa2-mtx megablast
Taxonomic level	Sens	Spec	ACC	MCC	Sens	Spec	ACC	MCC
domain	1.000000	0.999840	0.999992	0.999916	1.000000	0.999867	0.999994	0.999930
phylum	1.000000	0.999973	0.999999	0.999986	1.000000	0.999973	0.999999	0.999986
class	1.000000	0.904833	0.994992	0.948723	1.000000	0.904833	0.994992	0.948723
order	0.969362	0.829477	0.961332	0.699140	0.969362	0.829477	0.961332	0.699140
family	0.967010	0.382080	0.894109	0.433813	0.967010	0.382070	0.894108	0.433803
genus	0.909425	0.602545	0.885172	0.409323	0.909425	0.602529	0.885171	0.409312
species	0.199642	0.795149	0.235305	-0.003090	0.199642	0.795127	0.235304	-0.003103
subspecies	0.111160	0.978576	0.153370	0.062514	0.111160	0.978576	0.153370	0.062514

For this dataset, the performance statistics were almost identical: sensitivity is the same at every taxonomic level, and no other value differs by more than 0.00003. This indicates that the choice of BLAST algorithm makes no substantial difference to the sensitivity or specificity of the Metaxa2 taxonomic assignments.
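For reference, the four columns in the table (Sens, Spec, ACC and MCC, read here as sensitivity, specificity, accuracy and Matthews correlation coefficient) can be computed from confusion-matrix counts using their standard definitions. The sketch below is only an illustration: how true and false positives and negatives were counted for this benchmark is not stated here, and the function name `classification_metrics` is hypothetical.

```python
import math


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + tn + fp + fn
    sens = tp / (tp + fn) if (tp + fn) else float("nan")  # true positive rate
    spec = tn / (tn + fp) if (tn + fp) else float("nan")  # true negative rate
    acc = (tp + tn) / total if total else float("nan")
    # The MCC denominator is zero when any row or column of the matrix is
    # empty; returning 0.0 in that case is a common convention.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sens": sens, "Spec": spec, "ACC": acc, "MCC": mcc}


# Illustrative counts only, not taken from the benchmark.
print(classification_metrics(tp=90, tn=80, fp=20, fn=10))
```

Because MCC uses all four counts, it can stay near zero even when sensitivity or specificity alone looks high, as in the species row of the table.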

We ran both programs on the server of the Biotechnology Institute (UNAM), splitting the dataset into chunks of 1,000 sequences. Each chunk was processed as a separate job using 4 threads with 7G each. The following table shows the average and standard deviation of the time and memory used in each case.

Metaxa2 alignment algorithm	wall clock (s)	ru_utime	ru_stime	CPU	mem	io	maxvmem(G)
Blastn (average)	817.833	3169.473	16.812	3191.073	2477.961	0.096	2.456
Blastn (SD)	35.304	116.549	0.692	116.465	90.759	0.009	0.239
Megablast (average)	824.895	3166.722	16.654	3188.276	2475.763	0.097	2.475
Megablast (SD)	33.459	114.924	1.017	115.157	89.731	0.009	0.221

Again, both BLAST algorithms showed highly similar time and memory usage when the method was evaluated on subsets of 1,000 amplicon sequences. This suggests that, at least for this kind of data, dividing the main dataset is a good strategy for using computational resources more efficiently. You will find the chunkMaker.pl script in the bin directory of this repo.
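To make the chunking step concrete, here is a minimal Python sketch of splitting a FASTA file into files of 1,000 sequences. It is not the chunkMaker.pl script and may differ from it in behaviour; the function names and the `chunk_0001.fasta` naming scheme are assumptions made for illustration.

```python
from pathlib import Path
from typing import Iterable, Iterator, List


def read_fasta_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield one FASTA record (header plus sequence lines) at a time."""
    record: List[str] = []
    for line in lines:
        if line.startswith(">") and record:
            yield "".join(record)
            record = []
        if line.strip():  # skip blank lines
            record.append(line if line.endswith("\n") else line + "\n")
    if record:
        yield "".join(record)


def chunked(records: Iterable[str], size: int) -> Iterator[List[str]]:
    """Group records into lists of at most `size` items."""
    chunk: List[str] = []
    for record in records:
        chunk.append(record)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:  # final, possibly smaller, chunk
        yield chunk


def split_fasta(fasta_path: str, out_dir: str, chunk_size: int = 1000) -> List[Path]:
    """Write consecutive chunks of a FASTA file to out_dir and return their paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    with open(fasta_path) as handle:
        for index, chunk in enumerate(chunked(read_fasta_records(handle), chunk_size), 1):
            path = out / f"chunk_{index:04d}.fasta"  # hypothetical naming scheme
            path.write_text("".join(chunk))
            written.append(path)
    return written
```

Each resulting file can then be submitted as an independent job, which keeps per-job time and memory bounded regardless of the size of the full dataset.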

1. Bengtsson-Palme, J. et al. METAXA2: improved identification and taxonomic classification of small and large subunit rRNA in metagenomic data. Mol. Ecol. Resour. 15, 1403–1414 (2015).
