Results comparison of Metaxa2 using blastn and megablast

Metaxa2 v2.1.1 is a bioinformatic tool designed to assign 16S rRNA sequences from a metagenomic dataset to an archaeal, bacterial, nuclear eukaryotic, mitochondrial or chloroplast origin (1). Metaxa2 detects candidate 16S rRNA sequences among the full set of reads using hidden Markov models built from the SILVA database. The ribosomal sequences are then compared against the Metaxa2 database with BLAST, and the method uses the five best hits to assign a taxonomic identity to each sequence. In case of ambiguity (reliability score < 80), the algorithm aligns the five best hits with MAFFT and recalculates the reliability score at the next taxonomic level in the lineage, repeating until the score is > 80. The method uses blastn by default and offers megablast as an option. We tested the method with both options on the V3V4 lib1 dataset, available in the datasets_16SrRNA directory of this repo.

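Metaxa2-mtx with blastn and with megablast (Sens: sensitivity; Spec: specificity; ACC: accuracy; MCC: Matthews correlation coefficient):

| Taxonomic level | blastn Sens | blastn Spec | blastn ACC | blastn MCC | megablast Sens | megablast Spec | megablast ACC | megablast MCC |
|---|---|---|---|---|---|---|---|---|
| domain | 1.000000 | 0.999840 | 0.999992 | 0.999916 | 1.000000 | 0.999867 | 0.999994 | 0.999930 |
| phylum | 1.000000 | 0.999973 | 0.999999 | 0.999986 | 1.000000 | 0.999973 | 0.999999 | 0.999986 |
| class | 1.000000 | 0.904833 | 0.994992 | 0.948723 | 1.000000 | 0.904833 | 0.994992 | 0.948723 |
| order | 0.969362 | 0.829477 | 0.961332 | 0.699140 | 0.969362 | 0.829477 | 0.961332 | 0.699140 |
| family | 0.967010 | 0.382080 | 0.894109 | 0.433813 | 0.967010 | 0.382070 | 0.894108 | 0.433803 |
| genus | 0.909425 | 0.602545 | 0.885172 | 0.409323 | 0.909425 | 0.602529 | 0.885171 | 0.409312 |
| species | 0.199642 | 0.795149 | 0.235305 | -0.003090 | 0.199642 | 0.795127 | 0.235304 | -0.003103 |
| subspecies | 0.111160 | 0.978576 | 0.153370 | 0.062514 | 0.111160 | 0.978576 | 0.153370 | 0.062514 |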

For this dataset, the performance descriptors were almost identical, indicating that the choice between blastn and megablast makes no substantial difference to the sensitivity or specificity of the Metaxa2 taxonomic assignments.
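For readers who want to recompute descriptors of this kind, the following is a minimal sketch of how sensitivity, specificity, accuracy and the Matthews correlation coefficient are commonly derived from confusion-matrix counts. The function name `classification_metrics` and the way true/false positives and negatives are counted per taxonomic level are assumptions made for illustration; this document does not state how the counts behind the table were obtained.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Return Sens, Spec, ACC and MCC from confusion-matrix counts.

    Hypothetical helper: tp/fp/tn/fn are assumed to be counts of
    true/false positive and negative assignments at one taxonomic level.
    """
    # Sensitivity: fraction of real positives that were recovered.
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of real negatives that were rejected.
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    # Accuracy: fraction of all assignments that were correct.
    acc = (tp + tn) / (tp + fp + tn + fn)
    # MCC is conventionally set to 0 when any marginal total is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sens": sens, "Spec": spec, "ACC": acc, "MCC": mcc}


if __name__ == "__main__":
    # Made-up counts, for illustration only (not taken from the table).
    print(classification_metrics(tp=90, fp=5, tn=95, fn=10))
```

Because MCC combines all four counts, it can sit near zero even when accuracy looks acceptable, which is why it is a useful companion to the other three descriptors.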

We ran both programs on the server of the Biotechnology Institute (UNAM), splitting the dataset into chunks of 1,000 sequences. Each chunk was processed as a separate job using 4 threads with 7 GB of memory each. The following table reports the average and standard deviation of the time and memory used in each case.

| Metaxa2 alignment algorithm | wall clock (s) | ru_utime | ru_stime | CPU | mem | io | maxvmem (G) |
|---|---|---|---|---|---|---|---|
| Blastn (average) | 817.833 | 3169.473 | 16.812 | 3191.073 | 2477.961 | 0.096 | 2.456 |
| Blastn (SD) | 35.304 | 116.549 | 0.692 | 116.465 | 90.759 | 0.009 | 0.239 |
| Megablast (average) | 824.895 | 3166.722 | 16.654 | 3188.276 | 2475.763 | 0.097 | 2.475 |
| Megablast (SD) | 33.459 | 114.924 | 1.017 | 115.157 | 89.731 | 0.009 | 0.221 |

Again, both BLAST algorithms gave highly similar time and memory statistics when the method was evaluated on mini-sets of 1,000 amplicon sequences. This suggests that, at least for this kind of data, dividing the main dataset into chunks is a good strategy for using computational resources efficiently. The chunkMaker.pl script is in the bin directory of this repo.
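To illustrate the chunking strategy, here is a minimal Python sketch that splits a FASTA file into files of at most 1,000 sequences. It is not the chunkMaker.pl script from this repo, whose interface is not described here; the function names `read_fasta` and `split_fasta`, the `chunk` file prefix, and the output naming scheme are all hypothetical.

```python
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) tuples from a FASTA file."""
    header, seq = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if header is not None:
                    yield header, "".join(seq)
                header, seq = line, []
            elif line:
                seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def split_fasta(path, out_dir, chunk_size=1000, prefix="chunk"):
    """Write consecutive chunks of at most chunk_size records.

    Hypothetical naming scheme: <prefix>_0001.fasta, <prefix>_0002.fasta, ...
    Returns the list of written paths.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths, out = [], None
    for i, (header, seq) in enumerate(read_fasta(path)):
        # Open a new output file every chunk_size records.
        if i % chunk_size == 0:
            if out:
                out.close()
            chunk_path = out_dir / f"{prefix}_{i // chunk_size + 1:04d}.fasta"
            paths.append(chunk_path)
            out = open(chunk_path, "w")
        out.write(f"{header}\n{seq}\n")
    if out:
        out.close()
    return paths
```

Each resulting file can then be submitted as an independent job, which is the setup the timing table above describes.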

  1. Bengtsson-Palme, J. et al. METAXA2: improved identification and taxonomic classification of small and large subunit rRNA in metagenomic data. Mol. Ecol. Resour. 15, 1403–1414 (2015).