Use Plass for euk metagenomics data #28

liuxianghui · 2020-07-15T09:40:39Z

I want to extract eukaryotic genes/proteins from metagenomic data in order to build a eukaryotic gene/protein catalog.
It seems that MetaEuk is a reference-guided approach (based on MMseqs2) and Plass is a de novo approach (not relying on reference protein sequences).
I don't understand the statement in your paper about Plass and eukaryotic protein assembly:
"Our chief limitation is that, unlike nucleotide assemblers, Plass cannot place the assembled protein sequences into genomic context. Furthermore, it cannot assemble intron-containing eukaryotic proteins, although, as shown, it can assemble eukaryotic proteins from transcriptome data. Another drawback is its inability to resolve homologous proteins from closely related strains or species with sequence identities above ~95%. However, the impact on the accuracy of predicted functions is low (Fig. 2) and bacterial phenotypes are determined more by the complement of horizontally acquired accessory genes than by minor variations in protein sequences."
I understand that the methods behind MMseqs2 and Plass are different, but MMseqs2 should be able to handle the 'intron-containing eukaryotic proteins'.
Anyway, could you kindly suggest a good way to identify these eukaryotic proteins? (Predicting eukaryotic genes from binned eukaryotic genomes is so troublesome.)

martin-steinegger · 2020-07-15T13:40:15Z

@liuxianghui Plass extracts all open reading frames from short reads and extends them through overlap detection. This works well for proteins that are encoded contiguously. However, eukaryotic genes contain introns, so the reads cannot be overlapped across exon boundaries to recover the full protein.
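To make the intron problem concrete, here is a toy sketch (not Plass code, and not how Plass is implemented). It translates a made-up two-exon gene once from the spliced sequence and once from the genomic sequence with the intron still in place. The sequences and the helper name `translate` are invented for illustration; only the standard genetic code is real.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate DNA in frame 0; an incomplete trailing codon is dropped."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# Hypothetical toy gene: two exons separated by a 13-nt intron (GT...AG).
exon1 = "ATGGCTGAAAAG"
intron = "GTAAGTTTAACAG"
exon2 = "CTGGTTCCAGGATAA"

spliced = exon1 + exon2            # what a transcriptome read set would cover
genomic = exon1 + intron + exon2   # what a metagenomic read set would cover

protein_from_spliced = translate(spliced)
protein_from_genomic = translate(genomic)

print("spliced :", protein_from_spliced)
print("genomic :", protein_from_genomic)
```

The two translations share the first exon's residues and then diverge: the intron is translated as if it were coding, and because its length is not a multiple of three the second exon is read in the wrong frame. Peptides derived from reads spanning the exon boundary therefore do not overlap with peptides from the second exon, which is why protein-level overlap extension stalls there.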
MetaEuk takes metagenomic assemblies as input, searches the six-frame-translated assemblies against a reference protein database, and predicts the proteins from the exons.
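As a rough illustration of what "six-frame translated" means, the sketch below translates a contig in all three offsets on both strands. It is a minimal toy, not MetaEuk's implementation; the function names `reverse_complement` and `six_frame` and the example contig are assumptions made for this example.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case ACGT string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate DNA in frame 0; an incomplete trailing codon is dropped."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def six_frame(dna):
    """Map (strand, offset) to the translation of that reading frame."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[(strand, offset)] = translate(seq[offset:])
    return frames


# Hypothetical contig: a short coding region shifted by two leading bases.
contig = "CC" + "ATGGCTGAAAAGCTGGTTCCAGGATAA"

for (strand, offset), peptide in six_frame(contig).items():
    print(strand, offset, peptide)
```

Searching all six translations against reference proteins means a hit can be found regardless of the strand or frame in which an exon happens to lie on the contig.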

What makes the detection of eukaryotic genes hard? The fragmentation of the genomes?

liuxianghui · 2020-07-16T09:55:41Z

For bacteria, the usual approach is to assemble the reads into contigs and then use Prodigal to predict the genes. However, this does not work for eukaryotes: we have to bin the genomes, find the eukaryotic ones, and attempt taxonomic assignment. Then we use tools like GeneMark-ES to predict genes for each genome. (There is no tool that works on eukaryotic contigs the way Prodigal does for bacteria.) GeneMark-ES uses a self-training model based on each genome to make predictions. Augustus has a limited set of models and only applies to specific eukaryotic genomes.
So I turned to MetaEuk and Plass. I expected that they would help me identify all the eukaryotic genes/proteins without binning the metagenome and running GeneMark-ES. However, I am not sure how well MetaEuk and Plass would do.
I saw that you did a lot of work on marine and gut samples. Please kindly share your opinion on them. MetaEuk is described as sensitive, high-throughput gene discovery and annotation for large-scale eukaryotic metagenomics. So if my metagenomic data does not contain novel eukaryotic species, most eukaryotic genes could be found by such a search using my assembled contigs, right? Also, Plass seems to be a de novo approach; could it help to identify eukaryotic proteins? Your Plass nucleotide assembly seems not to work as well as the Plass protein assembly. However, I can use MMseqs2 to search my contigs against your Plass proteins to identify the nucleotide genes. Does this make sense?
