Metaproteogenomics - a new level of omics analyses

@iquasere iquasere released this 16 Jan 22:51
· 82 commits to master since this release

New workflow of metaproteomics analyses, based on metagenomics (MG) results.

This new layer of analysis allows spectra, in both raw and standard formats, to be input to MOSCA for metaproteomics (MP) analysis.

MOSCA's MP workflow is as follows:

1. Database construction

A database is built from MG results, aiming to include all sequences that can possibly be in the datasets. This includes:

  • the genes identified by FragGeneScan in the MG gene calling step
  • reference proteomes, retrieved from UniProt, of the taxa identified in the annotation step with UPIMAPI
  • the cRAP database
  • the protease sequence (for now, trypsin is the only sequence available automatically; all others must be supplied manually)

This database will then be submitted for a first round of Peptide-to-Spectrum matching with SearchCLI and PeptideShaker. All proteins with at least one Peptide-to-Spectrum match (PSM) are collected for the final database - the metaproteogenomics database.
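The two-stage database construction described above can be sketched as follows. This is a minimal illustration, not MOSCA's actual code: the function names, the example identifiers and sequences, and the choice to drop exact duplicate sequences when merging are all assumptions made for the example.

```python
from typing import Dict, Iterable, Set


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {identifier: sequence}."""
    records: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the identifier.
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def merge_sources(sources: Iterable[Dict[str, str]]) -> Dict[str, str]:
    """Merge several sequence sources into one first-round database.

    Assumption: identifiers are unique across sources, and exact duplicate
    sequences are kept only once (under the first identifier seen).
    """
    merged: Dict[str, str] = {}
    seen: Set[str] = set()
    for source in sources:
        for name, sequence in source.items():
            if sequence not in seen:
                seen.add(sequence)
                merged[name] = sequence
    return merged


def keep_matched(database: Dict[str, str], matched_ids: Set[str]) -> Dict[str, str]:
    """Keep only proteins that received at least one PSM in the first round."""
    return {name: seq for name, seq in database.items() if name in matched_ids}


# Hypothetical inputs standing in for the sources listed above.
genes = parse_fasta(">gene_1\nMKTAYIAK\n>gene_2\nMLLPQR\n")
reference = parse_fasta(">ref_A\nMKTAYIAK\n>ref_B\nMSSTVK\n")
crap = parse_fasta(">crap_1\nMKWVTF\n")

first_round = merge_sources([genes, reference, crap])
# Identifiers with at least one PSM would come from the first search round.
final_db = keep_matched(first_round, {"gene_1", "ref_B"})
```

Here `final_db` plays the role of the metaproteogenomics database: only the entries matched in the first round survive.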

2. Peptide-to-Spectrum matching

SearchCLI is used to obtain PSMs from the input spectra, using the database constructed in the previous step as reference. It is run with three search engines: X!Tandem, MyriMatch and MS-GF+. More engines might be added in the future.
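As a rough sketch, a SearchCLI call enabling these three engines could be assembled like this. The jar name and file paths are hypothetical, the option names should be checked against the installed SearchGUI version, and how the database is passed depends on that version and is left out here. This is not necessarily the command MOSCA builds.

```python
from typing import List


def build_searchcli_command(
    jar: str, spectra: str, output_folder: str, id_params: str
) -> List[str]:
    """Build an argument list for SearchCLI with three engines switched on.

    Assumption: option names follow the SearchGUI command line documentation;
    verify them for the version in use before running.
    """
    return [
        "java", "-cp", jar, "eu.isas.searchgui.cmd.SearchCLI",
        "-spectrum_files", spectra,
        "-output_folder", output_folder,
        "-id_params", id_params,
        "-xtandem", "1",     # enable X!Tandem
        "-myrimatch", "1",   # enable MyriMatch
        "-msgf", "1",        # enable MS-GF+
    ]


# Hypothetical paths, for illustration only.
command = build_searchcli_command(
    jar="SearchGUI.jar",
    spectra="spectra_folder",
    output_folder="searchcli_output",
    id_params="params.par",
)
# The list can be passed to subprocess.run(command, check=True).
```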

3. Protein inference

PeptideShaker is used for protein inference and quantification, based on spectracounts. PSMs are selected at a 5 % local False Discovery Rate. Only peptides with two or more PSMs, and only proteins with two or more identified peptides, are kept for further analysis.
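The two-PSM and two-peptide thresholds can be illustrated with a small filter. This is only a sketch of the selection logic, not what PeptideShaker does internally. It assumes the PSMs have already passed the FDR cut, that each PSM is assigned to a single protein, and that a protein's spectracount is the number of retained PSMs. All names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

PSM = Tuple[str, str]  # (peptide sequence, protein identifier)


def filter_identifications(
    psms: List[PSM], min_psms: int = 2, min_peptides: int = 2
) -> Dict[str, int]:
    """Return {protein: spectracount} for proteins passing both thresholds."""
    # Keep peptides supported by at least `min_psms` PSMs.
    psms_per_peptide = Counter(peptide for peptide, _ in psms)
    kept_peptides = {p for p, n in psms_per_peptide.items() if n >= min_psms}

    peptides_per_protein: Dict[str, Set[str]] = defaultdict(set)
    spectra_per_protein: Counter = Counter()
    for peptide, protein in psms:
        if peptide in kept_peptides:
            peptides_per_protein[protein].add(peptide)
            spectra_per_protein[protein] += 1

    # Keep proteins identified by at least `min_peptides` retained peptides.
    return {
        protein: spectra_per_protein[protein]
        for protein, peptides in peptides_per_protein.items()
        if len(peptides) >= min_peptides
    }


example_psms = [
    ("PEPA", "P1"), ("PEPA", "P1"),
    ("PEPB", "P1"), ("PEPB", "P1"), ("PEPB", "P1"),
    ("PEPC", "P2"), ("PEPC", "P2"),
    ("PEPD", "P2"),                  # single PSM, peptide is dropped
    ("PEPE", "P3"), ("PEPE", "P3"),  # P3 has only one peptide
]
counts = filter_identifications(example_psms)
```

In this example only `P1` is kept: `P2` retains a single peptide once `PEPD` is dropped, and `P3` has only one peptide to begin with.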

4. Normalization, imputation and differential protein expression analysis

Spectracounts are normalized with Variance Stabilizing Normalization. Missing values are imputed using Local Least Squares Imputation.

Normalized and imputed spectracounts are then submitted for differential protein expression analysis with Reproducibility-Optimized Test Statistics. Log2 fold changes and p-values are retrieved for reporting.

5. Metabolic pathway representation and final reporting

All of the following steps mirror the metatranscriptomics (gene expression) analysis as closely as possible.

Metabolic maps are built with KEGGCharter, showing protein expression levels from MP and genomic potential from MG.

Final reports include all results from MG, and report on differential expression analysis of proteins.

Other updates

MOSCA's workflow has grown by around 40 %.

MOSCA is now compatible, through UPIMAPI, with the UniProt updates released six months ago. This includes parsing the taxonomic columns, so that taxonomic Krona plots can still be produced.

Snakemake conda environments are now used, instead of a single environment. This has made it possible to build MOSCA's environments again, and may signal the return of MOSCA to Bioconda.