Home
Welcome to the KEMET wiki!
Starting from a previously obtained functional annotation, the input MAG/Genome is evaluated for the presence of KEGG Orthologs (KOs). The evaluation is performed in the framework of KEGG Modules, i.e. manually defined functional units composed of KOs, in order to summarize (meta)genomic potential.
KEMET output regarding this task is composed of 2 files for each MAG/Genome of interest.
They are located in the `KEMET/report_tsv/` and `KEMET/report_txt/` folders, respectively:
This output is a tabular file with information on each KEGG Module, indicating the metabolic potential of the MAG/Genome identified by the `FASTA` name.
Each line includes the tab-separated info as in the following example table:
Module_id | Module_name | Completeness | complete/total blocks | missing KOs | KOs present |
---|---|---|---|---|---|
M00029 | Urea cycle | 1 BLOCK MISSING | 4__5 | K01948,K14681 | K00611,K01940,K01755,K01476 |
- The Completeness field takes one of the following values: "INCOMPLETE", "2 BLOCKS MISSING", "1 BLOCK MISSING", or "COMPLETE".
- The complete/total blocks field uses the format "COMPLETE__TOTAL" (with two underscores).
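As an illustration of the format described above, the following minimal sketch parses one line of the tabular report. The column names are taken from the example table; the function name `parse_report_line` and the assumption that each line carries exactly these six tab-separated fields are hypothetical and may need adjusting to the real file.

```python
# Column names follow the example table; the real file header may differ.
COLUMNS = [
    "Module_id", "Module_name", "Completeness",
    "complete/total blocks", "missing KOs", "KOs present",
]


def parse_report_line(line):
    """Parse one tab-separated report line into a dict with typed fields."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(COLUMNS, fields))
    # "4__5" means 4 complete blocks out of 5 (two underscores as separator).
    complete, total = record["complete/total blocks"].split("__")
    record["complete_blocks"] = int(complete)
    record["total_blocks"] = int(total)
    # KO lists are comma-separated; an empty field yields an empty list.
    for key in ("missing KOs", "KOs present"):
        record[key] = [ko for ko in record[key].split(",") if ko]
    return record


example = (
    "M00029\tUrea cycle\t1 BLOCK MISSING\t4__5\t"
    "K01948,K14681\tK00611,K01940,K01755,K01476"
)
rec = parse_report_line(example)
print(rec["Module_id"], rec["complete_blocks"], rec["total_blocks"])
```

Splitting the block counts into integers makes it straightforward to rank Modules by the fraction of complete blocks.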
This output is a flat file reporting KEGG Module completeness for every Module, down to the block level. It shows which sequential steps of the Module path have missing KOs.
KEMET performs a bulk download of nucleotide sequences from KEGG GENES using the KEGG API. For license terms see this site. The API service is available free of charge to academic users only. Users who prefer a different download option are encouraged to request a KEGG FTP subscription.
Downloaded GENES sequences are filtered (each unique sequence is considered only once for HMM building), aligned with the MAFFT multiple sequence aligner, and used to build a profile with the HMMER suite.
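The filtering step can be pictured with the short sketch below, which keeps one record per distinct sequence in a FASTA text. It is not KEMET's actual code: the function names, the case-insensitive comparison, and the choice to keep the first header seen are all assumptions made for illustration. The alignment and profile-building steps are not shown.

```python
def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def unique_sequences(records):
    """Keep the first record for each distinct sequence (case-insensitive)."""
    seen = set()
    unique = []
    for header, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            unique.append((header, seq))
    return unique


# Toy input: the second record duplicates the first one.
fasta_text = """>gene_a
ATGGCT
TAA
>gene_b
atggcttaa
>gene_c
ATGAAATAG
"""
kept = unique_sequences(read_fasta(fasta_text))
print([header for header, _ in kept])
```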
The resulting nucleotide profiles are then searched against the MAG/Genome of interest.
By default, a score threshold is applied in order to enrich for complete profiles and exclude hits resulting from partial sequences.
Only hits whose score surpasses the threshold are considered proper hits, indicating the presence of the KO(s) of interest in the MAG/Genome sequences.
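The thresholding logic can be sketched as follows. The threshold value, the hit representation as `(KO, score)` pairs, and the function name are hypothetical placeholders, since the actual default and the exact score used are not stated here; the only behaviour taken from the text is that a hit must strictly surpass the threshold to count.

```python
# Placeholder value for illustration only, NOT KEMET's actual default.
HYPOTHETICAL_THRESHOLD = 0.4


def kos_supported_by_hits(hits, threshold):
    """Return the set of KOs with at least one hit scoring above the threshold."""
    return {ko for ko, score in hits if score > threshold}


# Toy hits: (KO identifier, score). A score equal to the threshold is rejected.
hits = [
    ("K01948", 0.85),
    ("K01948", 0.12),
    ("K14681", 0.40),
    ("K00611", 0.39),
]
present = kos_supported_by_hits(hits, HYPOTHETICAL_THRESHOLD)
print(sorted(present))
```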
Information regarding HMM hits is included in the output files:
A tabular file including the HMM hits of a single MAG/Genome, identified by the `FASTA` name. It contains information on the hits in the form:
KO | corr_score, e-value | contig_name | strand | genome_left_bound | genome_right_bound | profile_lenght | begin_of_HMMsequence_hit | end_of_HMMsequence_hit |
---|---|---|---|---|---|---|---|---|
- corr_score is a metric that describes the HMM profile score, corrected for the sequence length of that profile.
After a single KEMET run, a tabular summary file is generated. It gathers the information from every "_HMM_hits" file into a single table.
Moreover, the file includes further fields:
frame | seq | xseq |
---|---|---|
- frame indicates the most likely translated reading frame.
- seq is the nucleotide sequence as retrieved from the MAG/Genome.
- xseq is the translated amino acid sequence derived from seq using the generic Bacterial/Archaeal translation table (t11).
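To make the relationship between frame, seq and xseq concrete, here is a minimal pure-Python sketch of in-frame translation. Translation table 11 shares its codon-to-amino-acid assignments with the standard code and differs in the permitted start codons, which this sketch does not handle (an alternative start codon is translated literally rather than as M). The frame convention used here (1, 2, 3 for the forward strand and -1, -2, -3 for the reverse complement) is an assumption and may not match the values KEMET writes in the frame field.

```python
from itertools import product

# Codon table in NCBI order (bases T, C, A, G; first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq, frame=1):
    """Translate seq in the given frame (assumed convention: 1..3 or -1..-3)."""
    if frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError("frame must be one of 1, 2, 3, -1, -2, -3")
    seq = seq.upper()
    if frame < 0:
        seq = reverse_complement(seq)
    offset = abs(frame) - 1
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    # Codons containing ambiguous bases (e.g. N) are reported as X.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


print(translate("ATGGCTTAA"))       # forward strand, frame 1
print(translate("TTAAGCCAT", -1))   # same gene on the reverse strand
```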
The script links the missing KOs retrieved via HMM hits to reactions in the BiGG namespace (the ModelSEED namespace will be added in a future release).
Based on the `--gsmm_mode` parameter, it operates in two different ways:
`--gsmm_mode denovo` runs automatic gene calling on the MAG/Genome sequences using Prodigal, and automatically adds the hits retrieved with HMMs to the protein multiFASTA (.faa) files.
After that, KEMET performs a CarveMe reconstruction including these newly found sequences.
NOTE: This usage is subject to the CarveMe dependencies, including the IBM CPLEX Optimizer. More on these dependencies can be found in the CarveMe installation procedure here.
Using this mode, the newly generated gene prediction and GSMM are included in the `KEMET/de_novo_models` folder.
`--gsmm_mode existing` allows the identified reactions to be incorporated into existing genome-scale metabolic models (GSMMs) previously generated with CarveMe, if those reactions are missing.
At the moment (March 2022), the only tested way to add reactions to pre-existing GSMMs is via the ReFramed package. Further improvements would allow adding them through the cobrapy platform.
Information regarding reaction gapfilling (if performed using the `--gsmm_mode existing` parameter) is included in several output files:
A flat file listing every BiGG reaction that could potentially be added to the input model, identified by the `FASTA` name. The BiGG reactions are listed one per line.
A flat file with the reactions that were actually added to a given MAG/Genome-derived GSMM, identified by the `FASTA` name. Reaction names are listed one per line, each followed by the respective reaction string.
Individual GSMMs are saved again after the gapfilling procedure, with the new reactions and metabolites content, as `FASTA_KEGGadd_DATE.xml`, where `FASTA` follows the input definition and `DATE` includes the day of analysis. Files generated this way are stored in the `KEMET/model_gapfilled/` folder.