MetaPont - A tool to bridge the gap between the output of metagenomic tools and the analysis of the data
- Targeted Functional Analysis: Search for specific functional IDs (e.g., GO terms) within the
_Final_Contig.tsv
files provided by the HuwsLab Metagenome Workflow (https://github.com/TheHuwsLab/Metagenome_Workflow) . - Taxonomic Breakdown: Extract genus-level taxonomy information and calculate their proportions in the dataset.
- Batch Processing: Analyse all
_Final_Contig.tsv
files in a specified directory. - Customisable Output: Save results in a format suitable for downstream analysis.
Ensure you have the following installed:
- Python ~3.6 or later
MetaPont is provided as a pip distribution.
pip install MetaPont
Extract-By-Function -h
usage: Extract_By_Function.py [-h] -d DIRECTORY -f FUNCTION_ID -o OUTPUT
[-m MIN_PROPORTION] [-top TOP_TAXA]
MetaPont v0.0.6: Extract-By-Function - Identify taxa contributing to a
specific function.
options:
-h, --help show this help message and exit
-d DIRECTORY, --directory DIRECTORY
Directory containing TSV files to analyse.
-f FUNCTION_ID, --function_id FUNCTION_ID
Specific function ID to search for (e.g.,
'GO:0016597').
-o OUTPUT, --output OUTPUT
Output file to save results.
-m MIN_PROPORTION, --min_proportion MIN_PROPORTION
Minimum proportion threshold for taxa to be included
in the output.
-top TOP_TAXA, --top_taxa TOP_TAXA
Top n taxa to be included in the output.
The Extract-By-Function
tool provides several command-line options:
Note: Either -m or -top is required.
Option | Description | Required | Default |
---|---|---|---|
-d , --directory |
Directory containing _Final_Contig.tsv files to analyse. |
Yes | None |
-f , --function_id |
Functional ID to search for (e.g., GO:0016597 ). |
Yes | None |
-m , --min_proportion |
Minimum proportion needed for reporting. | Yes/No | None |
-top , --top_taxa |
Number of taxa to report. | Yes/No | None |
-o , --output |
Output file name to save results. | Yes | None |
To search for the functional ID GO:0016597
in all _Final_Contig.tsv
files within the test_data/
directory:
Extract=By-Function -d .../test_data/Final_contig/ -f GO:0016597 -top 3 -o .../test_data/Final_Contig/Extract_By_Function_Out/results.tsv
The tool generates a tab-delimited output file with the following columns:
- Sample: Name of the processed Sample.
- Taxa: Genus-level taxonomic assignment extracted from the
Lineage
column. - Reads Assigned (Function): Number of reads assigned to contigs with the given functional ID.
- Proportion: Proportion of reads assigned to contigs of stated Taxa with the given functional ID within the sample.
- Proportion (Total Reads): Proportion of reads assigned to contigs of stated Taxa with the given functional ID within the total reads of the sample.
Example output:
Function ID: GO:0016597
Sample Taxa Reads Assigned (Function) Proportion (Function) Proportion (Total Reads)
PN0536_0001_S1_Final_Contig.tsv Lactobacillus 111963 0.602 0.004
PN0536_0003_S83_Final_Contig.tsv Lactobacillus 20072 0.457 0.001
PN0536_0002_S2_Final_Contig.tsv Acutalibacter 145222 0.795 0.005
PN0536_0004_S3_Final_Contig.tsv Lactobacillus 40076 0.404 0.002
usage: Extract_Function_By_Taxa.py [-h] -d DIRECTORY -t TAXON -f FUNCTION -o
OUTPUT
MetaPont: Extract Reads Proportions for a Specific Taxon and Function
options:
-h, --help show this help message and exit
-d DIRECTORY, --directory DIRECTORY
Directory containing TSV files to analyse.
-t TAXON, --taxon TAXON
Target taxon to search for (e.g., 'g__Escherichia').
-f FUNCTION, --function FUNCTION
Target function to extract (e.g., 'EC:2.7.11.1').
-o OUTPUT, --output OUTPUT
Output file to save results.
- The script reads
_Final_Contig.tsv
files from the specified directory. - For each file, it searches for occurrences of the given functional ID within specific columns.
- Matches are associated with genus-level taxonomic information extracted from the
Lineage
column. - Taxa proportions are calculated and saved to the output file.
Extract-By-Taxa -h
usage: Extract_By_Taxa.py [-h] -d DIRECTORY -t TAXON -o OUTPUT -func
FUNCTIONAL_CLASSES [-top TOP_FUNCTIONS]
MetaPont: Extract Top Functions by Taxon
options:
-h, --help show this help message and exit
-d DIRECTORY, --directory DIRECTORY
Directory containing TSV files to analyse.
-t TAXON, --taxon TAXON
Target taxon to search for (e.g., 'g__Bacillus').
-o OUTPUT, --output OUTPUT
Output file to save results.
-func FUNCTIONAL_CLASSES, --functional_classes FUNCTIONAL_CLASSES
Which functional classes to report (e.g. GO,EC,KEGG
etc).
-top TOP_FUNCTIONS, --top_functions TOP_FUNCTIONS
Top n functions to include in the output for each
sample (default: 3).
The Extract-By-Taxa
tool provides several command-line options:
Option | Description | Required | Default |
---|---|---|---|
-d , --directory |
Directory containing _Fincal_Contig.tsv files to analyse. |
Yes | None |
-t , --taxon |
Taxa to search for (e.g., g__Bacillus ). |
Yes | None |
-func , --functional_classes |
Functional classes to report (e.g. GO,EC,KEGG etc). | Yes | None |
-top , --top_taxa |
Number of functions to report (default 3). | No | None |
-o , --output |
Output file name to save results. | Yes | None |
To search for the top reported functions for taxon g__Bacillus
in all _Final_Contig.tsv
files within the test_data/
directory:
Extract-By-Taxa -d .../test_data/Final_Contig -t g__Bacillus -o .../test_data/Final_Contig/Extract_By_Taxa/results.tsv -func GO
The tool generates a tab-delimited output file with the following columns:
- Sample: Name of the processed Sample.
- Function: Reported 'top' function.
- Num of Assignments (Functions): Number of times the function has been assigned across all contigs reported as chosen Taxon.
Example output:
Selected Taxon: g__Bacillus
Sample Function Num of Assignments
PN0536_0001_S1 GO:0008150 296
PN0536_0001_S1 GO:0003674 285
PN0536_0001_S1 GO:0005575 254
PN0536_0003_S83 GO:0005575 45
PN0536_0003_S83 GO:0008150 44
PN0536_0003_S83 GO:0003674 43
PN0536_0002_S2 GO:0005575 5
PN0536_0002_S2 GO:0008150 5
PN0536_0002_S2 GO:0005623 4
PN0536_0004_S3 GO:0008150 4
PN0536_0004_S3 GO:0003674 3
PN0536_0004_S3 GO:0005488 3
The script uses csv.field_size_limit
to handle exceptionally large .tsv
files.