Code to reproduce the analysis in the AMPSphere manuscript

This collection of Jupyter notebooks contains the code needed to reproduce the bioinformatics analyses in the paper describing the AMPSphere resource. The notebooks are organized by subject, and each mostly produces a couple of figures and analyses:

| Figure | Jupyter notebook | Description |
| --- | --- | --- |
| 1A | 01_Exploring_metadata | Map with samples colored according to habitat |
| 1B | 03_quality_and_homologs_distribution | Bar plot and pie chart of AMP quality and mapping per database |
| 1C | 04_c_AMP_overlaps | Rarefaction curves by environment |
| 1D | 02_Graphics_for_habitat_overlap | Sankey plots of AMP overlap by habitats |
| 2A | 10_gmgc_homologs_analysis | Histogram of counts of AMPs versus start of match to full-length proteins as % of the target length |
| 2B | Manual curation | Alignment of the NAD(P)-dependent dehydrogenase gene that gives rise to AMP10.271_016 in different Prevotella genomes |
| 2CA | 10_gmgc_homologs_analysis | Annotation of AMPs using eggNOG-mapper: bar plot showing the functions per COG class |
| 2CB | 10_gmgc_homologs_analysis | Annotation of AMPs using eggNOG-mapper: box plot showing the enrichment in relation to GMGC |
| 3A | 11_genome_context | Bar plot showing the functions of the most frequent conserved gene neighborhoods involving the AMPs |
| 3B | 11_genome_context | Bar plot showing the functions involved in antibiotic resistance in the most frequent conserved gene neighborhoods involving the AMPs |
| 3C | 11_genome_context | Bar plot showing the functions involved in antibiotic synthesis in the most frequent conserved gene neighborhoods involving the AMPs |
| 3D | 14_calculate_densities | Example of the gene neighborhood of AMP10.015_426 |
| 4A | 12_clonal_and_accessory_c_AMPs | Bar plot showing the proportion of core, shell and accessory AMPs and families with and without high-quality filtering |
| 4B | 13_most_represented_taxa | Bar plot showing the taxonomic annotation of AMPSphere |
| 4C | 16_density_across_taxonomies | Box plot showing the AMP density per genus of different phyla |
| 4D | 16_density_across_taxonomies | Phylogenetic tree of gene found in AMPSphere, showing the AMP density, its associated error and median |
| 5A | 01_Exploring_metadata | Box plot of AMP density per sample in host-associated vs. non-host-associated samples |
| 5B | 15_density_across_environments | Box plot of AMP density per sample for Prevotella copri in different habitats |
| 5C | 15_density_across_environments | Box plot of average AMP density per species from the human oral cavity and gut |
| 5D | 15_density_across_environments | Box plot of average AMP density per species from soil and plant-associated habitats |
| S1A | 07_c_AMP_features_comparison | Distribution of peptide length |
| S1B | 07_c_AMP_features_comparison | Distribution of small side-chain residues (ABCDGNPSTV) in percent |
| S1C | 07_c_AMP_features_comparison | Distribution of basic side-chain residues (HRK) in percent |
| S1D | 07_c_AMP_features_comparison | Distribution of pI of peptides |
| S1E | 07_c_AMP_features_comparison | Distribution of charge of peptides at pH 7.0 |
| S1F | 07_c_AMP_features_comparison | Distribution of aliphatic index of peptides |
| S1G | 07_c_AMP_features_comparison | Distribution of instability index of peptides |
| S1H | 07_c_AMP_features_comparison | Distribution of Boman index of peptides |
| S1I | 07_c_AMP_features_comparison | Distribution of hydrophobic moment of peptides |
| S2A | 03_quality_and_homologs_distribution | Bar plot with the different peptide quality tests on the y axis and the proportion of AMPSphere on the x axis |
| S2B | 03_quality_and_homologs_distribution | Homolog quality by database used in the annotation and by quality test |
| S2C | 04_c_AMP_overlaps | Heatmap of the overlap of AMPs across low-level and high-level environments |
| S2D | 05_cAMPs_rarity | Line plot of the number of detections versus number of AMPs (genes) |
| S3 | 06_c_AMPs_clustering_validation | Scatter plots of the identity of the hit against the cluster representative versus the corresponding e-value at the different clustering identity cutoffs |
| S4 | 01_Exploring_metadata | Box plot of AMP density across different high-level habitats |
| S5A | 20_density_across_taxonomies_controlling_quality | Box plot of AMP density per genus from different phyla using only quality-controlled AMPs |
| S5B | 21_density_for_host_samples_environments_quality_controlled | Box plot of AMP density per sample from different low-level habitats using only quality-controlled AMPs |
| S5C | 21_density_for_host_samples_environments_quality_controlled | Box plot of AMP density per sample from host- and non-host-associated environments using only quality-controlled AMPs |
| S5D | 19_density_across_environments_controlling_quality | Box plot of AMP density per sample for Prevotella copri using only quality-controlled AMPs |
| S5E | 19_density_across_environments_controlling_quality | Box plot of average AMP density per species from the human oral cavity and gut using only quality-controlled AMPs |
| S5F | 19_density_across_environments_controlling_quality | Box plot of average AMP density per species from soil and plant-associated habitats using only quality-controlled AMPs |

The notebooks also produce the Supplementary Tables of the manuscript:

| Table | Jupyter notebook | Description |
| --- | --- | --- |
| S1 | 01_Exploring_metadata | Metadata description associated to the metaproteomes used in this study |
| S2 | 09_supplementary_info_about_habitats_and_samples | Summary statistics about AMPSphere |
| S3 | 10_gmgc_homologs_analysis | Ortholog group enrichment in the AMPs that are GMGC homologs |
| S4 | Annotation result | KEGG pathway annotation of AMPs with conserved genome contexts |
| S5 | 11_genome_context | Statistical test of the number of full-length protein families with conserved genome contexts |
| S6 | Annotation result | KEGG orthologs for the AMPs with conserved genome contexts |
| S7 | 16_density_across_taxonomies | AMP density per genus |
| S8 | 14_calculate_densities | AMP density per species per habitat, restricted to species occurring in at least 10 samples per habitat |
| S9 | Manual curation | Metadata description associated to the metaproteomes used in this study |

To open the Jupyter notebooks, start the notebook server from this directory:

  $ jupyter notebook
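
If you would rather re-run the notebooks without opening them one by one, the following is a minimal sketch of a batch runner built on `jupyter nbconvert --execute`. It is not part of the repository. It assumes that the notebooks sit in the current directory as `.ipynb` files named with the two-digit prefixes listed above, and that running them in numeric order is acceptable; the helper names (`notebook_order`, `nbconvert_command`, `run_all`) are hypothetical.

```python
from pathlib import Path
import subprocess


def notebook_order(directory="."):
    """Return notebooks with a two-digit prefix (01_, 02_, ...) sorted by name."""
    # Assumption: notebooks are named like "01_Exploring_metadata.ipynb".
    return sorted(Path(directory).glob("[0-9][0-9]_*.ipynb"))


def nbconvert_command(notebook):
    """Build the nbconvert command that executes one notebook in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        str(notebook),
    ]


def run_all(directory=".", dry_run=True):
    """Print (dry_run=True) or execute (dry_run=False) the command for each notebook."""
    for notebook in notebook_order(directory):
        command = nbconvert_command(notebook)
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops at the first notebook that fails to execute.
            subprocess.run(command, check=True)
```

Calling `run_all()` only prints the commands; `run_all(dry_run=False)` executes them and overwrites each notebook with its executed version. Whether any notebook depends on the outputs of another is not stated here, so check the individual notebooks before relying on the numeric order.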