
planet_microbe_paper_3

Kai Blumberg edited this page Aug 30, 2021 · 29 revisions

from Peter W

Use tarql for tsv to rdf, and instead of blazegraph use tdb2 and the most recent Fuseki as a front end for data management. It's easy and you can include shiro for security if we want to go public.
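To make the TSV-to-RDF step concrete, here is a minimal sketch of the kind of mapping involved. This is not tarql (tarql is driven by SPARQL CONSTRUCT queries); it is plain Python used only for illustration. The `http://example.org/` namespace, the column names, and the example row are all hypothetical.

```python
import csv
import io

# Hypothetical namespace, for illustration only.
BASE = "http://example.org/planet-microbe/"


def escape_literal(value):
    """Escape a string for use as an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def tsv_to_ntriples(tsv_text, id_column):
    """Emit one triple per non-empty cell, keyed on the id column.

    Assumes column names and ids contain only characters that are
    safe inside an IRI (letters, digits, underscores).
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    lines = []
    for row in reader:
        subject = f"<{BASE}sample/{row[id_column]}>"
        for column, value in row.items():
            if column == id_column or not value:
                continue  # skip the id itself and empty cells
            predicate = f"<{BASE}property/{column}>"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return "\n".join(lines)


# Made-up example row; real files would have their own columns.
example_tsv = "sample_id\tgo_term\tread_count\nS1\tGO_term_a\t42\n"
ntriples = tsv_to_ntriples(example_tsv, "sample_id")
print(ntriples)
```

The resulting N-Triples text is the sort of output that could then be loaded into a triple store such as TDB2.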

benchmarking

data to go into the app:

Prokaryotic WGS only

Amazon continuum 100 total (plume 50 samples, river 50 samples)

BATS 63 (smaller files too might not want to use)

CDEBI 20

GOS 70 (very low quality might not want to use)

HOT Chisholm 70 (smaller files too might not want to use)

HOT DeLong 40 (smaller files too might not want to use)

HOT Timeseries 460 samples

OSD 162 (smaller files)

Tara's GO term CSV files are ~180 KB each. Tara APY (prokaryote shotgun sequencing) has only 136 samples. (Tara Polar is virus only.)

Thus in total we have 1121 prokaryotic fraction samples.

Multiply this by the over-estimate of 180 KB for each GO file, which gives ~200 MB.

The metadata file is on the order of 2 KB, definitely less than 10 KB.
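The estimate above can be checked with a few lines of arithmetic. The counts are copied from the list of projects; the only assumption is that KB converts to MB in decimal units (1 MB = 1000 KB).

```python
# Sample counts per project, copied from the list above.
samples = {
    "Amazon continuum": 100,
    "BATS": 63,
    "CDEBI": 20,
    "GOS": 70,
    "HOT Chisholm": 70,
    "HOT DeLong": 40,
    "HOT Timeseries": 460,
    "OSD": 162,
    "Tara APY": 136,
}

# Over-estimate of the size of one GO term file, in KB.
KB_PER_GO_FILE = 180

total_samples = sum(samples.values())
total_kb = total_samples * KB_PER_GO_FILE
total_mb = total_kb / 1000  # decimal megabytes (assumption)

print(f"{total_samples} samples, ~{total_mb:.0f} MB of GO term files")
# prints: 1121 samples, ~202 MB of GO term files
```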

from Matt B

He set up some scripts for running/loading Blazegraph here if you decide to go that route: https://github.com/hurwitzlab/planet-microbe-scripts/tree/master/blazegraph

And here it is hosted: https://www.planetmicrobe.org/blazegraph/ (console) https://www.planetmicrobe.org/blazegraph/sparql (SPARQL endpoint)

Aim 3

Using GO/NCBI taxon ontologies to "learn more from omics data"

Publication/journal ideas:

Abstract due in June, manuscript in October. Could be a place to put the 3rd Planet Microbe paper, as they deal with:

• Physiological responding and metabolism.
• Identification, community structure, and biodiversity.
• Quantification of microbial biomass and productivity.
• Microbial-mediated biogeochemical cycling.
• Biological response and feedback. 

Scalable Bioinformatics: Methods, Software Tools, and Hardware Architectures. June/October deadlines. Could be cool for paper 1 or 3, maybe?

Competency Questions

CQ1 “Does the new method give comparable results to those published in the literature when re-analyzing published studies analyzed with different methods?”

Re-analyze the phylum-level taxonomic relative abundance from Figure 2 or 8 of Sunagawa et al. (2015), and possibly the high-level functional relative abundance from Figure 8; the functional categories seem similar to GO terms and could perhaps be mapped.

CQ2 “What is the relationship between environmental variables and organismal distribution?”

Shortcoming: we are only looking at what's known in the cent database, unlike with OTUs, where you can get at the unknown. The same goes for GO, which only gets at the known. Make the question about Thaumarchaea: give Thaumarchaea as an example of a known aquatic ammonia oxidizer, for which we expect to see effects of nitrate, nitrite, and ammonium, but test everything on their distribution.

rcca

CQ3 “What oxidoreductase complexes differentiate various aquatic layer environments?”

Comparing the functional potential of samples from different ecosystems.

Although not as commonly discussed in microbial ecology, macroecology studies have referred to the potential for studying functional beta diversity (Swenson et al. 2012). One study directly explored the concept of functional beta diversity to compare the carbon degradation capacity of bacterioplankton from various freshwater lakes (Dickerson and Williams 2014). Similarly, the addition of hydrolytic enzymes has long been used to test and compare the functional metabolism of particular hydrolytic activities (Burns and Dick 2002; Boetius 1995).

Add this: it only does what you can annotate, so it's not true diversity, only known functional diversity.

Energy metabolism is one of the most crucial functional metabolic processes in which living cells engage. ... REDOX metabolism. ... . This new method can help expedite this process. Compare across available datasets made interoperable by annotation with ENVO terms (search the ENVO hierarchy).

The architecture here would allow us to answer these questions more easily, and here's another method to investigate them.

New: insights from using multiple datasets, questions about specific metabolism. ... maybe qualities too?

Method: PCoA is a common ordination technique for beta diversity in microbial ecology (Knight et al. 2018). This new method (USING GO) enables us to study the functional beta-diversity by comparing the relative abundances of GO-specified gene families from various ecosystems.
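A minimal sketch of what PCoA on GO-term relative abundances could look like, using only NumPy. The choice of Bray-Curtis as the dissimilarity is an assumption (the notes do not fix a metric for this step), and the sample labels, GO term labels, and counts are made up for illustration.

```python
import numpy as np


def relative_abundance(counts):
    """Convert a samples x features count matrix to row-wise proportions."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)


def bray_curtis(profiles):
    """Pairwise Bray-Curtis dissimilarity between rows."""
    n = profiles.shape[0]
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            num = np.abs(profiles[i] - profiles[j]).sum()
            den = (profiles[i] + profiles[j]).sum()
            dist[i, j] = dist[j, i] = num / den
    return dist


def pcoa(dist, n_axes=2):
    """Classical PCoA: double-center the squared distances, then eigendecompose."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1]  # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Bray-Curtis is not Euclidean, so small negative eigenvalues can occur;
    # they are clipped to zero on the axes that are kept.
    kept = np.clip(eigvals[:n_axes], 0.0, None)
    coords = eigvecs[:, :n_axes] * np.sqrt(kept)
    explained = kept / eigvals[eigvals > 0].sum()
    return coords, explained


# Hypothetical read counts: rows are samples, columns are three GO terms.
sample_names = ["surface_1", "surface_2", "deep_1", "deep_2"]
go_counts = [
    [80, 15, 5],
    [75, 20, 5],
    [10, 30, 60],
    [5, 35, 60],
]

profiles = relative_abundance(go_counts)
distances = bray_curtis(profiles)
coords, explained = pcoa(distances)

for name, (axis1, axis2) in zip(sample_names, coords):
    print(f"{name}\t{axis1:.3f}\t{axis2:.3f}")
```

In this toy data the two "surface" profiles and the two "deep" profiles fall on opposite sides of the first axis, which is the kind of separation a functional beta-diversity comparison would look for.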

See mixOmics for rCCA.

Papers and ideas

This paper is a gold mine for AOA

from wiki: AOA dominate in both soils and marine environments,[2][6][7] suggesting that Thaumarchaeota may be greater contributors to ammonia oxidation in these environments (this paper).

Positive correlations between the abundance of Crenarchaeota and nitrite were observed in the Arabian Sea (11) and the Santa Barbara Channel time series (12) and with particulate organic nitrogen in Arctic waters (13)

Here we report oligotrophic ammonia oxidation kinetics and cellular characteristics of the mesophilic crenarchaeon ‘Candidatus Nitrosopumilus maritimus’ strain SCM1. Unlike characterized ammonia-oxidizing bacteria, SCM1 is adapted to life under extreme nutrient limitation, sustaining high specific oxidation rates at ammonium concentrations found in open oceans.

Another common analysis approach is to look at differentially abundant microorganisms or functional elements (for example, genes and pathways) in the comparison groups of interest (that is, treatment versus control). Identifying microbial taxa that explain differences between communities is particularly challenging because microbiome data sets are high-dimensional (that is, they include thousands of taxa), sparse and compositional.

Perhaps cite this and talk about how this can automate the identification of microbial taxa that explain differences between environments.

Also has:

For visualizing beta diversity data, ordination techniques, such as principal coordinates analysis (PCoA) or principal component analysis (PCA), are commonly used.

use for CQ2

blog post: on diversity and diversity indices The new synthesis of diversity indices and similarity measures

Analyses of functional beta diversity have also become more common with a large sum of work focusing on the development of functional beta diversity metrics that are often implemented in relatively species-poor temperate systems (e.g., Ricotta and Burrascano 2009), with only one study, to our knowledge, being conducted in a highly diverse tropical system (Swenson et al. 2011).

Can reuse this idea for the ordination method: reduce dimensionality for large, sparse datasets.

Study examining the functional beta diversity of lake bacterioplankton. They used DGGE, but it gets at the question of functional beta diversity for bacteria.

Overall, Biolog analysis was useful in identifying differences in the functional diversity of bacterial communities between lakes of different trophic statuses and can be used as a tool to assess ecosystem health.

OLD

Ideas for potential questions to try and ask:

GO

polysaccharides

This MPI mol-ecol paper talks about various polysaccharides: laminarin, xylan, chondroitin sulfate, arabinogalactan, and carrageenan.

GO has a cellular polysaccharide catabolic process hierarchy with some depth/breadth; could try using this, or xylan metabolic process, or polysaccharide catabolic process. //try this

Auxotrophy

Cool idea, but probably can't get this info from GO.

signal transduction

Signal transduction has many subclasses, but none are very deep. It includes osmosensory signaling pathway, which is also not very deep.

temperature dependence

Not looking super promising.

metabolism

Try inorganic anion transport.

Lots of redox complexes; could be cool to try to answer redox metabolic questions.

NCBI Taxon

From the Tara structure and function paper we can drill down much further into the taxonomic structure.

Also compare against Figure 2 of Global patterns of bacterial beta-diversity in seafloor and seawater ecosystems, which compares the high-level phylogenetic differentiation between benthic and pelagic samples at the coast, surface, deep, etc.

Referring to the two papers above, we could ask questions such as "What Alphaproteobacteria differentiate sea surfaces from deep chlorophyll maxima?" or "What Gammaproteobacteria differentiate deep chlorophyll maxima from mesopelagic samples?" or "What is the effect of depth on the distribution of Deltaproteobacteria community structure?" or "What is the effect of temperature on Cyanobacteria community structure?" or "What are the effects of nitrate, nitrite and ammonium concentrations on the known aquatic ammonia oxidizer Thaumarchaea?"

Other

Comprehensive Meta-analysis of Ontology Annotated 16S rRNA Profiles Identifies Beta Diversity Clusters of Environmental Bacterial Communities has a section "Salinity as the major driving factor for community assembly?" Could be cool to recap some of the hypotheses here for Aim 3, approach the issue the way they do in this EMP paper, and drill down into questions about what features affect community (such as Proteobacteria) composition.

This perspective paper from DeLong and Karl in 2005

It has a section, "A genomic glimpse into coastal bacterial lifestyles", that asks questions like: Do specific biological properties of coastal bacterioplankton differentiate them from their open-ocean relatives? Can the genomic and physiological properties of bacterioplankton explain in part their observed distribution? Can these different biological features tell us about potential regional differences in the microbial cycling of matter and energy? ... It has some examples in the paragraph; the last one is about transporters for uptake of amino acids, ammonium, urea, etc. Could be really cool to try to answer some of these questions.

It goes on about questions of ecotypes, with an example about different Prochlorococcus strains isolated from different depths. We would probably need to trace back the genes associated with reads that map to the taxa to be able to answer this. That's why it would be cool if we could keep track of both, but that may be too much to ask.

Check out Comparative Metagenomics of Microbial Communities. Maybe we can get some ideas from this, re-do some analyses, or cite it for this section?

Recent 2020 paper: check out section 3.5, "Investigation of the relationship between bacterial alpha diversity and environmental/spatial factors". Table 2 gives Pearson's correlation coefficients (r) between alpha diversity of the bacterial community and biogeochemical characteristics of seawater samples. Could be a cool thing to recreate with Planet Microbe.
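A rough sketch of how such a correlation table entry could be recreated: compute an alpha diversity value per sample, then Pearson's r against one environmental variable. Using the Shannon index as the alpha diversity measure is an assumption (the paper may use a different one), and the taxon counts and temperatures below are invented for illustration.

```python
import math

import numpy as np


def shannon_index(counts):
    """Shannon diversity H' = -sum(p * ln p) over taxa with non-zero counts."""
    counts = np.asarray(counts, dtype=float)
    proportions = counts[counts > 0] / counts.sum()
    return float(-(proportions * np.log(proportions)).sum())


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    return float(np.corrcoef(x, y)[0, 1])


# Hypothetical per-sample taxon counts (rows are samples, columns are taxa).
taxon_counts = [
    [50, 30, 15, 5],
    [70, 20, 8, 2],
    [25, 25, 25, 25],
    [90, 5, 3, 2],
]
# Hypothetical temperature (degrees C) for the same four samples.
temperature_c = [18.0, 22.5, 12.0, 26.0]

alpha_diversity = [shannon_index(row) for row in taxon_counts]
r = pearson_r(alpha_diversity, temperature_c)

print("Shannon index per sample:", [round(h, 3) for h in alpha_diversity])
print(f"Pearson r (alpha diversity vs temperature): {r:.3f}")
```

Repeating the last step for each environmental variable would fill in one row of a table like the one described.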

Also interesting in that it discusses (see table 1) the effects of (amongst other things) heterotrophic prokaryotes; Fluo, In situ fluorescence; Proc, Prochlorococcus; Syn, Synechococcus; Temp, temperature; Density, potential density. Could be cool to add this to the Prochlorococcus/Synechococcus story: the effects of xyz environmental variables on Prochlorococcus/Synechococcus distributions. In NCBI taxon there are more named Synechococcus and more levels of hierarchy than for Prochlorococcus.

Older 2006 HOT paper, but a place to start asking some questions. See fig 2, taxonomic distributions of various microbial groups at depth. Could follow this up with questions like: what Alphaproteobacteria are differentiated by depth?

Ecological multivariate statistics:

Pier cites this in his PhD thesis (might be his mentor?). Can use it to cite the idea of multivariate stats being used in microbial ecology.

Check out fig 6: it does correspondence analyses of the microbial diversity and environmental variables, with high-level taxonomic groups against temperature, chlorophyll, salinity, NO2, NO3, etc. Definitely cite and "redo this at a larger scale".

Fig 4 does CCA for eukaryotes against the environmental variables nitrate, nitrite, phosphate, silicate, temperature, and salinity.

Fig 5 also does canonical correspondence analysis (CCA) of bacterial communities associated with environmental variables: oxygen, depth, pH, DOC, NH4, NO2.

2017 study, Figure 5 CCA ordination plot depicting the relationship between environmental parameters and bacterial community structure, as represented by 16S rRNA gene sequence data.

Also has a CCA (fig 4) of bacterial communities against several nutrients. Deals with bloom events.

2017: also does a similar thing, NMDS based on Bray-Curtis community distances. Arrows show vector fitting of the environmental variables. They also try correlations of Prochlorococcus/Synechococcus vs latitude, and give an overview of abundant taxa. Could go along with the Global patterns of bacterial beta-diversity in seafloor and seawater ecosystems paper to describe overall taxonomic structure globally and in different regions, and to compare the taxonomic structure of coastal regions vs open seas. Could link this to different ENVO terms.

Uses HOT and BATS plus other sites. Tries to do functional analysis with COGs, but it's pretty messy. Could still maybe serve as some background functional analysis.

Has a section on motility, where they talk about how motility might enable bacteria to achieve spatial coupling with a DOM source. Could maybe try filtering for DOM, looking at it as a gradient, and searching for some GO motility-related genes. Might not work but could try it. Definitely read this during Marmic courses.

Stal, Lucas J., and Mariana Silvia Cretoiu. 2016. The Marine Microbiome: An Untapped Source of Biodiversity and Biotechnological Potential. Springer.

//Could TAKE something from this for the intro: The marine microbiome is not just interesting from a scientific point of view. Certainly, with 70 % of the Earth’s surface covered by the ocean and the ocean probably being the largest continuous habitat, the marine microbiome plays a prominent role in the biogeochemical cycling of elements, is at the basis of the marine foodweb, critical for the ecology of the sea, and essential for climate regulation and counteracting the effects of global change.

Arrigo, Kevin R. 2005. “Marine Microorganisms and Global Nutrient Cycles.” Nature 437 (7057): 349–55.

//It would be cool to cite something from this for the intro maybe: **On a global scale, cycling of nutrients also affects the concentration of atmospheric carbon dioxide. Because of their capacity for rapid growth, marine microorganisms are a major component of global nutrient cycles. Understanding what controls their distributions and their diverse suite of nutrient transformations is a major challenge facing contemporary biological oceanographers.** (Arrigo 2005) It also talks about and cites info on the long-standing question of the Redfield ratio and how microbial metabolism affects it; cites Marcel Kuypers :(. Could cite if I want to try to dig in on the Redfield ratio question.

https://www.nature.com/articles/nrmicro1762 and https://www.nature.com/articles/nrmicro1749

Papers on Bacterial traits

IJSEM db https://figshare.com/articles/International_Journal_of_Systematic_and_Evolutionary_Microbiology_IJSEM_phenotypic_database/4272392 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5541158/

A synthesis of bacterial and archaeal phenotypic trait data https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7275036/

BacDive https://academic.oup.com/nar/article/47/D1/D631/5106998

Traitar https://msystems.asm.org/content/1/6/e00101-16
