diff --git a/RaMP_Vignette.Rmd b/RaMP_Vignette.Rmd index 91b39a20..9606c2ca 100644 --- a/RaMP_Vignette.Rmd +++ b/RaMP_Vignette.Rmd @@ -1,5 +1,5 @@ --- -title: "RaMP-DB 2.0 Vignette" +title: "RaMP-DB 2.x Vignette" author: "Kyle Spencer, Ewy Mathé, Andrew Patt, and Cole Tindall" date: "`r Sys.Date()`" output: diff --git a/RaMP_v3.0_SQLite_Vignette.Rmd b/RaMP_v3.0_SQLite_Vignette.Rmd new file mode 100644 index 00000000..553fa388 --- /dev/null +++ b/RaMP_v3.0_SQLite_Vignette.Rmd @@ -0,0 +1,291 @@ +--- +title: "RaMP-DB 3.x Vignette" +author: "Kyle Spencer, Ewy Mathé, Andrew Patt, and John Braisted" +date: "`r Sys.Date()`" +output: + html_document: + self_contained: yes + highlight: kate + theme: yeti + toc: yes + fig_width: 9 + fig_height: 7 + code_folding: show +vignette: > + %\VignetteIndexEntry{Running RaMP locally} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- +## Introduction + +This vignette will provide basic steps for interacting with [RaMP-DB](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5876005/) (Relational database of Metabolomic Pathways). +The codebase for RaMP-DB is available on [our GitHub site, sqlite branch](https://github.com/ncats/RaMP-DB/tree/sqlite). Details on RaMP-DB installation are also avaialble through GitHub, and questions can be asked through the Issues tab or by sending an email to [NCATSRaMP@nih.gov](mailto:NCATSRaMP@nih.gov). + +RaMP-DB supports queries and enrichment analyses. Supported queries are: + +- Retrieve Analytes From Input Pathway(s) +- Retrieve Pathways From Input Analyte(s) +- Retrieve Metabolites from Metabolite Ontologies +- Retrieve Ontologies from Input Metabolites +- Retrieve Analytes Involved in the Same Reaction +- Retrieve Chemical Classes from Input Metabolites +- Retrieve Chemical Properties from Input Metabolites + +Supported enrichment analyses are: + +- Perform Pathway Enrichment +- Perform Chemical Enrichment + +Once installed, first load the package. The first call is to list available database version within your local file cache and in our remote repository. +Initialize RaMP database object. This method will reference a RaMP DB version in local file cache for your current session, or will download the latest version of the RaMP database. +Note that this RaMP() method can accept a version argument with a format like, version='2.3.2', for instance. The supplied version should be one of the versions shown after listing available versions. +```{r message=F, suppressWarnings=T} +library(RaMP) +library(DT) # for prettier tables in vignette +library(dplyr) +library(magrittr) + +RaMP::listAvailableRaMPDbVersions() + +# load a local RaMP database or download the latest RaMP database version from the repository. +rampDB <- RaMP() +``` + +## Supported RaMP Queries + +### Retrieve Analytes From Input Pathway(s) +Analytes (genes, proteins, metabolites) can be retrieve by pathway. Users have to input the exact pathway name. Here is an example: +```{r} +myanalytes <- getAnalyteFromPathway(db = rampDB, pathway="Sphingolipid metabolism") +``` + +```{r echo = FALSE} +cutoff <- 100 + +for(i in 1:nrow(myanalytes)) { + char.length <- nchar(myanalytes$sourceAnalyteIDs[i]) + if(char.length >= cutoff) { + new.string <- substr(myanalytes$sourceAnalyteIDs[i], 1, cutoff) + new.string <- paste0(new.string, "...") + myanalytes$sourceAnalyteIDs[i] <- new.string + } +} +``` + +```{r} +datatable(myanalytes) +``` + +To retrieve information from multiple pathways, input a vector of pathway names: +```{r} +myanalytes <- getAnalyteFromPathway(db = rampDB, pathway=c("De Novo Triacylglycerol Biosynthesis", + "sphingolipid metabolism")) +``` + +### Retrieve Pathways From Input Analyte(s) +It is oftentimes useful to get a sense of what pathways are represented in a dataset (this is particularly true for metabolomics, where coverage of metabolites varies depending on what platform is used). In other cases, one may be interested in exploring one or several metabolites to see what pathways they are arepresented in. + +Note that it is always preferable to utilize IDs rather then common names. +When entering IDs, prepend each ID with the database of origin followed by a colon, for example kegg:C02712, hmdb:HMDB04824, etc.. It is possible to input IDs using multiple different sources. RaMP currently supports the following ID types (that should be prepended): + +```{r} + metabprefixes <- getPrefixesFromAnalytes(db = rampDB, "metabolite") + geneprefixes <- getPrefixesFromAnalytes(db = rampDB, "gene") + + datatable(rbind(metabprefixes, geneprefixes)) +``` + +In this example, we will search for pathways that involve the two genes MDM2 and TP53, and the two metabolites glutamate and creatinine. + +```{r} +pathwaydfids <- getPathwayFromAnalyte(db = rampDB, c("ensembl:ENSG00000135679", "hmdb:HMDB0000064", + "hmdb:HMDB0000148", "ensembl:ENSG00000141510")) +datatable(pathwaydfids) +``` + +Note that each row returns a pathway attributed to one of the input analytes. +To retrieve the number of unique pathways returned for all analytes or each analyte, try the following: + +```{r} +print(paste("Number of Unique Pathways Returned for All Analytes:", + length(unique(pathwaydfids$pathwayId)))) +lapply(unique(pathwaydfids$commonName), function(x) { + (paste("Number of Unique Pathways Returned for",x,":", + length(unique(pathwaydfids[which(pathwaydfids$commonName==x),]$pathwayId))))}) +``` + +### Retrieve Metabolites from Metabolite Ontologies +Conversely, the user can retrieve the metabolites that are associated with a specific ontology or vector of ontologies. We can accomplish this using the function getMetaFromOnto(). It should be noted that it does not matter which ontology the metabolites are from. The function will return all metabolites associated with all the ontologies specified by the user. +```{r} +ontologies.of.interest <- c("Colon", "Liver", "Lung") + +new.metabolites <- RaMP::getMetaFromOnto(db = rampDB, ontology = ontologies.of.interest) + +datatable(head(new.metabolites, n=10)) +``` + +### Retrieve Ontologies from Input Metabolites +RaMP contains information on where the metabolites originate from the biospecimen. This information is called ontology. Here are all the ontologies found in RaMP. + +To retrieve ontologies that are associated with our metabolites we can use getOntoFromMeta(). This function takes in a vector of metabolites as an input and returns a vector comprised of the ontologies from the user's defined metabolites. +```{r} +analytes.of.interest <- c("ensembl:ENSG00000135679", "hmdb:HMDB0000064", + "hmdb:HMDB0000148", "ensembl:ENSG00000141510") +new.ontologies <- RaMP::getOntoFromMeta(db = rampDB, analytes = analytes.of.interest) +datatable(new.ontologies) +``` + +### Retrieve Analytes Involved in the Same Reaction +The user may want to know what gene transcripts encode enzymes which can catalyze reactions involving metabolites in their experiment. RaMP can return this data to its user. + +We can return the gene transcripts using the rampFastCata() function. To use it the user needs to provide a vector of metabolites they are interested in and the connection information for MySQL. The user can also input a vector of protein IDs or gene transcripts to return the metabolites involved in chemical reactions with the input proteins or gene transcript encoded proteins. +```{r message=F} +#Input Metabolites +analytes.of.interest <- c("ensembl:ENSG00000135679", "hmdb:HMDB0000064", + "hmdb:HMDB0000148", "ensembl:ENSG00000141510") + +new.transcripts <- rampFastCata(db = rampDB, analytes = analytes.of.interest) + +datatable(head(new.transcripts, n=10)) + +#Input Proteins +proteins.of.interest <- c("uniprot:094808", "uniprot:Q99259") + +new.metabolites <- rampFastCata(db = rampDB, analytes = proteins.of.interest) + +datatable(head(new.metabolites, n=10)) +``` + +RaMP has a built in function which is able to generate networks from the transcript data. This function is named plotCataNetwork(). This function uses the dataframe created by rampFastCata() as an input. These plots are completely interactive. +```{r} + +plotCataNetwork(head(new.transcripts, n=100)) +``` + +### Retrieve Chemical Classes from Input Metabolites +RaMP incorporates Classfire and lipidMAPS classes. The function chemicalClassSurvey() function takes as input a vector of metabolites and outputs the classes associated with each metabolite input. + +```{r} +metabolites.of.interest = c('hmdb:HMDB0000056','hmdb:HMDB0000439','hmdb:HMDB0000479','hmdb:HMDB0000532', + 'hmdb:HMDB0001015','hmdb:HMDB0001138','hmdb:HMDB0029159','hmdb:HMDB0029412', + 'hmdb:HMDB0034365','hmdb:HMDB0035227','hmdb:HMDB0007973','hmdb:HMDB0008057', + 'hmdb:HMDB0011211') +chemical.classes <- chemicalClassSurvey(db = rampDB, mets = metabolites.of.interest) +metabolite.classes <- as.data.frame(chemical.classes$met_classes) +datatable(metabolite.classes) +``` + +### Retrieve Chemical Property Information from Input Metabolites +Chemical properties captured by RaMP include SMILES, InChI, InChI-keys, monoisotopic masses, molecular formula, and common name. The getChemicalProperties() function takes as input a vector of metabolites and outputs a list of chemical property information that can easily be converted into a dataframe. + +```{r} +chemical.properties <- getChemicalProperties(db = rampDB, metabolites.of.interest) +chemical.data <- chemical.properties$chem_props +datatable(chemical.data) +``` + +## Enrichment Analyses +RaMP performs pathway and chemical class overrespresentation analysis using Fisher's tests. + +### Perform Pathway Enrichment +Using the pathways that our analytes map to, captured in the pathwaydfids data frame in the previous step, we can now run Fisher's Exact test to identify pathways that are enriched for our analytes of interest: + +```{r, results='hide'} +fisher.results <- runCombinedFisherTest(db = rampDB, analytes = c( + "hmdb:HMDB0000033", + "hmdb:HMDB0000052", + "hmdb:HMDB0000094", + "hmdb:HMDB0000161", + "hmdb:HMDB0000168", + "hmdb:HMDB0000191", + "hmdb:HMDB0000201", + "chemspider:10026", + "hmdb:HMDB0006059", + "Chemspider:6405", + "CAS:5657-19-2", + "hmdb:HMDB0002511", + "chemspider:20171375", + "CAS:133-32-4", + "CAS:5746-90-7", + "CAS:477251-67-5", + "hmdb:HMDB0000695", + "chebi:15934", + "CAS:838-07-3", + "hmdb:HMDBP00789", + "hmdb:HMDBP00283", + "hmdb:HMDBP00284", + "hmdb:HMDBP00850" +)) + +``` + +*Note*: To explicitly view the results of mapping input IDs to RaMP, users can run the getPathwayFromAnalyte() function as noted in above in the section "Retrieve Pathways From Input Analyte(s)". + +Once we have our fisher results we can format them into a new dataframe and filter the pathways for significance. +For this example we will be using an FDR p-value cutoff of 0.05. +```{r} +#Returning Fisher Pathways and P-Values +filtered.fisher.results <- FilterFishersResults(fisher.results, pval_type = 'holm', pval_cutoff=0.05) +``` + +Because RaMP combines pathways from multiple sources, pathways may be represented more than once. Further, due to the hierarchical nature of pathways and because Fisher's testing assumes pathways are independent, subpathways and their parent pathways may appear in a list. +To help group together pathways that represent similar biological processes, we have implemented a clustering algorithm that groups pathways together if they share analytes in common. + +```{r} +clusters <- RaMP::findCluster(db = rampDB, filtered.fisher.results, + perc_analyte_overlap = 0.2, + min_pathway_tocluster = 2, perc_pathway_overlap = 0.2 +) + +## print("Pathways with Holm-adjusted Pval < 0.05") + +datatable(clusters$fishresults %>% mutate_if(is.numeric, ~ round(., 8)), + rownames = FALSE +) + +``` + +To view clustered pathway results: + +```{r, fig.height = 8} +pathwayResultsPlot(db = rampDB, filtered.fisher.results, text_size = 8, perc_analyte_overlap = 0.2, + min_pathway_tocluster = 2, perc_pathway_overlap = 0.2, interactive = FALSE) +``` + +### Perform Chemical Enrichment +After retrieving chemical classes of metabolites, the function chemicalClassEnrichment() function will perform overrepresentation analysis using a Fisher's test and output classes that show enrichment in the user input list of metabolites relative to the backgroud metabolite population (all meteabolites in RaMP). The function performs enrichment analysis for Classyfire classes, sub-classess, and super-classes, and for LipidMaps categories, main classess, and sub classes. + +```{r message=F} +metabolites.of.interest = c('hmdb:HMDB0000056','hmdb:HMDB0000439','hmdb:HMDB0000479','hmdb:HMDB0000532', + 'hmdb:HMDB0001015','hmdb:HMDB0001138','hmdb:HMDB0029159','hmdb:HMDB0029412', + 'hmdb:HMDB0034365','hmdb:HMDB0035227','hmdb:HMDB0007973','hmdb:HMDB0008057', + 'hmdb:HMDB0011211') +chemical.enrichment <- chemicalClassEnrichment(db = rampDB, mets = metabolites.of.interest) + +# Enrichment was performed on the following chemical classes: +names(chemical.enrichment) + +# To retrieve results for the ClassyFire Class: +classy_fire_classes <- chemical.enrichment$ClassyFire_class +datatable(classy_fire_classes) +``` + +*Note*: To explicitly view the results of mapping input IDs to RaMP, users can run the chemicalClassSurvey() function as noted in above in the section "Retrieve Chemical Class from Input Metabolites". + +This code section demonstrates a Rhea reaction query. +```{r} + +analytes.of.interest <- c('chebi:57368', 'uniprot:Q96N66', 'CHEBI:73003') + +reactionsLists <- RaMP::getReactionsForAnalytes(db = rampDB, analytes = analytes.of.interest, includeTransportRxns = F, humanProtein = T) + +# just show the reactions with at least one metabolite and one protein in commmon. +datatable(subset(reactionsLists$metProteinCommonReactions, select = -c(rxn_html_label))) + +``` +Three reaction lists are returned, metabolites-to-reactions, proteins-to-reactions, and reactions that have at least one metaboite and one protein from the input analyte list. + + +```{r} +sessionInfo() +```