diff --git a/RaMP_v3.0_SQLite_Vignette.Rmd b/RaMP_v3.0_SQLite_Vignette.Rmd
index 54cb29b0..894b71d8 100644
--- a/RaMP_v3.0_SQLite_Vignette.Rmd
+++ b/RaMP_v3.0_SQLite_Vignette.Rmd
@@ -48,6 +48,8 @@ library(magrittr)
 RaMP::listAvailableRaMPDbVersions()
 # load a local RaMP database or download the latest RaMP database version from the repository.
+# If the version is not specified, the latest local version will be used.
+# If no local database is cached, the latest remote version will be downloaded.
 rampDB <- RaMP()
 ```
diff --git a/RaMP_v3.0_SQLite_Vignette.html b/RaMP_v3.0_SQLite_Vignette.html
new file mode 100644
index 00000000..89f99577
--- /dev/null
+++ b/RaMP_v3.0_SQLite_Vignette.html
@@ -0,0 +1,7916 @@
+RaMP-DB 3.x Vignette

Introduction

+

This vignette provides basic steps for interacting with RaMP-DB (Relational database of Metabolomic Pathways).
+The codebase for RaMP-DB is available on our GitHub site, sqlite branch. Details on RaMP-DB installation are also available through GitHub, and questions can be asked through the Issues tab or by sending an email to NCATSRaMP@nih.gov.

+

RaMP-DB supports queries and enrichment analyses. Supported queries are:

+ +

Supported enrichment analyses are:

+ +

Once installed, first load the package. The first call lists the database versions available in your local file cache and in our remote repository. Next, initialize the RaMP database object. This method will reference a RaMP DB version in the local file cache for your current session, or will download the latest version of the RaMP database. Note that the RaMP() method accepts a version argument, for instance version = "2.3.2". The supplied version should be one of the versions shown after listing available versions.

+
library(RaMP)
+library(DT) # for prettier tables in vignette
+library(dplyr)
+library(magrittr)
+
+RaMP::listAvailableRaMPDbVersions()
+
## [1] "Locally available versions of RaMP SQLite DB, currently on your computer:"
+## [1] "No local versions of the RaMP Database were found."
+## [1] "Please use the command 'db <- RaMP()' to download the latest version into local file cache."
+## [1] "Alternatively you can use the command db <- RaMP(version = <remote_version_number>) using one of the versions listed below."
+## [1] "Available remote RaMP SQLite DB versions for download:"
+## [1] "2.3.2" "2.3.1"
+
# load a local RaMP database or download the latest RaMP database version from the repository.
+# If the version is not specified, the latest local version will be used.
+# If no local database is cached, the latest remote version will be downloaded.
+rampDB <- RaMP()
+
+
+

Supported RaMP Queries

+
+

Retrieve Analytes From Input Pathway(s)

+

Analytes (genes, proteins, metabolites) can be retrieved by pathway. Users have to input the exact pathway name. Here is an example:

+
myanalytes <- getAnalyteFromPathway(db = rampDB, pathway="Sphingolipid metabolism")
+
## [1] "fired!"
+## [1] "Timing .."
+##    user  system elapsed 
+##    0.20    0.06    1.47
+
datatable(myanalytes)
+
+ +

To retrieve information from multiple pathways, input a vector of pathway names:

+
myanalytes <- getAnalyteFromPathway(db = rampDB, pathway=c("De Novo Triacylglycerol Biosynthesis", 
+                                              "sphingolipid metabolism"))
+
## [1] "fired!"
+## [1] "Timing .."
+##    user  system elapsed 
+##    0.17    0.15    0.93
+
+
+

Retrieve Pathways From Input Analyte(s)

+

It is often useful to get a sense of what pathways are represented in a dataset (this is particularly true for metabolomics, where coverage of metabolites varies depending on what platform is used). In other cases, one may be interested in exploring one or several metabolites to see what pathways they are represented in.

+

Note that it is always preferable to use IDs rather than common names. When entering IDs, prepend each ID with the database of origin followed by a colon, for example kegg:C02712, hmdb:HMDB04824, etc. It is possible to input IDs from multiple sources. RaMP currently supports the following ID types (which should be prepended):

+
  metabprefixes <- getPrefixesFromAnalytes(db = rampDB, "metabolite")
+  geneprefixes <- getPrefixesFromAnalytes(db = rampDB, "gene")
+
+  datatable(rbind(metabprefixes, geneprefixes))
+
+ +
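Because every ID must carry a source prefix, it can help to check an input vector before querying. The following is a minimal base R sketch, not part of RaMP: the helper name `has_source_prefix` and the pattern it uses are assumptions made for illustration, and the check only tests for the `source:` shape, not whether the source is one RaMP supports.

```r
# Hypothetical helper (not a RaMP function): flag IDs that lack a "source:" prefix.
has_source_prefix <- function(ids) {
  # Assumed shape: a prefix that starts with a letter, then a colon, then the ID itself.
  grepl("^[A-Za-z][A-Za-z0-9_.]*:.+", ids)
}

# Two prefixed IDs taken from the text above and one ID with the prefix left off.
ids <- c("kegg:C02712", "hmdb:HMDB04824", "HMDB0000064")
prefix_check <- data.frame(id = ids, prefixed = has_source_prefix(ids))
print(prefix_check)
```

IDs flagged as unprefixed can then be corrected by hand before they are passed to a query function.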

In this example, we will search for pathways that involve the two genes MDM2 and TP53, and the two metabolites glutamate and creatine.

+
pathwaydfids <- getPathwayFromAnalyte(db = rampDB, c("ensembl:ENSG00000135679", "hmdb:HMDB0000064",
+        "hmdb:HMDB0000148", "ensembl:ENSG00000141510"))
+
## [1] "Starting getPathwayFromAnalyte()"
+## [1] "Working on ID List..."
+## [1] "finished getPathwayFromAnalyte()"
+## [1] "Found 866 associated pathways."
+
datatable(pathwaydfids)
+
+ +

Note that each row returns a pathway attributed to one of the input analytes. To retrieve the number of unique pathways returned for all analytes or each analyte, try the following:

+
print(paste("Number of Unique Pathways Returned for All Analytes:", 
+            length(unique(pathwaydfids$pathwayId))))
+
## [1] "Number of Unique Pathways Returned for All Analytes: 722"
+
lapply(unique(pathwaydfids$commonName), function(x) {
+        (paste("Number of Unique Pathways Returned for",x,":",
+                length(unique(pathwaydfids[which(pathwaydfids$commonName==x),]$pathwayId))))})
+
## [[1]]
+## [1] "Number of Unique Pathways Returned for MDM2 : 402"
+## 
+## [[2]]
+## [1] "Number of Unique Pathways Returned for TP53 : 214"
+## 
+## [[3]]
+## [1] "Number of Unique Pathways Returned for L-Glutamic acid,Glutamate : 238"
+## 
+## [[4]]
+## [1] "Number of Unique Pathways Returned for Creatine : 12"
+
+
+

Retrieve Metabolites from Metabolite Ontologies

+

Conversely, the user can retrieve the metabolites that are associated with a specific ontology or vector of ontologies. We can accomplish this using the function getMetaFromOnto(). It should be noted that it does not matter which ontology the metabolites are from. The function will return all metabolites associated with all the ontologies specified by the user.

+
ontologies.of.interest <- c("Colon", "Liver", "Lung")
+
+new.metabolites <- RaMP::getMetaFromOnto(db = rampDB, ontology = ontologies.of.interest)
+
## [1] "Retreiving Metabolites for input ontology terms."
+## [1] "Found 3 ontology term matches."
+## [1] "Found 1482 metabolites associated with the input ontology terms."
+## [1] "Finished getting metabolies from ontology terms."
+
datatable(head(new.metabolites, n=10))
+
+ +
+
+

Retrieve Ontologies from Input Metabolites

+

RaMP contains information on the biospecimens from which metabolites originate. This information is called ontology. Here are all the ontologies found in RaMP.

+

To retrieve ontologies that are associated with our metabolites we can use getOntoFromMeta(). This function takes in a vector of metabolites as an input and returns the ontologies associated with the user-defined metabolites.

+
analytes.of.interest <- c("ensembl:ENSG00000135679", "hmdb:HMDB0000064",
+        "hmdb:HMDB0000148", "ensembl:ENSG00000141510")
+new.ontologies <- RaMP::getOntoFromMeta(db = rampDB, analytes = analytes.of.interest)
+datatable(new.ontologies)
+
+ +
+
+

Retrieve Analytes Involved in the Same Reaction

+

The user may want to know what gene transcripts encode enzymes which can catalyze reactions involving metabolites in their experiment. RaMP can return this data to its user.

+

We can return the gene transcripts using the rampFastCata() function. To use it, the user needs to provide the RaMP database object and a vector of the metabolites they are interested in. The user can also input a vector of protein IDs or gene transcripts to return the metabolites involved in chemical reactions with the input proteins or with the proteins encoded by the input gene transcripts.

+
#Input Metabolites
+analytes.of.interest <- c("ensembl:ENSG00000135679", "hmdb:HMDB0000064",
+        "hmdb:HMDB0000148", "ensembl:ENSG00000141510")
+
+new.transcripts <- rampFastCata(db = rampDB, analytes = analytes.of.interest)
+
## [1] "Analyte ID-based reaction partner query."
+## [1] "Building metabolite to gene relations."
+## [1] "Number of met2gene relations: 100"
+## [1] "Building gene to metabolite relations."
+## [1] "Total Relation Count: 13062"
+
datatable(head(new.transcripts, n=10))
+
+ +
#Input Proteins
+proteins.of.interest <- c("uniprot:094808", "uniprot:Q99259")
+
+new.metabolites <- rampFastCata(db = rampDB, analytes = proteins.of.interest)
+
## [1] "Analyte ID-based reaction partner query."
+## [1] "Building metabolite to gene relations."
+## [1] "Number of met2gene relations: 0"
+## [1] "Building gene to metabolite relations."
+## [1] "Total Relation Count: 12"
+
datatable(head(new.metabolites, n=10))
+
+ +

RaMP has a built-in function, plotCataNetwork(), that generates networks from the transcript data. This function uses the dataframe created by rampFastCata() as an input. These plots are completely interactive.

+
plotCataNetwork(head(new.transcripts, n=100))
+
+ +
+
+

Retrieve Chemical Classes from Input Metabolites

+

RaMP incorporates ClassyFire and LipidMaps classes. The chemicalClassSurvey() function takes as input a vector of metabolites and outputs the classes associated with each input metabolite.

+
metabolites.of.interest = c('hmdb:HMDB0000056','hmdb:HMDB0000439','hmdb:HMDB0000479','hmdb:HMDB0000532',
+                            'hmdb:HMDB0001015','hmdb:HMDB0001138','hmdb:HMDB0029159','hmdb:HMDB0029412',
+                            'hmdb:HMDB0034365','hmdb:HMDB0035227','hmdb:HMDB0007973','hmdb:HMDB0008057',
+                            'hmdb:HMDB0011211')
+chemical.classes <- chemicalClassSurvey(db = rampDB, mets = metabolites.of.interest)
+
## [1] "Starting Chemical Class Survey"
+## [1] "...finished metabolite list query..."
+## [1] "...finished DB population query..."
+## [1] "...collating data..."
+## [1] "...creating query efficiency summary..."
+## [1] "Finished Chemical Class Survey"
+
metabolite.classes <- as.data.frame(chemical.classes$met_classes)
+datatable(metabolite.classes)
+
+ +
+
+

Retrieve Chemical Property Information from Input Metabolites

+

Chemical properties captured by RaMP include SMILES, InChI, InChI-keys, monoisotopic masses, molecular formula, and common name. The getChemicalProperties() function takes as input a vector of metabolites and outputs a list of chemical property information that can easily be converted into a dataframe.

+
chemical.properties <- getChemicalProperties(db = rampDB, metabolites.of.interest)
+
## Starting Chemical Property Query
+
## Finished Chemical Property Query
+
chemical.data <- chemical.properties$chem_props
+datatable(chemical.data)
+
+ +
+
+
+

Enrichment Analyses

+

RaMP performs pathway and chemical class overrepresentation analysis using Fisher’s tests.
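To make the idea of overrepresentation concrete, here is a minimal base R sketch of a Fisher's exact test on a single 2x2 table. All counts are hypothetical and do not come from RaMP, and the one-sided alternative is an assumption of this sketch; how RaMP builds its tables and which alternative it uses is not stated here.

```r
# Hypothetical counts for illustration only; these are not RaMP results.
hits_in_pathway <- 8     # input analytes annotated to the pathway
input_size      <- 20    # all input analytes
pathway_size    <- 50    # background analytes annotated to the pathway
background_size <- 1000  # all background analytes

# Rows: input vs. rest of background. Columns: in pathway vs. not in pathway.
contingency <- matrix(
  c(hits_in_pathway,                input_size - hits_in_pathway,
    pathway_size - hits_in_pathway,
    background_size - input_size - (pathway_size - hits_in_pathway)),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("input", "not_input"), c("in_pathway", "not_in_pathway"))
)

# One-sided test: is the pathway overrepresented among the input analytes?
fisher_p <- fisher.test(contingency, alternative = "greater")$p.value

# The same tail probability written directly with the hypergeometric distribution.
hyper_p <- phyper(hits_in_pathway - 1, pathway_size,
                  background_size - pathway_size, input_size,
                  lower.tail = FALSE)
print(c(fisher = fisher_p, hypergeometric = hyper_p))
```

The two p-values agree because the one-sided Fisher's exact test on a 2x2 table is the upper tail of a hypergeometric distribution.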

+
+

Perform Pathway Enrichment

+

Using the pathways that our analytes map to, captured in the pathwaydfids data frame in the previous step, we can now run Fisher’s Exact test to identify pathways that are enriched for our analytes of interest:

+
fisher.results <- runCombinedFisherTest(db = rampDB, analytes = c(
+                                                  "hmdb:HMDB0000033",
+                                                  "hmdb:HMDB0000052",
+                                                  "hmdb:HMDB0000094",
+                                                  "hmdb:HMDB0000161",
+                                                  "hmdb:HMDB0000168",
+                                                  "hmdb:HMDB0000191",
+                                                  "hmdb:HMDB0000201",
+                                                  "chemspider:10026",
+                                                  "hmdb:HMDB0006059",
+                                                  "Chemspider:6405",
+                                                  "CAS:5657-19-2",
+                                                  "hmdb:HMDB0002511",
+                                                  "chemspider:20171375",
+                                                  "CAS:133-32-4",
+                                                  "CAS:5746-90-7",
+                                                  "CAS:477251-67-5",
+                                                  "hmdb:HMDB0000695",
+                                                  "chebi:15934",
+                                                  "CAS:838-07-3",
+                                                  "hmdb:HMDBP00789",
+                                                  "hmdb:HMDBP00283",
+                                                  "hmdb:HMDBP00284",
+                                                  "hmdb:HMDBP00850"
+))
+

Note: To explicitly view the results of mapping input IDs to RaMP, users can run the getPathwayFromAnalyte() function as noted above in the section “Retrieve Pathways From Input Analyte(s)”.

+

Once we have our Fisher results we can format them into a new dataframe and filter the pathways for significance. For this example we will be using a Holm-adjusted p-value cutoff of 0.05.

+
#Returning Fisher Pathways and P-Values
+filtered.fisher.results <- FilterFishersResults(fisher.results, pval_type = 'holm', pval_cutoff=0.05)
+
## [1] "Filtering Fisher Results..."
+## [1] "Fisher Result Type: Pathway Enrichment"
+

Because RaMP combines pathways from multiple sources, pathways may be represented more than once. Further, due to the hierarchical nature of pathways and because Fisher’s testing assumes pathways are independent, subpathways and their parent pathways may appear in a list. To help group together pathways that represent similar biological processes, we have implemented a clustering algorithm that groups pathways together if they share analytes in common.

+
clusters <- RaMP::findCluster(db = rampDB, filtered.fisher.results,
+  perc_analyte_overlap = 0.2,
+  min_pathway_tocluster = 2, perc_pathway_overlap = 0.2
+)
+
## [1] "Clustering pathways..."
+## [1] "Finished clustering pathways..."
+
## print("Pathways with Holm-adjusted Pval < 0.05")
+
+datatable(clusters$fishresults %>% mutate_if(is.numeric, ~ round(., 8)),
+  rownames = FALSE
+)
+
+ +

To view clustered pathway results:

+
pathwayResultsPlot(db = rampDB, filtered.fisher.results, text_size = 8, perc_analyte_overlap = 0.2, 
+    min_pathway_tocluster = 2, perc_pathway_overlap = 0.2, interactive = FALSE)
+
## [1] "Clustering pathways..."
+## [1] "Finished clustering pathways..."
+

+
+
+

Perform Chemical Enrichment

+

After retrieving chemical classes of metabolites, the chemicalClassEnrichment() function performs overrepresentation analysis using a Fisher’s test and outputs classes that show enrichment in the user input list of metabolites relative to the background metabolite population (all metabolites in RaMP). The function performs enrichment analysis for ClassyFire classes, sub-classes, and super-classes, and for LipidMaps categories, main classes, and sub-classes.

+
metabolites.of.interest = c('hmdb:HMDB0000056','hmdb:HMDB0000439','hmdb:HMDB0000479','hmdb:HMDB0000532',
+                            'hmdb:HMDB0001015','hmdb:HMDB0001138','hmdb:HMDB0029159','hmdb:HMDB0029412',
+                            'hmdb:HMDB0034365','hmdb:HMDB0035227','hmdb:HMDB0007973','hmdb:HMDB0008057',
+                            'hmdb:HMDB0011211')
+chemical.enrichment <- chemicalClassEnrichment(db = rampDB, mets = metabolites.of.interest)
+
## [1] "Starting Chemical Class Enrichment"
+## [1] "Starting Chemical Class Survey"
+## [1] "...finished metabolite list query..."
+## [1] "...finished DB population query..."
+## [1] "...collating data..."
+## [1] "...creating query efficiency summary..."
+## [1] "Finished Chemical Class Survey"
+## [1] "check total summary"
+## [1] "getting population totals"
+## [1] "Finished Chemical Class Enrichment"
+
# Enrichment was performed on the following chemical classes:
+names(chemical.enrichment)
+
## [1] "ClassyFire_class"       "ClassyFire_sub_class"   "ClassyFire_super_class"
+## [4] "result_type"
+
# To retrieve results for the ClassyFire Class:
+classy_fire_classes <- chemical.enrichment$ClassyFire_class
+datatable(classy_fire_classes)
+
+ +

Note: To explicitly view the results of mapping input IDs to RaMP, users can run the chemicalClassSurvey() function as noted above in the section “Retrieve Chemical Classes from Input Metabolites”.

+

This code section demonstrates a Rhea reaction query.

+
analytes.of.interest <- c('chebi:57368', 'uniprot:Q96N66', 'CHEBI:73003')
+
+reactionsLists <- RaMP::getReactionsForAnalytes(db = rampDB, analytes = analytes.of.interest, includeTransportRxns = FALSE, humanProtein = TRUE)
+
## [1] "Retrieving reactions for compounds"
+## [1] "Retrieving reactions for genes/proteins"
+
# just show the reactions with at least one metabolite and one protein in common.
+datatable(subset(reactionsLists$metProteinCommonReactions, select = -c(rxn_html_label)))
+
+ +

Three reaction lists are returned: metabolites-to-reactions, proteins-to-reactions, and reactions that have at least one metabolite and one protein from the input analyte list.

+
sessionInfo()
+
## R version 4.1.0 (2021-05-18)
+## Platform: x86_64-w64-mingw32/x64 (64-bit)
+## Running under: Windows 10 x64 (build 22621)
+## 
+## Matrix products: default
+## 
+## locale:
+## [1] LC_COLLATE=English_United States.1252 
+## [2] LC_CTYPE=English_United States.1252   
+## [3] LC_MONETARY=English_United States.1252
+## [4] LC_NUMERIC=C                          
+## [5] LC_TIME=English_United States.1252    
+## 
+## attached base packages:
+## [1] stats     graphics  grDevices utils     datasets  methods   base     
+## 
+## other attached packages:
+## [1] magrittr_2.0.2 dplyr_1.1.2    DT_0.28        RaMP_3.0.0    
+## 
+## loaded via a namespace (and not attached):
+##  [1] Rcpp_1.0.8.3        lattice_0.20-45     tidyr_1.3.0        
+##  [4] visNetwork_2.1.2    assertthat_0.2.1    digest_0.6.29      
+##  [7] utf8_1.2.2          BiocFileCache_2.0.0 R6_2.5.1           
+## [10] RSQLite_2.3.1       evaluate_0.21       highr_0.10         
+## [13] httr_1.4.7          ggplot2_3.4.3       pillar_1.9.0       
+## [16] rlang_1.1.0         curl_4.3.2          rstudioapi_0.13    
+## [19] data.table_1.14.8   jquerylib_0.1.4     blob_1.2.4         
+## [22] R.utils_2.12.2      R.oo_1.24.0         Matrix_1.4-1       
+## [25] rmarkdown_2.24      labeling_0.4.2      tidytext_0.4.1     
+## [28] htmlwidgets_1.6.2   bit_4.0.4           munsell_0.5.0      
+## [31] compiler_4.1.0      janeaustenr_1.0.0   xfun_0.40          
+## [34] pkgconfig_2.0.3     htmltools_0.5.6     tidyselect_1.2.0   
+## [37] tibble_3.2.1        fansi_1.0.3         dbplyr_2.1.1       
+## [40] withr_2.5.0         R.methodsS3_1.8.1   rappdirs_0.3.3     
+## [43] SnowballC_0.7.1     grid_4.1.0          jsonlite_1.8.7     
+## [46] gtable_0.3.4        lifecycle_1.0.3     DBI_1.1.3          
+## [49] scales_1.2.1        tokenizers_0.3.0    cli_3.6.1          
+## [52] stringi_1.7.6       cachem_1.0.6        farver_2.1.1       
+## [55] bslib_0.4.0         ellipsis_0.3.2      filelock_1.0.2     
+## [58] generics_0.1.3      vctrs_0.6.3         tools_4.1.0        
+## [61] bit64_4.0.5         glue_1.6.2          purrr_1.0.1        
+## [64] crosstalk_1.2.0     fastmap_1.1.0       yaml_2.3.7         
+## [67] colorspace_2.1-0    memoise_2.0.1       knitr_1.43         
+## [70] sass_0.4.2
+
+