diff --git a/RaMP_v3.0_SQLite_Vignette.Rmd b/RaMP_v3.0_SQLite_Vignette.Rmd
index 54cb29b0..894b71d8 100644
--- a/RaMP_v3.0_SQLite_Vignette.Rmd
+++ b/RaMP_v3.0_SQLite_Vignette.Rmd
@@ -48,6 +48,8 @@ library(magrittr)
 RaMP::listAvailableRaMPDbVersions()
 # load a local RaMP database or download the latest RaMP database version from the repository.
+# If the version is not specified, the latest local version will be used.
+# If no local database is cached, then the latest remote version will be downloaded.
 rampDB <- RaMP()
 ```
diff --git a/RaMP_v3.0_SQLite_Vignette.html b/RaMP_v3.0_SQLite_Vignette.html
new file mode 100644
index 00000000..89f99577
--- /dev/null
+++ b/RaMP_v3.0_SQLite_Vignette.html
@@ -0,0 +1,7916 @@
+This vignette will provide basic steps for interacting with RaMP-DB
+(Relational database of Metabolomic Pathways).
+The codebase for RaMP-DB is available on our GitHub site,
+sqlite branch. Details on RaMP-DB installation are also available
+through GitHub, and questions can be asked through the Issues tab or by
+sending an email to NCATSRaMP@nih.gov.
RaMP-DB supports queries and enrichment analyses. Supported queries
+are:
+Supported enrichment analyses are:
+Once installed, first load the package. The first call lists the
+database versions available in your local file cache and in our
+remote repository. Next, initialize the RaMP database object. If no
+version is specified, this call uses the latest RaMP DB version in the
+local file cache for your current session or, if none is cached,
+downloads the latest version of the RaMP database. Note
+that RaMP() also accepts a version argument, for instance
+version = '2.3.2'. The supplied version should be one
+of the versions shown when listing available versions.
+library(RaMP)
+library(DT) # for prettier tables in vignette
+library(dplyr)
+library(magrittr)
+
+RaMP::listAvailableRaMPDbVersions()
## [1] "Locally available versions of RaMP SQLite DB, currently on your computer:"
+## [1] "No local versions of the RaMP Database were found."
+## [1] "Please use the command 'db <- RaMP()' to download the latest version into local file cache."
+## [1] "Alternatively you can use the command db <- RaMP(version = <remote_version_number>) using one of the versions listed below."
+## [1] "Available remote RaMP SQLite DB versions for download:"
+## [1] "2.3.2" "2.3.1"
+
+Analytes (genes, proteins, metabolites) can be retrieved by pathway.
+Users must input the exact pathway name. Here is an example:
+ +## [1] "fired!"
+## [1] "Timing .."
+## user system elapsed
+## 0.20 0.06 1.47
+
+
+
+To retrieve information from multiple pathways, input a vector of
+pathway names:
+myanalytes <- getAnalyteFromPathway(db = rampDB, pathway=c("De Novo Triacylglycerol Biosynthesis",
+ "sphingolipid metabolism"))
## [1] "fired!"
+## [1] "Timing .."
+## user system elapsed
+## 0.17 0.15 0.93
+It is often useful to get a sense of which pathways are
+represented in a dataset (this is particularly true for metabolomics,
+where coverage of metabolites varies depending on the platform
+used). In other cases, one may be interested in exploring one or several
+metabolites to see which pathways they are represented in.
+Note that it is always preferable to use IDs rather than common
+names. When entering IDs, prepend each ID with the database of origin
+followed by a colon, for example kegg:C02712, hmdb:HMDB04824, etc. It
+is possible to input IDs from multiple sources. RaMP
+currently supports the following ID types (which should be
+prepended):
+ metabprefixes <- getPrefixesFromAnalytes(db = rampDB, "metabolite")
+ geneprefixes <- getPrefixesFromAnalytes(db = rampDB, "gene")
+
+ datatable(rbind(metabprefixes, geneprefixes))
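The `prefix:ID` convention described above can be applied to a plain vector of identifiers before querying. The following is a minimal base R sketch; `add_prefix()` is a hypothetical helper written for this illustration and is not part of RaMP.

```r
# Hypothetical helper (not part of RaMP): prepend a source-database prefix
# to any ID that does not already start with "<prefix>:".
add_prefix <- function(ids, prefix) {
  has_prefix <- grepl("^[A-Za-z_]+:", ids)
  ifelse(has_prefix, ids, paste0(prefix, ":", ids))
}

raw_hmdb <- c("HMDB0000064", "hmdb:HMDB0000148")  # one bare ID, one already prefixed
raw_kegg <- "C02712"

prefixed <- c(add_prefix(raw_hmdb, "hmdb"), add_prefix(raw_kegg, "kegg"))
print(prefixed)
```

IDs that already carry a prefix are left untouched, so the helper can be applied to a mixed list without producing doubled prefixes.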
In this example, we will search for pathways that involve the two
+genes MDM2 and TP53, and the two metabolites glutamate and
+creatine.
+pathwaydfids <- getPathwayFromAnalyte(db = rampDB, c("ensembl:ENSG00000135679", "hmdb:HMDB0000064",
+ "hmdb:HMDB0000148", "ensembl:ENSG00000141510"))
## [1] "Starting getPathwayFromAnalyte()"
+## [1] "Working on ID List..."
+## [1] "finished getPathwayFromAnalyte()"
+## [1] "Found 866 associated pathways."
+
+
+
+Note that each row contains a pathway attributed to one of the input
+analytes. To retrieve the number of unique pathways returned for all
+analytes or for each analyte, try the following:
+print(paste("Number of Unique Pathways Returned for All Analytes:",
+ length(unique(pathwaydfids$pathwayId))))
## [1] "Number of Unique Pathways Returned for All Analytes: 722"
+lapply(unique(pathwaydfids$commonName), function(x) {
+ (paste("Number of Unique Pathways Returned for",x,":",
+ length(unique(pathwaydfids[which(pathwaydfids$commonName==x),]$pathwayId))))})
## [[1]]
+## [1] "Number of Unique Pathways Returned for MDM2 : 402"
+##
+## [[2]]
+## [1] "Number of Unique Pathways Returned for TP53 : 214"
+##
+## [[3]]
+## [1] "Number of Unique Pathways Returned for L-Glutamic acid,Glutamate : 238"
+##
+## [[4]]
+## [1] "Number of Unique Pathways Returned for Creatine : 12"
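The per-analyte counts above sum to the 866 rows reported by the query, while only 722 pathways are unique overall, because a pathway shared by several analytes appears once per analyte. The sketch below reproduces that effect on an invented toy data frame that mimics only the `commonName` and `pathwayId` columns; the values are not RaMP output.

```r
# Toy stand-in for the data frame returned by getPathwayFromAnalyte();
# the analyte and pathway labels are invented for illustration.
toy_hits <- data.frame(
  commonName = c("A", "A", "B", "B"),
  pathwayId  = c("p1", "p2", "p2", "p3"),
  stringsAsFactors = FALSE
)

# Number of unique pathways for each analyte.
unique_per_analyte <- vapply(
  split(toy_hits$pathwayId, toy_hits$commonName),
  function(ids) length(unique(ids)),
  integer(1)
)

# Number of unique pathways across all analytes.
total_unique <- length(unique(toy_hits$pathwayId))

print(unique_per_analyte)
print(total_unique)
```

Here pathway `p2` is counted once for each analyte, so the per-analyte counts add up to more than the overall number of unique pathways.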
+Conversely, the user can retrieve the metabolites that are associated
+with a specific ontology or vector of ontologies. We can accomplish this
+using the function getMetaFromOnto(). It should be noted that it does
+not matter which ontology the metabolites are from. The function will
+return all metabolites associated with all the ontologies specified by
+the user.
+ontologies.of.interest <- c("Colon", "Liver", "Lung")
+
+new.metabolites <- RaMP::getMetaFromOnto(db = rampDB, ontology = ontologies.of.interest)
## [1] "Retreiving Metabolites for input ontology terms."
+## [1] "Found 3 ontology term matches."
+## [1] "Found 1482 metabolites associated with the input ontology terms."
+## [1] "Finished getting metabolies from ontology terms."
+
+
+
+RaMP contains information on the biospecimen that metabolites originate
+from. This information is called ontology. Here are all the
+ontologies found in RaMP.
+To retrieve the ontologies that are associated with our metabolites, we
+can use getOntoFromMeta(). This function takes a vector of
+metabolites as input and returns a vector of the ontologies
+associated with the user-defined metabolites.
+The user may want to know which gene transcripts encode enzymes that
+can catalyze reactions involving metabolites in their experiment. RaMP
+can return this information.
+We can return the gene transcripts using the rampFastCata() function.
+To use it, the user provides a vector of metabolites of interest
+and the RaMP database object (db = rampDB). The user can
+also input a vector of protein IDs or gene transcripts to return the
+metabolites involved in chemical reactions with the input proteins or
+the proteins encoded by the input gene transcripts.
+# Input analytes (metabolites and genes)
+analytes.of.interest <- c("ensembl:ENSG00000135679", "hmdb:HMDB0000064",
+ "hmdb:HMDB0000148", "ensembl:ENSG00000141510")
+
+new.transcripts <- rampFastCata(db = rampDB, analytes = analytes.of.interest)
## [1] "Analyte ID-based reaction partner query."
+## [1] "Building metabolite to gene relations."
+## [1] "Number of met2gene relations: 100"
+## [1] "Building gene to metabolite relations."
+## [1] "Total Relation Count: 13062"
+
+
+
+#Input Proteins
+proteins.of.interest <- c("uniprot:094808", "uniprot:Q99259")
+
+new.metabolites <- rampFastCata(db = rampDB, analytes = proteins.of.interest)
## [1] "Analyte ID-based reaction partner query."
+## [1] "Building metabolite to gene relations."
+## [1] "Number of met2gene relations: 0"
+## [1] "Building gene to metabolite relations."
+## [1] "Total Relation Count: 12"
+
+
+
+RaMP has a built-in function, plotCataNetwork(), that generates networks from
+the transcript data. It takes the data frame created by rampFastCata() as input.
+The resulting plots are fully interactive.
+RaMP incorporates ClassyFire and LIPID MAPS classes. The
+chemicalClassSurvey() function takes as input a vector of metabolites
+and outputs the classes associated with each input metabolite.
+metabolites.of.interest = c('hmdb:HMDB0000056','hmdb:HMDB0000439','hmdb:HMDB0000479','hmdb:HMDB0000532',
+ 'hmdb:HMDB0001015','hmdb:HMDB0001138','hmdb:HMDB0029159','hmdb:HMDB0029412',
+ 'hmdb:HMDB0034365','hmdb:HMDB0035227','hmdb:HMDB0007973','hmdb:HMDB0008057',
+ 'hmdb:HMDB0011211')
+chemical.classes <- chemicalClassSurvey(db = rampDB, mets = metabolites.of.interest)
## [1] "Starting Chemical Class Survey"
+## [1] "...finished metabolite list query..."
+## [1] "...finished DB population query..."
+## [1] "...collating data..."
+## [1] "...creating query efficiency summary..."
+## [1] "Finished Chemical Class Survey"
+
+
+
+Chemical properties captured by RaMP include SMILES, InChI,
+InChIKeys, monoisotopic masses, molecular formula, and common name. The
+getChemicalProperties() function takes as input a vector of metabolites
+and outputs a list of chemical property information that can easily be
+converted into a data frame.
+ +## Starting Chemical Property Query
+## Finished Chemical Property Query
+
+
+
+RaMP performs pathway and chemical class overrepresentation analysis
+using Fisher’s exact tests.
+Using the pathways that our analytes map to, captured in the
+pathwaydfids data frame in the previous step, we can now run Fisher’s
+exact test to identify pathways that are enriched for our analytes of
+interest:
+fisher.results <- runCombinedFisherTest(db = rampDB, analytes = c(
+ "hmdb:HMDB0000033",
+ "hmdb:HMDB0000052",
+ "hmdb:HMDB0000094",
+ "hmdb:HMDB0000161",
+ "hmdb:HMDB0000168",
+ "hmdb:HMDB0000191",
+ "hmdb:HMDB0000201",
+ "chemspider:10026",
+ "hmdb:HMDB0006059",
+ "Chemspider:6405",
+ "CAS:5657-19-2",
+ "hmdb:HMDB0002511",
+ "chemspider:20171375",
+ "CAS:133-32-4",
+ "CAS:5746-90-7",
+ "CAS:477251-67-5",
+ "hmdb:HMDB0000695",
+ "chebi:15934",
+ "CAS:838-07-3",
+ "hmdb:HMDBP00789",
+ "hmdb:HMDBP00283",
+ "hmdb:HMDBP00284",
+ "hmdb:HMDBP00850"
+))
Note: To explicitly view the results of mapping input IDs to
+RaMP, users can run the getPathwayFromAnalyte() function as noted
+above in the section “Retrieve Pathways From Input Analyte(s)”.
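To make the idea of a pathway overrepresentation test concrete, here is a minimal sketch of a one-sided Fisher's exact test for a single pathway using base R's `fisher.test()`. All counts are invented, and the way the background is defined here is an assumption for illustration; it is not a description of how `runCombinedFisherTest()` builds its tables internally.

```r
# Invented counts for one hypothetical pathway (not RaMP output).
in_pathway_hits  <- 8     # input analytes that map to the pathway
input_total      <- 20    # input analytes in total
pathway_size     <- 50    # background analytes annotated to the pathway
background_total <- 1000  # background analytes in total

# 2x2 contingency table, filled column by column.
contingency <- matrix(
  c(in_pathway_hits,
    input_total - in_pathway_hits,
    pathway_size - in_pathway_hits,
    background_total - pathway_size - (input_total - in_pathway_hits)),
  nrow = 2,
  dimnames = list(c("in pathway", "not in pathway"),
                  c("in input", "not in input"))
)

# One-sided test: is the pathway overrepresented among the input analytes?
result <- fisher.test(contingency, alternative = "greater")
print(contingency)
print(result$p.value)
```

With these numbers about one hit would be expected by chance (20 × 50 / 1000), so observing eight yields a small p-value.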
+Once we have our Fisher results, we can format them into a new
+data frame and filter the pathways for significance. For this example we
+will use a Holm-adjusted p-value cutoff of 0.05.
+#Returning Fisher Pathways and P-Values
+filtered.fisher.results <- FilterFishersResults(fisher.results, pval_type = 'holm', pval_cutoff=0.05)
## [1] "Filtering Fisher Results..."
+## [1] "Fisher Result Type: Pathway Enrichment"
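The filtering call above uses `pval_type = 'holm'`. As a rough illustration of what choosing an adjustment method means, the sketch below applies base R's `p.adjust()` to a handful of invented p-values and compares the Holm correction with the Benjamini-Hochberg FDR correction. It does not show how `FilterFishersResults()` computes or stores its adjusted values.

```r
# Invented raw p-values, one per hypothetical pathway.
raw_p <- c(0.001, 0.008, 0.02, 0.03, 0.30)

holm_p <- p.adjust(raw_p, method = "holm")  # controls the family-wise error rate
fdr_p  <- p.adjust(raw_p, method = "fdr")   # Benjamini-Hochberg false discovery rate

comparison <- data.frame(
  raw_p,
  holm_p,
  fdr_p,
  keep_holm = holm_p < 0.05,
  keep_fdr  = fdr_p < 0.05
)
print(comparison)
```

Holm is the more conservative of the two, so fewer pathways pass the same 0.05 cutoff than under FDR.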
+Because RaMP combines pathways from multiple sources, pathways may be
+represented more than once. Further, due to the hierarchical nature of
+pathways and because Fisher’s testing assumes pathways are independent,
+subpathways and their parent pathways may appear in a list. To help
+group together pathways that represent similar biological processes, we
+have implemented a clustering algorithm that groups pathways together if
+they share analytes in common.
+clusters <- RaMP::findCluster(db = rampDB, filtered.fisher.results,
+ perc_analyte_overlap = 0.2,
+ min_pathway_tocluster = 2, perc_pathway_overlap = 0.2
+)
## [1] "Clustering pathways..."
+## [1] "Finished clustering pathways..."
+## print("Pathways with Holm-adjusted Pval < 0.05")
+
+datatable(clusters$fishresults %>% mutate_if(is.numeric, ~ round(., 8)),
+ rownames = FALSE
+)
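The idea of grouping pathways by shared analytes can be sketched with base R alone. The example below is not the algorithm implemented in `findCluster()`: it assumes an overlap score defined as shared analytes divided by the size of the smaller pathway, and it swaps in average-linkage hierarchical clustering as a simple stand-in. The pathway names and analyte labels are invented.

```r
# Invented pathways, each a vector of analyte labels.
pathways <- list(
  pathway_A = c("m1", "m2", "m3", "m4"),
  pathway_B = c("m2", "m3", "m4", "m5"),
  pathway_C = c("m6", "m7", "m8")
)

# Assumed overlap score: shared analytes relative to the smaller pathway.
overlap <- function(a, b) length(intersect(a, b)) / min(length(a), length(b))

n <- length(pathways)
overlap_matrix <- matrix(0, n, n,
                         dimnames = list(names(pathways), names(pathways)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    overlap_matrix[i, j] <- overlap(pathways[[i]], pathways[[j]])
  }
}

# Treat (1 - overlap) as a distance and cut the tree so that pathways
# merged at an average overlap of at least 0.2 end up in the same group.
tree   <- hclust(as.dist(1 - overlap_matrix), method = "average")
groups <- cutree(tree, h = 1 - 0.2)
print(overlap_matrix)
print(groups)
```

In this toy input, `pathway_A` and `pathway_B` share three of four analytes and fall into one group, while `pathway_C` shares none and stays on its own.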
To view clustered pathway results:
+pathwayResultsPlot(db = rampDB, filtered.fisher.results, text_size = 8, perc_analyte_overlap = 0.2,
+ min_pathway_tocluster = 2, perc_pathway_overlap = 0.2, interactive = FALSE)
## [1] "Clustering pathways..."
+## [1] "Finished clustering pathways..."
+
+After retrieving chemical classes of metabolites, the
+chemicalClassEnrichment() function performs overrepresentation
+analysis using a Fisher’s test and outputs classes that show enrichment
+in the user input list of metabolites relative to the background
+metabolite population (all metabolites in RaMP). The function performs
+enrichment analysis for ClassyFire classes, sub-classes, and
+super-classes, and for LIPID MAPS categories, main classes, and
+sub-classes.
+metabolites.of.interest = c('hmdb:HMDB0000056','hmdb:HMDB0000439','hmdb:HMDB0000479','hmdb:HMDB0000532',
+ 'hmdb:HMDB0001015','hmdb:HMDB0001138','hmdb:HMDB0029159','hmdb:HMDB0029412',
+ 'hmdb:HMDB0034365','hmdb:HMDB0035227','hmdb:HMDB0007973','hmdb:HMDB0008057',
+ 'hmdb:HMDB0011211')
+chemical.enrichment <- chemicalClassEnrichment(db = rampDB, mets = metabolites.of.interest)
## [1] "Starting Chemical Class Enrichment"
+## [1] "Starting Chemical Class Survey"
+## [1] "...finished metabolite list query..."
+## [1] "...finished DB population query..."
+## [1] "...collating data..."
+## [1] "...creating query efficiency summary..."
+## [1] "Finished Chemical Class Survey"
+## [1] "check total summary"
+## [1] "getting population totals"
+## [1] "Finished Chemical Class Enrichment"
+
+## [1] "ClassyFire_class" "ClassyFire_sub_class" "ClassyFire_super_class"
+## [4] "result_type"
+# To retrieve results for the ClassyFire Class:
+classy_fire_classes <- chemical.enrichment$ClassyFire_class
+datatable(classy_fire_classes)
Note: To explicitly view the results of mapping input IDs to
+RaMP, users can run the chemicalClassSurvey() function as noted above
+in the section “Retrieve Chemical Class from Input Metabolites”.
+This code section demonstrates a Rhea reaction query.
+analytes.of.interest <- c('chebi:57368', 'uniprot:Q96N66', 'CHEBI:73003')
+
+reactionsLists <- RaMP::getReactionsForAnalytes(db = rampDB, analytes = analytes.of.interest, includeTransportRxns = FALSE, humanProtein = TRUE)
## [1] "Retrieving reactions for compounds"
+## [1] "Retrieving reactions for genes/proteins"
+# just show the reactions with at least one metabolite and one protein in common.
+datatable(subset(reactionsLists$metProteinCommonReactions, select = -c(rxn_html_label)))
Three reaction lists are returned: metabolites-to-reactions,
+proteins-to-reactions, and reactions that have at least one metabolite
+and one protein from the input analyte list.
+ +## R version 4.1.0 (2021-05-18)
+## Platform: x86_64-w64-mingw32/x64 (64-bit)
+## Running under: Windows 10 x64 (build 22621)
+##
+## Matrix products: default
+##
+## locale:
+## [1] LC_COLLATE=English_United States.1252
+## [2] LC_CTYPE=English_United States.1252
+## [3] LC_MONETARY=English_United States.1252
+## [4] LC_NUMERIC=C
+## [5] LC_TIME=English_United States.1252
+##
+## attached base packages:
+## [1] stats graphics grDevices utils datasets methods base
+##
+## other attached packages:
+## [1] magrittr_2.0.2 dplyr_1.1.2 DT_0.28 RaMP_3.0.0
+##
+## loaded via a namespace (and not attached):
+## [1] Rcpp_1.0.8.3 lattice_0.20-45 tidyr_1.3.0
+## [4] visNetwork_2.1.2 assertthat_0.2.1 digest_0.6.29
+## [7] utf8_1.2.2 BiocFileCache_2.0.0 R6_2.5.1
+## [10] RSQLite_2.3.1 evaluate_0.21 highr_0.10
+## [13] httr_1.4.7 ggplot2_3.4.3 pillar_1.9.0
+## [16] rlang_1.1.0 curl_4.3.2 rstudioapi_0.13
+## [19] data.table_1.14.8 jquerylib_0.1.4 blob_1.2.4
+## [22] R.utils_2.12.2 R.oo_1.24.0 Matrix_1.4-1
+## [25] rmarkdown_2.24 labeling_0.4.2 tidytext_0.4.1
+## [28] htmlwidgets_1.6.2 bit_4.0.4 munsell_0.5.0
+## [31] compiler_4.1.0 janeaustenr_1.0.0 xfun_0.40
+## [34] pkgconfig_2.0.3 htmltools_0.5.6 tidyselect_1.2.0
+## [37] tibble_3.2.1 fansi_1.0.3 dbplyr_2.1.1
+## [40] withr_2.5.0 R.methodsS3_1.8.1 rappdirs_0.3.3
+## [43] SnowballC_0.7.1 grid_4.1.0 jsonlite_1.8.7
+## [46] gtable_0.3.4 lifecycle_1.0.3 DBI_1.1.3
+## [49] scales_1.2.1 tokenizers_0.3.0 cli_3.6.1
+## [52] stringi_1.7.6 cachem_1.0.6 farver_2.1.1
+## [55] bslib_0.4.0 ellipsis_0.3.2 filelock_1.0.2
+## [58] generics_0.1.3 vctrs_0.6.3 tools_4.1.0
+## [61] bit64_4.0.5 glue_1.6.2 purrr_1.0.1
+## [64] crosstalk_1.2.0 fastmap_1.1.0 yaml_2.3.7
+## [67] colorspace_2.1-0 memoise_2.0.1 knitr_1.43
+## [70] sass_0.4.2
+