# Disease enrichment analysis {#dose-enrichment}
We developed the `r Biocpkg("DOSE")` [@yu_dose_2015] package to promote the investigation of diseases. `r Biocpkg("DOSE")` provides five methods for [measuring semantic similarities among DO terms and gene products](#DOSE-semantic-similarity), together with a hypergeometric model and gene set enrichment analysis (GSEA) for associating diseases with gene lists and extracting disease association insights from genome-wide expression profiles.
```{r include=FALSE}
library(knitr)
opts_chunk$set(message=FALSE, warning=FALSE, eval=TRUE, echo=TRUE, cache=TRUE)
library(clusterProfiler)
```
## Disease over-representation analysis {#dose-ora}
`r Biocpkg("DOSE")` supports enrichment analysis of Disease Ontology (DO) [@schriml_disease_2011], [Network of Cancer Genes](http://ncg.kcl.ac.uk/) (NCG) [@omer_ncg] and [Disease Gene Network](http://disgenet.org/) (DisGeNET) [@janet_disgenet]. In addition, several visualization methods are provided by `r Biocpkg("enrichplot")` to help interpret semantic and enrichment results.
### Over-representation analysis for disease ontology {#dose-do-ora}
In the following example, we select the genes whose absolute fold change value in `geneList` is above 1.5 as the differential genes and analyze their disease associations.
```{r enrichdo}
library(DOSE)
data(geneList)
gene <- names(geneList)[abs(geneList) > 1.5]
head(gene)
x <- enrichDO(gene          = gene,
              ont           = "DO",
              pvalueCutoff  = 0.05,
              pAdjustMethod = "BH",
              universe      = names(geneList),
              minGSSize     = 5,
              maxGSSize     = 500,
              qvalueCutoff  = 0.05,
              readable      = FALSE)
head(x)
```
The `enrichDO()` function requires a vector of entrezgene IDs as input, which is usually the list of differential genes from a gene expression profiling study. Please refer to [section 16.1](#id-convert) if you need to convert other gene ID types to entrezgene IDs.
The `ont` parameter can be "DO" or "DOLite". DOLite [@Du15062009] was constructed to aggregate redundant DO terms. Because the DOLite data is no longer updated, we recommend using `ont="DO"`. `pvalueCutoff` sets the cutoff for the *p* value and the adjusted *p* value; `pAdjustMethod` sets the *p* value correction method, which can be the Bonferroni correction ("bonferroni"), Holm ("holm"), Hochberg ("hochberg"), Hommel ("hommel"), Benjamini \& Hochberg ("BH") or Benjamini \& Yekutieli ("BY"); and `qvalueCutoff` sets the cutoff for *q*-values.
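As a minimal sketch of what `pAdjustMethod` controls, the method names listed above are the ones accepted by base R's `p.adjust()`. The *p* values below are made up for illustration and are not output of `enrichDO()`.

```r
# Hypothetical raw p values for five disease terms (illustration only)
p <- c(0.001, 0.01, 0.02, 0.04, 0.2)

# Apply several correction methods and collect the results column-wise
methods <- c("bonferroni", "holm", "BH", "BY")
adj <- sapply(methods, function(m) p.adjust(p, method = m))

# One row per term, one column per correction method
round(adj, 4)
```

In this toy example the "BH" column is never larger than the "bonferroni" column, which is why "BH" tends to retain more terms at the same cutoff.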
The `universe` parameter sets the background gene universe for testing. If the user does not explicitly set this parameter, `enrichDO()` will set the universe to all human genes that have DO annotation.
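To illustrate why the choice of background matters, here is a simplified hypergeometric over-representation test for a single term, written in base R. It is a sketch under stated assumptions and not the code used by `enrichDO()`; the gene IDs and the helper `ora_pvalue()` are hypothetical.

```r
# Toy data (hypothetical gene IDs, not real entrezgene IDs)
universe   <- paste0("g", 1:100)          # background genes
term_genes <- paste0("g", 1:20)           # genes annotated to one disease term
gene_list  <- paste0("g", c(1:5, 21:25))  # genes of interest

# Hypothetical helper: over-representation p value for one term
ora_pvalue <- function(gene_list, term_genes, universe) {
  # Genes outside the background cannot contribute to the test
  gene_list  <- intersect(gene_list, universe)
  term_genes <- intersect(term_genes, universe)
  k <- length(intersect(gene_list, term_genes))  # genes of interest in the term
  M <- length(term_genes)                        # term size in the background
  N <- length(universe)                          # background size
  n <- length(gene_list)                         # number of genes of interest
  # P(X >= k) for X ~ Hypergeometric(N, M, n)
  phyper(k - 1, M, N - M, n, lower.tail = FALSE)
}

p_small <- ora_pvalue(gene_list, term_genes, universe)
p_large <- ora_pvalue(gene_list, term_genes, paste0("g", 1:1000))
c(small_universe = p_small, large_universe = p_large)
```

With the same gene list and term, the larger background yields a smaller *p* value, so an overly broad universe can inflate significance.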
The `minGSSize` and `maxGSSize` parameters restrict testing to DO terms whose number of annotated genes lies between `minGSSize` and `maxGSSize`.
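The size filter can be pictured with a small base R sketch. The gene sets and the helper `filter_by_size()` are hypothetical, and treating both bounds as inclusive is an assumption made for this example.

```r
# Toy gene sets (hypothetical terms and gene IDs)
gene_sets <- list(
  termA = paste0("g", 1:3),    # too small
  termB = paste0("g", 1:8),    # within range
  termC = paste0("g", 1:600)   # too large
)

# Hypothetical helper: keep gene sets whose size is within [minGSSize, maxGSSize]
# (inclusive bounds are an assumption of this sketch)
filter_by_size <- function(gene_sets, minGSSize = 5, maxGSSize = 500) {
  size <- lengths(gene_sets)
  gene_sets[size >= minGSSize & size <= maxGSSize]
}

kept <- filter_by_size(gene_sets)
names(kept)
```

Very small terms give unstable statistics and very large terms are usually too general to be informative, which is the usual motivation for such a filter.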
The `readable` parameter is a logical value that indicates whether the entrezgene IDs will be mapped to gene symbols (see also [setReadable](#setReadable)).
### Over-representation analysis for the network of cancer gene {#dose-ncg-ora}
[Network of Cancer Genes](http://ncg.kcl.ac.uk/) (NCG) [@omer_ncg] is a manually curated repository of cancer genes. NCG release 5.0 (Aug. 2015) collects 1,571 cancer genes from 175 published studies. `r Biocpkg("DOSE")` supports analyzing a gene list to determine whether its genes are enriched in genes known to be mutated in a given cancer type.
```{r enrichncg}
gene2 <- names(geneList)[abs(geneList) < 3]
ncg <- enrichNCG(gene2)
head(ncg)
```
### Over-representation analysis for the disease gene network {#dose-dgn-ora}
[DisGeNET](http://disgenet.org/) [@janet_disgenet] is an integrative and comprehensive resource of gene-disease associations from several public data sources and the literature. It contains gene-disease associations and SNP-gene-disease associations.
The enrichment analysis of gene-disease associations is supported by the `enrichDGN()` function, and the analysis of SNP-gene-disease associations is supported by the `enrichDGNv()` function.
```{r enrichdgn}
dgn <- enrichDGN(gene)
head(dgn)
snp <- c("rs1401296", "rs9315050", "rs5498", "rs1524668", "rs147377392",
         "rs841", "rs909253", "rs7193343", "rs3918232", "rs3760396",
         "rs2231137", "rs10947803", "rs17222919", "rs386602276", "rs11053646",
         "rs1805192", "rs139564723", "rs2230806", "rs20417", "rs966221")
dgnv <- enrichDGNv(snp)
head(dgnv)
```
## Disease gene set enrichment analysis {#dose-gsea}
### `gseDO` function
In the following example, in order to speed up the compilation of this document, only gene sets that pass the `minGSSize = 120` size threshold are tested.
```{r gsedo}
library(DOSE)
data(geneList)
y <- gseDO(geneList,
           minGSSize     = 120,
           pvalueCutoff  = 0.2,
           pAdjustMethod = "BH",
           verbose       = FALSE)
head(y, 3)
```
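For intuition about what a gene set enrichment statistic measures, the following is a simplified running-sum enrichment score in the style of GSEA, written in base R. It is only a sketch and is not how `gseDO()` computes its results; the toy ranking, the gene set and the helper `running_es()` are hypothetical.

```r
# Toy ranked list: a named numeric vector, the same shape as `geneList`
stats <- c(g1 = 3.2, g2 = 2.5, g3 = 1.9, g4 = 0.8,
           g5 = -0.4, g6 = -1.1, g7 = -2.0, g8 = -2.7)
gene_set <- c("g1", "g2", "g4")  # hypothetical disease gene set

# Hypothetical helper: weighted running-sum enrichment score
running_es <- function(stats, gene_set, exponent = 1) {
  stats <- sort(stats, decreasing = TRUE)
  hit <- names(stats) %in% gene_set
  w <- abs(stats)^exponent
  # Step up at genes in the set (weighted by the statistic),
  # step down at genes outside the set (uniform steps)
  p_hit  <- cumsum(ifelse(hit, w, 0)) / sum(w[hit])
  p_miss <- cumsum(!hit) / sum(!hit)
  running <- p_hit - p_miss
  # The score is the running sum's largest deviation from zero
  running[which.max(abs(running))]
}

es <- running_es(stats, gene_set)
es
```

A positive score indicates that the members of the gene set are concentrated near the top of the ranked list, and a negative score indicates concentration near the bottom.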
### `gseNCG` function
```{r gsencg}
ncg <- gseNCG(geneList,
              pvalueCutoff  = 0.5,
              pAdjustMethod = "BH",
              verbose       = FALSE)
ncg <- setReadable(ncg, 'org.Hs.eg.db')
head(ncg, 3)
```
### `gseDGN` function
```{r gsedgn}
dgn <- gseDGN(geneList,
              pvalueCutoff  = 0.2,
              pAdjustMethod = "BH",
              verbose       = FALSE)
dgn <- setReadable(dgn, 'org.Hs.eg.db')
head(dgn, 3)
```