generated from microbiome/course_2022_oulu
-
Notifications
You must be signed in to change notification settings - Fork 1
/
06-3-ex-sol-ADHD.Rmd
182 lines (146 loc) · 5.8 KB
/
06-3-ex-sol-ADHD.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
---
output:
html_document: default
pdf_document: default
---
* **Importing and processing data**
```{r, message=FALSE}
# Defining file paths
biom_file_path <- "data/Aggregated_humanization2.biom"
sample_meta_file_path <- "data/Mapping_file_ADHD_aggregated.csv"
tree_file_path <- "data/Data_humanization_phylo_aggregation.tre"
library(mia)
# Imports the data
se <- loadFromBiom(biom_file_path)
names(rowData(se)) <- c("Kingdom", "Phylum", "Class", "Order",
"Family", "Genus")
# Goes through the whole DataFrame. Removes '.*[kpcofg]__' from strings, where [kpcofg]
# is any character from listed ones, and .* any character.
rowdata_modified <- BiocParallel::bplapply(rowData(se),
FUN = stringr::str_remove,
pattern = '.*[kpcofg]__')
# Genus level has additional '\"', so let's delete that also
rowdata_modified <- BiocParallel::bplapply(rowdata_modified,
FUN = stringr::str_remove,
pattern = '\"')
# rowdata_modified is a list, so it is converted back to DataFrame format.
rowdata_modified <- DataFrame(rowdata_modified)
# And then assigned back to the SE object
rowData(se) <- rowdata_modified
# We use this to check what type of data it is
# read.table(sample_meta_file_path)
# It seems like a comma separated file and it does not include headers
# Let us read it and then convert from data.frame to DataFrame
# (required for our purposes)
sample_meta <- DataFrame(read.table(sample_meta_file_path, sep = ",", header = FALSE))
# Add sample names to rownames
rownames(sample_meta) <- sample_meta[,1]
# Delete column that included sample names
sample_meta[,1] <- NULL
# We can add headers
colnames(sample_meta) <- c("patient_status", "cohort", "patient_status_vs_cohort", "sample_name")
# Then it can be added to colData
colData(se) <- sample_meta
# Convert to tse format
tse <- as(se, "TreeSummarizedExperiment")
# Reads the tree file
tree <- ape::read.tree(tree_file_path)
# Add tree to rowTree
rowTree(tse) <- tree
```
* **Abundance table** Retrieve the taxonomic abundance table from the
example data set (TSE object).
```{r}
# We show only part of it
assays(tse)$counts[1:6,1:6]
```
* How many different samples and genus-level groups this phyloseq
object has?
```{r}
dim(rowData(tse))
```
* What is the maximum abundance of Akkermansia in this data set?
```{r}
# Agglomerating to Genus level
tse_genus <- agglomerateByRank(tse,rank="Genus")
# Retrieving the count for Akkermansia
Akkermansia_abund <- assays(tse_genus)$count["Genus:Akkermansia",]
max(Akkermansia_abund)
```
* Draw a histogram of library sizes (total number of reads per
sample).
```{r}
library(scater)
tse <- addPerCellQC(tse) # adding a new column "sum" to colData, of total counts/sample.
hist(colData(tse)$sum, xlab = "Total number of reads per sample", main = "Histogram of library sizes")
```
* **Taxonomy table** Retrieve the taxonomy table and print out the
first few lines of it with the R command head(). Investigate how
many different phylum-level groups this phyloseq object has?
```{r}
head(rowData(tse))
taxonomyRanks(tse) # The taxonomic ranks available
unique(rowData(tse)["Phylum"]) # phylum-level groups
length(unique(rowData(tse)["Phylum"])[,1]) # number of phylum-level groups
```
* **Sample metadata** Retrieve sample metadata. How many patient
groups this data set has? Draw a histogram of sample
diversities.
```{r}
colData(tse) # samples metadata
unique(colData(tse)$patient_status) # patient groups
# Example of sample diversity using Shannon index
tse <- mia::estimateDiversity(tse,
abund_values = "counts",
index = "shannon",
name = "shannon")
hist(colData(tse)$shannon, xlab = "Shannon index", main = "Histogram of sample diversity")
```
* **Subsetting** Pick a subset of the data object including only
ADHD individuals from Cohort 1. How many there are?
```{r}
sub_cohort_1 <- tse[, colData(tse)$cohort=="Cohort_1"]
sub_cohort_1_ADHD <- sub_cohort_1[, colData(sub_cohort_1)$patient_status=="ADHD"]
colData(sub_cohort_1_ADHD)
```
* **Transformations** The data contains read counts. We can convert
these into relative abundances and other formats. Compare abundance
of a given taxonomic group using the example data before and after
the compositionality transformation (with a cross-plot, for
instance). You can also compare the results to CLR-transformed data
(see e.g. [Gloor et
al. 2017](https://www.frontiersin.org/articles/10.3389/fmicb.2017.02224/full))
```{r}
tse <- transformCounts(tse, method = "relabundance")
tse <- transformCounts(tse, method = "clr", abund_values = "counts",pseudocount = 1)
# Lets compare with taxa: A29
taxa <- "A29"
df <- as.data.frame(list(
counts=assays(tse)$counts[,taxa],
relabundance=assays(tse)$relabundance[,taxa],
clr=assays(tse)$clr[,taxa])
)
ggplot(df, aes(x=counts,y=relabundance))+
geom_point()+
geom_smooth()
ggplot(df, aes(x=counts,y=clr))+
geom_point()+
geom_smooth()
ggplot(df, aes(x=relabundance,y=clr))+
geom_point()+
geom_smooth()
```
* **Visual exploration** Visualize the population distribution of
abundances for certain taxonomic groups. Do the same for
CLR-transformed abundances.
``` {r}
# Same taxa used as earlier
ggplot(df, aes(x=counts,colour="blue")) +
geom_density(alpha=.2)+
theme(legend.position = "none")+
labs(title =paste("Distribution of counts for",taxa, collapse = ": "))
ggplot(df, aes(x=clr,colour="red")) +
geom_density(alpha=.2)+
theme(legend.position = "none")+
labs(title =paste("Distribution of clr for",taxa, collapse = ": "))
```