-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathComputer_lab_II.Rmd
153 lines (125 loc) · 5.21 KB
/
Computer_lab_II.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
---
title: " Community ecology - Computer lab II - AB332"
author: "Anders K. Krabberød (UiO) and Ramiro Logares (ICM)"
date: "October 2024"
output:
html_notebook: default
pdf_document: default
html_document:
df_print: paged
editor_options:
chunk_output_type: inline
---
# Computer lab II
First load the necessary packages.
```{r,include=FALSE}
library(vegan)
library(tidyverse)
library(PCAtools)
library(recluster)
```
Now load the the data from the previous lab (change path to the file if necessary):
```{r}
load("AB332_lab_I.RData")
```
# Distance metrics
Let's calculate the Bray Curtis dissimilarities for the rarefied dataset
```{r}
otu.tab.trans.ss.nozero.bray <- vegdist(otu.tab.trans.ss.nozero, method = "bray")
as.matrix(otu.tab.trans.ss.nozero.bray)[1:5, 1:5]
```
- *What kind of data is the Bray-Curtis dissimilarity suitable for?*
- *How can you change the dissimilarity index in the previous chunk of code?*
# Ordination and clustering
## PCA
Pca from the rarefied table
```{r}
otu.tab.trans.ss.nozero.pca <- PCAtools::pca(t(otu.tab.trans.ss.nozero), scale = FALSE) # Runs the PCA
biplot(otu.tab.trans.ss.nozero.pca, showLoadings = T, lab = rownames(otu.tab.trans.ss.nozero)) # Plots the PCA
screeplot(otu.tab.trans.ss.nozero.pca, axisLabSize = 18, titleLabSize = 22) # We plot the percentage of variance explained by each axis
```
## NMDS
We will define the function NMDS.scree() that automatically performs a NMDS for 1-7 dimensions and plots the number of dimensions vs. stress
```{r}
set.seed(666) # Set a seed to make results reproducible
NMDS.scree <- function(x) { # x is the name of the distance matrix
plot(rep(1, 7), replicate(7, metaMDS(x, autotransform = FALSE, k = 1)$stress), xlim = c(1, 7), ylim = c(0, 0.30), xlab = "# of Dimensions", ylab = "Stress", main = "NMDS stress plot")
for (i in 1:7) {
points(rep(i + 1, 7), replicate(7, metaMDS(x, autotransform = FALSE, k = i + 1)$stress))
}
}
```
Using the function to determine the optimal number of dimensions
Using the rarefied table
```{r,include=FALSE}
NMDS.scree(otu.tab.trans.ss.nozero.bray)
```
We calculate NMDS for k(dimensions)=2
Rarefied table (we use the dataframe to have access to sample and OTU names)
```{r}
otu.tab.trans.ss.nozero.bray.nmds <- metaMDS(otu.tab.trans.ss.nozero, k = 2, trymax = 100, trace = FALSE, autotransform = FALSE, distance = "bray")
otu.tab.trans.ss.nozero.bray.nmds
```
# Make stressplot
```{r}
stressplot(otu.tab.trans.ss.nozero.bray.nmds)
```
Simple plotting
Rarefied table
```{r}
plot(otu.tab.trans.ss.nozero.bray.nmds, display = "sites", type = "n")
points(otu.tab.trans.ss.nozero.bray.nmds, display = "sites", col = "red", pch = 19)
text(otu.tab.trans.ss.nozero.bray.nmds, display = "sites")
```
### Let's make nicer plots
We get the seasons for samples
```{r}
isa.metadata <- read_tsv("https://raw.githubusercontent.com/krabberod/UNIS_AB332_2022/main/computer_lab/data/AB332metadata_v3.txt")
isa.metadata <- column_to_rownames(isa.metadata, var = "Sample_Name")
isa.metadata.simp <- isa.metadata[6:30, ]
```
Rarefied table
We generate a table of nmds scores and other features
```{r}
otu.tab.trans.ss.nozero.bray.nmds.scores <- as.data.frame(scores(otu.tab.trans.ss.nozero.bray.nmds)$sites)
otu.tab.trans.ss.nozero.bray.nmds.scores$season <- isa.metadata.simp$seasons
otu.tab.trans.ss.nozero.bray.nmds.scores$month <- as.factor(isa.metadata.simp$month)
otu.tab.trans.ss.nozero.bray.nmds.scores$samples <- rownames(otu.tab.trans.ss.nozero.bray.nmds.scores)
```
Create the plot
```{r}
ggplot(otu.tab.trans.ss.nozero.bray.nmds.scores) +
geom_point(mapping = aes(x = NMDS1, y = NMDS2, colour = month, shape = season), size = 3) +
coord_fixed() + ## need aspect ratio of 1!
geom_text_repel(
box.padding = 0.5, aes(x = NMDS1, y = NMDS2, label = samples),
size = 3
)
```
# Clustering of samples
Allows determining the similarity between samples as well as the organization of samples in groups.
Hierarchical clustering: samples will be organized in ranks according to their similarity and all samples will be included in a large group Unweighted Pair-Group Method Using Arithmetic Averages (UPGMA): This linkage method will link samples by considering their distance to a subgroup arithmetic average. This is a method widely used in ecology
### UPGMA
Rarefied dataset
We generate 100 trees by re-sampling and then, we plot the consensus tree
```{r}
otu.tab.trans.ss.nozero.bray.upgma <- recluster.cons(otu.tab.trans.ss.nozero.bray, tr = 100, p = 0.5, method = "average")
plot(otu.tab.trans.ss.nozero.bray.upgma$cons) # plot consensus tree
```
We'll calculate bootstrap support values (0: bad - 100: perfect)
This allows us to know how well supported is the branching pattern
The boostrapping function may take a while to run (depending on your computer)
```{r}
otu.tab.trans.ss.nozero.bray.upgma.boot <- recluster.boot(otu.tab.trans.ss.nozero.bray.upgma$cons, otu.tab.trans.ss.nozero, tr = 100, p = 0.5, method = "average", boot = 1000, level = 1)
```
Use Ape to plot
```{r}
library(ape)
tree <- otu.tab.trans.ss.nozero.bray.upgma$cons
boot <- as.data.frame(otu.tab.trans.ss.nozero.bray.upgma.boot)$V1
plot.phylo(tree, type = "cladogram")
nodelabels(boot, frame = "none", cex = 1, col = "red")
```
```{r}
save.image("AB332_lab_II.RData")
```