Skip to content

Replace section 6

Paul edited this page May 6, 2017 · 6 revisions

This replaces section 6 of the Lotus tutorial:

http://psbweb05.psb.ugent.be/lotus/tutorial_R.html

What previously happend:

Dowload and extract this data and start R.

M   <- read.delim(file="higherLvl/Genus.txt",row.names=1)
M   <- as.matrix(M)
M1  <- sweep(M,2,colSums(M),"/")

Data normalization II

A different way to normalize the data is to sample exactly equal amounts of 16S rDNA for each sample in this experiment. Of course in practice this is impossible to do, but we can simulate this, by randomly selecting a subset of 16S rDNA. This is called rarefaction. Rarefy your original abundance matrix (M) into M2, using 2000 reads per sample:

if (!require("rtk")) install.packages("rtk"); 
library(rtk)
rarefied <-  rtk((M), depth = 2000, ReturnMatrix = 1)
M2       <- rarefied$raremat[[1]]
colSums(M2)

Use colSums(M2) to check if the rarefaction worked. The main use of rarefaction is in calculating diversity and richness correctly, for this we will look in the following at observed richness. The concept of observed richness within a sample is pretty simple (but useful): richness describes the number of different species that occur at least once in a sample. This can be calculated in two steps:

OnceOrMoreOftenPresent = M1>0

The sum of each column in this matrix will tell us how many species were detected in total within the respective sample:

rich1  <- apply(OnceOrMoreOftenPresent,2,sum)

Compare the richness values in “rich1” to the richness obtained on the rarefied matrix M2, calculated with a shortened command:

rich2  <- apply(M2>0,2,sum)

Compare rich1 and rich2 in a matrix value by value. This can be easily done using the “cbind” command to bind two vectors column-wise together, so we get a matrix with 2 columns. What does the second part of the command do? What happens if you change that to order(rich2)?

cbind(rich1,rich2)[order(rich1),]

Discuss which richness values have the highest value to the researcher and why the order is very different between these two richness estimates. Is one way clearly wrong?

Why did we choose 2000 as cutoff for the sequences per sample? What is the maximum value we could choose?

Collectors curve

To create a collectors curve of the dataset is as easy as calling:

collectors.curve(M1, times = 100, bin = 1, ylab = "Diversity", xlab = "Samples", col = 3, col2 = 1)

This creates a colectors curve, sampeling a hundret times without bining samples and coloring the border black (1) while keeping the background green (3). Changing the object from M1 to the rarefied dataset M2 allows the visualisation of the rarefaction result.

Plotting richness and diversity

To visualize richness and diversity of the samples is as easy as calling plot(rarefied) on the dataset rarefied by RTK. Without grouping the samples into clusters, this makes very little sense. Grouping the samples could be done using the column names like this:

groups <- sapply((colnames(M2)), substr,start = 1, stop = 2)
plot(rarefied, groups = groups, div = 'richness')

Which shows equal diversity across the groups. To plot a rarefaction curve is equally easy, but requires fresh rarefaction to several depths first:

rarefiedCurve <-  rtk((M), depth = seq(10,2000, length.out = 50), ReturnMatrix = 1)
groups <- sapply((colnames(M2)), substr,start = 1, stop = 2)
plot(rarefiedCurve, groups = groups, div = 'richness', raw = T, pch = c(1,5,16), xlab = 'Depth', ylab = 'Richness', fit = F, lty = 0)
Clone this wiki locally