-
Notifications
You must be signed in to change notification settings - Fork 187
ordinate
The operation of this function also depends a lot on the
function, and plotting is facilitated a lot by
function. See their pages for further details and examples.
Load the necessary packages and data.
library("phyloseq"); library("ggplot2")
data(GlobalPatterns)
We will use the same data-trimming as in the plot_ordination examples. For trying this code yourself, see the trimming code-chunk at the beginning of the example-set.
- (Unconstrained) Correspondence Analysis (CA)
Let's start with an unconstrained correspondence analysis on the abundance table of OTUs in each sample.
GP.ca <- ordinate(GP1, "CCA")
p1 <- plot_ordination(GP1, GP.ca, type="split", color="Phylum", shape="human", label="SampleType")
Before we make the graphic, let's first define a color scheme and background to use in all split plots, using some of the output from the plot_ordination
function to define parameters that will be consistent in later split plots:
my.cols <- rainbow(length(levels(p1$data$Phylum)))
names(my.cols) <- levels(p1$data$Phylum)
my.cols["samples"] <- "black"
my.background <- "opts(panel.background = theme_rect(colour=\\"gray76\\", fill=\\"gray76\\"))"
Now make the plot, adding the new theme parameters
p1 + scale_colour_manual(values=my.cols) + eval(parse(text=my.background))
- Detrended Correspondence Analysis (DCA)
GP.dca <- ordinate(GP1, "DCA")
p2 <- plot_ordination(GP1, GP.dca, type="split", color="Phylum", shape="human", label="SampleType")
p2 + scale_colour_manual(values=my.cols) + eval(parse(text=my.background))
- Constrained Correspondence Analysis (CCA).
Let's use Constrained (aka "Canonical") Correspondence Analysis to contrain the first axis on the known "SampleType" groups.
GP.cca <- ordinate(GP1 ~ SampleType, "CCA")
p3 <- plot_ordination(GP1, GP.cca, type="split", color="Phylum", shape="human", label="SampleType")
p3 + scale_colour_manual(values=my.cols) + eval(parse(text=my.background))
- Double Principle Coordinate Analysis (DPCoA)
DPCoA in this context uses the patristic-distances defined in the phylogenetic tree as the additional distance matrix (between the OTUs), which means that a phylogenetic tree is required. A tree is included in our GP1
object, and was trimmed in parallel with the other trimming we did above, so it can be used directly.
GP.dpcoa <- ordinate(GP1, "dpcoa")
p4 <- plot_ordination(GP1, GP.dpcoa, type="split", color="Phylum", shape="human", label="SampleType")
p4 + scale_colour_manual(values=my.cols) + eval(parse(text=my.background))
- Redundancy Analysis (RDA)
GP.rda <- ordinate(GP1, "RDA")
p5 <- plot_ordination(GP1, GP.rda, type="split", color="Phylum", shape="human", label="SampleType")
p5 + scale_colour_manual(values=my.cols) + eval(parse(text=my.background))
Notice that the axis labels for this unconstrained-RDA are "PC1" and "PC2", indicating that these are "principle components". These axis labels would be "RDA1" and so on if we constrained by the SampleType as we did in the CCA example above (for instance). In principle you can constrain the RDA or CCA by any variable in the sampleData
table. The choice of variable is important, and you may want to consult other methods and the literature for determining how exactly you want to make that choice.
- Non-Metric Multidimensional Scaling (NMDS)
The default distance-method used by ordinate
is the unweighted-UniFrac distance (which, of course, also requires that your data include a tree. Try tre(GP1)
to see). However, the NMDS option, which uses vegan::metaMDS
, will also calculate species coordinates and allow a biplot representation if you specific one of the distances listed in vegan::vegdist
. In this case, we want to try and stick with the biplot representation as in the previous plots, so we will use the Bray-Curtis distance ("bray"
).
GP.nmds <- ordinate(GP1, "NMDS", distance="bray")
p6 <- plot_ordination(GP1, GP.nmds, type="split", color="Phylum", shape="human", label="SampleType")
p6 + scale_colour_manual(values=my.cols) + eval(parse(text=my.background))
- (classical) Multidimensional Scaling (MDS) / Principle Coordinate Analysis (PCoA)
Let's look at an example using weighted-UniFrac and PCoA, like you might often find in recent high-throughput phylogenetic-sequencing articles (including the Global Patterns article). This will not be a bi-plot. We will instead shade by SampleType.
Note that because weighted-UniFrac can take a long time on 4000+ OTUs, even with the "Fast UniFrac" implementation in phyloseq
, I will further specify a parallel = TRUE
option, and precede my ordinate
call with a parallel backend call and 7 cores.
library("doParallel")
registerDoParallel(cores=7)
GP.wufmds <- ordinate(GP1, "PCoA", distance="unifrac", weighted=TRUE, parallel=TRUE)
Note: alternatively, if you want to just get the weighted-UniFrac distance matrix...
```R
GP.wuf.dist <- distance(GP1, weighted=TRUE, parallel=TRUE)
or
GP.wuf.dist <- UniFrac(GP1, weighted=TRUE, parallel=TRUE)
Now make the plot so we can compare...
p7 <- plot_ordination(GP1, GP.wufmds, color="SampleType", shape="human", label="SampleType")
p7 + geom_point(size=5) + geom_line() + eval(parse(text=my.background))