joey711 edited this page Dec 13, 2012 · 17 revisions

distance()

see also...

The operation of this function works closely with the

and

functions. See their wiki-pages for further details and examples.

Usage

distance(physeq, method="unifrac", type="samples", ...)

Overview

The distance function takes a data object (phyloseq-class) and a method option (character string), and returns a distance object (dist-class) suitable for certain ordination methods and other distance-based analyses. There are currently 44 explicitly supported method options, and arbitrary user-provided methods are also accepted via an interface to designdist. For the complete list of currently supported options for the method parameter, type distance("list") at the command line. Only sample-wise distances are currently supported (the type argument), but species-wise (OTU-wise) distances will eventually be supported as well.
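As an illustration of the kind of expression designdist accepts, the following sketch calls vegan's designdist() directly on a small made-up abundance matrix. It checks that the formula "(A+B-2*J)/(A+B)" with terms = "minimum" reproduces Bray-Curtis as computed by vegdist(). The toy matrix and variable names are assumptions for illustration only, and how distance() forwards such a string to designdist is not shown here.

```r
library("vegan")

# Made-up abundance table: 3 samples (rows) x 4 taxa (columns)
toy <- matrix(
  c(5, 0, 3, 2,
    4, 1, 0, 5,
    0, 6, 2, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("sample", 1:3), paste0("taxon", 1:4))
)

# With terms = "minimum", J is the sum of pairwise minima and
# A, B are the row totals, so this formula is Bray-Curtis
custom_dist <- designdist(toy, method = "(A+B-2*J)/(A+B)", terms = "minimum")

# Reference values from vegan's built-in Bray-Curtis
bray_dist <- vegdist(toy, method = "bray")

# Compare the two dist-class objects value by value
all.equal(as.vector(custom_dist), as.vector(bray_dist))
```

Both results are dist-class objects, the same class that distance() is described as returning, so either could be passed on to a distance-based analysis.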

Example: "Enterotypes" dataset using many different methods

Because distance() gathers the distance calculations into a single function, it is relatively straightforward to calculate every supported distance method and investigate the results. The following code loops over the methods on the "Enterotypes" dataset, performs multi-dimensional scaling (a.k.a. principal coordinates analysis) on each distance matrix, and plots the first two axes, shading and shaping the points in each plot according to sequencing technology and assigned "Enterotype" label.

Note that we have omitted the options that require a phylogenetic tree because the "enterotype" example dataset currently included in the phyloseq-package does not have one.

This takes a little while to run, and saves all 40+ plots as PNG files.

# Load necessary packages
library("phyloseq"); library("ggplot2")
# Load the enterotype data
data(enterotype)
# Remove the "taxa" that included all unassigned sequences ("-1")
enterotype <- subset_species(enterotype, Genus != "-1")
# The available distance methods coded in distance()
dist_methods <- distance("list")
# Remove the two distance-methods that require a tree
dist_methods <- dist_methods[-c(1, 2)]
# Define the destination directory for the plots
ptex_dir <- "~/Dropbox/R/distance_github_examples/"
width  <- 7
height <- 7
# Loop through each distance method
entl <- list()
for( i in dist_methods ){
	# Calculate distance matrix
	iDist <- distance(enterotype, method=i)
	# Calculate ordination
	iMDS  <- ordinate(enterotype, "MDS", distance=iDist)
	# Make plot
	p <- NULL # Don't carry over previous plot (if error, p will be blank)
	p <- plot_ordination(enterotype, iMDS, color="SeqTech", shape="Enterotype")
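	# opts() and theme_rect() are deprecated in ggplot2 (0.9.2 and later); use ggtitle(), theme() and element_rect()
	p <- p + ggtitle(paste("MDS using distance method ", i, sep=""))
	p <- p + theme(panel.background = element_rect(colour="gray76", fill="gray76"))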
	# Save the graphic to file.
	ggsave(
		filename = paste(ptex_dir, "dist_example_", i, ".png", sep=""),
		plot = p, width=width, height=height
	)
}
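To make the distance-then-ordination step in the loop more concrete, here is a minimal base R sketch on a made-up abundance table. It builds a dist-class object by hand from a Bray-Curtis formula and runs classical MDS with cmdscale(). The toy data, the bray_curtis() helper, and the variable names are all hypothetical, and this is not a description of what ordinate() does internally.

```r
# Made-up abundance table: 4 samples (rows) x 3 taxa (columns)
counts <- matrix(
  c(10, 0, 5,
     8, 1, 6,
     0, 9, 2,
     1, 7, 3),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("sample", 1:4), paste0("taxon", 1:3))
)

# Hypothetical helper: Bray-Curtis dissimilarity between two abundance vectors
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Fill a symmetric matrix of pairwise dissimilarities
n <- nrow(counts)
m <- matrix(0, n, n, dimnames = list(rownames(counts), rownames(counts)))
for (a in seq_len(n)) {
  for (b in seq_len(n)) {
    m[a, b] <- bray_curtis(counts[a, ], counts[b, ])
  }
}

# Convert to a dist-class object (keeps only the lower triangle)
toy_dist <- as.dist(m)

# Classical MDS (principal coordinates analysis), keeping two axes
toy_mds <- cmdscale(toy_dist, k = 2)

# One row of coordinates per sample, ready for a scatter plot
plot(toy_mds, type = "n", xlab = "Axis 1", ylab = "Axis 2")
text(toy_mds, labels = rownames(counts))
```

Any function that returns a dist-class object can feed the same ordination step, which is why swapping the method string inside the loop is enough to compare methods.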

The following are some selected examples among the created plots.

[Figures: MDS plots for the "Jensen-Shannon Divergence", "Jaccard", "Bray-Curtis", "Gower", and "w" distance methods]
