poppr version 2.7.0
Changes in poppr version 2.7
Poppr version 2.7 introduces a change to how AMOVA is calculated (thanks to Patrick Meirmans for the impetus and sample data) and two new functions for data conversion:
make_haplotypes()
for splitting data into pseudo-haplotypesas.genambig()
for converting genind/genclone objects to polysat's
genambig class.
The changes will be outlined here.
Calculating (\rho) --- AMOVA from allele frequencies
Rho is a method of calculating population differentiation in the AMOVA
framework without considering within-individual variance and is analogous
to Fst for use with autotetraploid organisms (Ronfort et al 1998;
Meirmans and Liu 2018). The process uses the Euclidean distance of allele
frequencies and can be performed by setting within = FALSE
.
library("poppr")
data("Pinf")
Pinf
#>
#> This is a genclone object
#> -------------------------
#> Genotype information:
#>
#> 72 multilocus genotypes
#> 86 tetraploid individuals
#> 11 codominant loci
#>
#> Population information:
#>
#> 2 strata - Continent, Country
#> 2 populations defined - South America, North America
# be sure to recode your polyploid data so that there are no zeroes for placeholders
(prc <- recode_polyploids(Pinf, newploidy = TRUE))
#>
#> This is a genclone object
#> -------------------------
#> Genotype information:
#>
#> 72 multilocus genotypes
#> 86 diploid (55) and triploid (31) individuals
#> 11 codominant loci
#>
#> Population information:
#>
#> 2 strata - Continent, Country
#> 2 populations defined - South America, North America
# calculate rho
rho <- poppr.amova(prc, ~Continent/Country, within = FALSE, cutoff = .1)
rho$statphi
#> Phi
#> Phi-samples-total 0.12713922
#> Phi-samples-Continent 0.05269217
#> Phi-Continent-total 0.07858802
Here, the value of (\rho) is 0.1271392.
Changes in AMOVA for poppr 2.7 can affect your results
The process of calculating AMOVA in poppr involved four steps:
- If the data were diploid, genotypes were split into pseudo-haplotypes
- A distance matrix was calculated using
diss.dist()
and the square root was taken - The matrix and hierarchy were prepared for either ade4 or pegas
- AMOVA was calculated using either ade4 or pegas
In this new version of poppr, you now have access to the function that splits
haplotypes called make_haplotypes()
.
The major change in poppr 2.7 is that the dist()
has replaced diss.dist()
Changing diss.dist()
to dist()
The default distance calculation for all AMOVA was diss.dist()
, which is a
dissimilarity distance. For haploid or pseudo-haploid data, this is
equivalent to a squared Euclidean distance, and was appropriate for
calculating the distance for use when the within = TRUE
option was set
(which was default). This method, however, was not appropriate when not
considering within-individual variation.
For example, this is how the previous versions of poppr would have calculated
(\rho):
dissim <- diss.dist(prc)
old <- poppr.amova(prc, ~Continent/Country, within = FALSE, cutoff = .1,
dist = dissim, squared = TRUE)
#>
#> No loci with missing values above 10% found.
#> Distance matrix is non-euclidean.
#> Using quasieuclid correction method. See ?quasieuclid for details.
old$statphi
#> Phi
#> Phi-samples-total 0.15032208
#> Phi-samples-Continent 0.12439575
#> Phi-Continent-total 0.02960965
If we compare this result to the one above, we can see that there is a
distinct difference in the values of (\rho).
AMOVA with missing data
The dist()
function handles missing data differently than diss.dist()
, so
you may see small differences in your results (for details, see this
StackOverflow answer: https://stackoverflow.com/a/18117751/2752888).
For example, the nancycats data set has an average of 2.3% missing data. This
results in a small shift in the (\Phi) statistics. Here are the results with
version 2.7:
data(nancycats)
strata(nancycats) <- data.frame(colony = pop(nancycats))
new <- poppr.amova(nancycats, ~colony, cutoff = .1)
#>
#> No loci with missing values above 10% found.
#> Distance matrix is non-euclidean.
#> Using quasieuclid correction method. See ?quasieuclid for details.
new$statphi
#> Phi
#> Phi-samples-total 0.1971382
#> Phi-samples-colony 0.1235778
#> Phi-colony-total 0.0839327
To show the results from previous versions, we need to use the new
make_haplotypes()
function to create pseudo-haplotypes:
nanhaps <- make_haplotypes(nancycats)
# confirm that the number of individuals is double that of the original data
nInd(nanhaps)
#> [1] 474
2 * nInd(nancycats)
#> [1] 474
# calculate squared Euclidean distance
d2n <- diss.dist(nanhaps)
# calculate AMOVA
old <- poppr.amova(nanhaps, ~colony/Individual, cutoff = .1,
dist = d2n, squared = TRUE)
#>
#> No loci with missing values above 10% found.
#> Distance matrix is non-euclidean.
#> Using quasieuclid correction method. See ?quasieuclid for details.
old$statphi
#> Phi
#> Phi-samples-total 0.19292024
#> Phi-samples-colony 0.12109840
#> Phi-colony-total 0.08171772
The different treatment of the missing data has created a difference of
0.004218 in (\Phi_{ST}).
Converting genind/genclone to polysat
Polysat is a package that works with polyploid microsatellite data. You can
install it from CRAN with install.packages("polysat")
. The poppr function
as.genambig()
will convert from genind to genambig:
library("polysat") # load polysat
Pinf
#>
#> This is a genclone object
#> -------------------------
#> Genotype information:
#>
#> 72 multilocus genotypes
#> 86 tetraploid individuals
#> 11 codominant loci
#>
#> Population information:
#>
#> 2 strata - Continent, Country
#> 2 populations defined - South America, North America
Pinf.ga <- as.genambig(Pinf) # Convert to genambig
summary(Pinf.ga) # Show the summary of the contents
#> Dataset with allele copy number ambiguity.
#> Insert dataset description here.
#> Number of missing genotypes: 10
#> 86 samples, 11 loci.
#> 2 populations.
#> Ploidies: 2 3 NA
#> Length(s) of microsatellite repeats: NA
Once you have your genambig object, you can use all the functions polysat has
available.
Created on 2018-03-16 by the reprex package (v0.2.0).
References
Ronfort, Joëlle, Eric Jenczewski, Thomas Bataillon, and François Rousset. "Analysis of population structure in autotetraploid species." Genetics 150, no. 2 (1998): 921-930.
Patrick G. Meirmans and Shenglin Liu. "Analysis of Molecular Variance (AMOVA) for autopolyploids" Submitted (2018)