Skip to content

poppr version 2.7.0

Compare
Choose a tag to compare
@zkamvar zkamvar released this 16 Mar 17:33
2e22b55

Changes in poppr version 2.7

Poppr version 2.7 introduces a change to how AMOVA is calculated (thanks to Patrick Meirmans for the impetus and sample data) and two new functions for data conversion:

  • make_haplotypes() for splitting data into pseudo-haplotypes
  • as.genambig() for converting genind/genclone objects to polysat's
    genambig class.

The changes will be outlined here.

Calculating (\rho) --- AMOVA from allele frequencies

Rho is a method of calculating population differentiation in the AMOVA
framework without considering within-individual variance and is analogous
to Fst for use with autotetraploid organisms (Ronfort et al 1998;
Meirmans and Liu 2018). The process uses the Euclidean distance of allele
frequencies and can be performed by setting within = FALSE.

library("poppr")
data("Pinf")
Pinf
#> 
#> This is a genclone object
#> -------------------------
#> Genotype information:
#> 
#>    72 multilocus genotypes 
#>    86 tetraploid individuals
#>    11 codominant loci
#> 
#> Population information:
#> 
#>     2 strata - Continent, Country
#>     2 populations defined - South America, North America

# be sure to recode your polyploid data so that there are no zeroes for placeholders
(prc <- recode_polyploids(Pinf, newploidy = TRUE))
#> 
#> This is a genclone object
#> -------------------------
#> Genotype information:
#> 
#>    72 multilocus genotypes 
#>    86 diploid (55) and triploid (31) individuals
#>    11 codominant loci
#> 
#> Population information:
#> 
#>     2 strata - Continent, Country
#>     2 populations defined - South America, North America

# calculate rho
rho  <- poppr.amova(prc, ~Continent/Country, within = FALSE, cutoff = .1)
rho$statphi
#>                              Phi
#> Phi-samples-total     0.12713922
#> Phi-samples-Continent 0.05269217
#> Phi-Continent-total   0.07858802

Here, the value of (\rho) is 0.1271392.

Changes in AMOVA for poppr 2.7 can affect your results

The process of calculating AMOVA in poppr involved four steps:

  1. If the data were diploid, genotypes were split into pseudo-haplotypes
  2. A distance matrix was calculated using diss.dist() and the square root was taken
  3. The matrix and hierarchy were prepared for either ade4 or pegas
  4. AMOVA was calculated using either ade4 or pegas

In this new version of poppr, you now have access to the function that splits
haplotypes called make_haplotypes().

The major change in poppr 2.7 is that the dist() has replaced diss.dist()

Changing diss.dist() to dist()

The default distance calculation for all AMOVA was diss.dist(), which is a
dissimilarity distance. For haploid or pseudo-haploid data, this is
equivalent to a squared Euclidean distance, and was appropriate for
calculating the distance for use when the within = TRUE option was set
(which was default). This method, however, was not appropriate when not
considering within-individual variation.

For example, this is how the previous versions of poppr would have calculated
(\rho):

dissim <- diss.dist(prc)
old  <- poppr.amova(prc, ~Continent/Country, within = FALSE, cutoff = .1, 
                    dist = dissim, squared = TRUE)
#> 
#>  No loci with missing values above 10% found.
#> Distance matrix is non-euclidean.
#> Using quasieuclid correction method. See ?quasieuclid for details.
old$statphi
#>                              Phi
#> Phi-samples-total     0.15032208
#> Phi-samples-Continent 0.12439575
#> Phi-Continent-total   0.02960965

If we compare this result to the one above, we can see that there is a
distinct difference in the values of (\rho).

AMOVA with missing data

The dist() function handles missing data differently than diss.dist(), so
you may see small differences in your results (for details, see this
StackOverflow answer: https://stackoverflow.com/a/18117751/2752888).

For example, the nancycats data set has an average of 2.3% missing data. This
results in a small shift in the (\Phi) statistics. Here are the results with
version 2.7:

data(nancycats)
strata(nancycats) <- data.frame(colony = pop(nancycats))
new <- poppr.amova(nancycats, ~colony, cutoff = .1)
#> 
#>  No loci with missing values above 10% found.
#> Distance matrix is non-euclidean.
#> Using quasieuclid correction method. See ?quasieuclid for details.
new$statphi
#>                          Phi
#> Phi-samples-total  0.1971382
#> Phi-samples-colony 0.1235778
#> Phi-colony-total   0.0839327

To show the results from previous versions, we need to use the new
make_haplotypes() function to create pseudo-haplotypes:

nanhaps <- make_haplotypes(nancycats)

# confirm that the number of individuals is double that of the original data
nInd(nanhaps)
#> [1] 474
2 * nInd(nancycats)
#> [1] 474

# calculate squared Euclidean distance
d2n <- diss.dist(nanhaps)

# calculate AMOVA
old <- poppr.amova(nanhaps, ~colony/Individual, cutoff = .1, 
                   dist = d2n, squared = TRUE)
#> 
#>  No loci with missing values above 10% found.
#> Distance matrix is non-euclidean.
#> Using quasieuclid correction method. See ?quasieuclid for details.
old$statphi
#>                           Phi
#> Phi-samples-total  0.19292024
#> Phi-samples-colony 0.12109840
#> Phi-colony-total   0.08171772

The different treatment of the missing data has created a difference of
0.004218 in (\Phi_{ST}).

Converting genind/genclone to polysat

Polysat is a package that works with polyploid microsatellite data. You can
install it from CRAN with install.packages("polysat"). The poppr function
as.genambig() will convert from genind to genambig:

library("polysat") # load polysat
Pinf
#> 
#> This is a genclone object
#> -------------------------
#> Genotype information:
#> 
#>    72 multilocus genotypes 
#>    86 tetraploid individuals
#>    11 codominant loci
#> 
#> Population information:
#> 
#>     2 strata - Continent, Country
#>     2 populations defined - South America, North America
Pinf.ga <- as.genambig(Pinf) # Convert to genambig
summary(Pinf.ga)             # Show the summary of the contents
#> Dataset with allele copy number ambiguity.
#> Insert dataset description here.
#> Number of missing genotypes: 10
#> 86 samples, 11 loci.
#> 2 populations.
#> Ploidies: 2 3 NA
#> Length(s) of microsatellite repeats: NA

Once you have your genambig object, you can use all the functions polysat has
available.

Created on 2018-03-16 by the reprex package (v0.2.0).

References

Ronfort, Joëlle, Eric Jenczewski, Thomas Bataillon, and François Rousset. "Analysis of population structure in autotetraploid species." Genetics 150, no. 2 (1998): 921-930.

Patrick G. Meirmans and Shenglin Liu. "Analysis of Molecular Variance (AMOVA) for autopolyploids" Submitted (2018)