```{r include=FALSE}
source("R/common.R")
knitr::opts_chunk$set(fig.path = "figs/ch12/")
```
::: {.content-visible unless-format="pdf"}
{{< include latex/latex-commands.qmd >}}
:::
# Visualizing Equality of Covariance Matrices {#sec-eqcov}
> _To make the preliminary test on variances is rather like putting to sea in a rowing boat
> to find out whether conditions are sufficiently calm for an ocean liner to leave port._
> --- G. E. P. Box [-@Box:1953]
This chapter concerns the extension of tests of homogeneity of variance from the classical univariate
ANOVA setting to the analogous multivariate (MANOVA) setting. Such tests are a routine but important aspect of data analysis, because particular violations can drastically affect model estimates and the conclusions that can properly be drawn [@Lix:1996].

Beyond issues of model assumptions,
the question of equality of covariance matrices is often of general interest in itself. For instance,
variability is often an important issue in studies of strict equivalence, such as comparisons of patient measurements across laboratories, and in other applied contexts [see @Gastwirth-etal:2009 for other exemplars].
Moreover, the outcome of such tests often has important consequences
for the details of the main method of analysis. Just as the Welch $t$-test [@Welch:1947] is now commonly
used and reported for a two-group test of differences in means under unequal variances,
a preliminary test of equality of covariance matrices is often used in discriminant analysis
to decide whether linear (LDA) or quadratic (QDA) discriminant analysis should be applied to a
given problem. In such cases, the data at hand should inform the choice of statistical analysis.
We provide some answers to the following questions:
- **Visualization**: How can we visualize differences among group variances and covariance matrices,
perhaps in a way that is analogous to what is done to visualize differences among group means?
As will be illustrated, differences among covariance matrices can comprise differences in overall size ("scatter") and shape ("orientation"). These can be seen in data space with data ellipses, particularly if the data are centered by shifting all groups to the grand mean.
- **Low-D views**: When there are more than a few response variables, what low-dimensional
views can show the most interesting properties related to the equality of covariance matrices?
Projecting the data into the space of the principal components serves well again here. Surprisingly, we will see that the smallest dimensions contain useful information about differences among the group covariance matrices.
- **Other statistics**: Box's $M$-test is the most widely used. Are there other worthwhile test statistics? We will see that graphical methods suggest alternatives.
The following subsections provide a capsule summary of the issues in this topic.
Most of the discussion is couched in terms of a one-way design for simplicity,
but the same ideas can apply to two-way (and higher) designs, where a "group" factor
is defined as the product combination (interaction) of two or more factor
variables. When there are also numeric covariates, this topic extends to the
multivariate analysis of covariance (MANCOVA) setting: apply the same techniques
to the residuals from a model predicting the responses from the covariates alone,
as sketched below.
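A minimal sketch of that extension (the data set and variable names below are purely hypothetical placeholders):

```{r mancova-sketch, eval=FALSE}
# Sketch: test equality of covariance matrices after adjusting for a covariate
# by applying boxM() to the residuals from the covariate-only model.
# (mydata, y1:y3, x, and group are hypothetical placeholders)
res <- residuals(lm(cbind(y1, y2, y3) ~ x, data = mydata))
heplots::boxM(res, mydata$group)
```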
**Packages**
In this chapter we use the following packages. Load them now:
```{r pkg-load}
library(car)
library(heplots)
library(candisc)
library(ggplot2)
library(dplyr)
library(tidyr)
```
## Homogeneity of Variance in Univariate ANOVA {#sec-homogeneity-ANOVA}
In classical (Gaussian) univariate ANOVA models, the main interest is typically on
tests of mean differences in a response $y$ according to one or more factors.
The validity of the typical $F$ test, however, relies on the assumption
of _homogeneity of variance_:
all groups have the same (or similar) variance,
$$
\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_g^2 \; .
$$
It turns out that the $F$ test for differences in means
is relatively robust to violation of this assumption [@Harwell:1992],
as long as the group sample sizes are roughly equal.[^1]
This applies to the Type I error rate $\alpha$, which is not much affected.
However, unequal variances make the ANOVA tests less efficient: you lose power to detect
true differences among the means.
[^1]: If group sizes are greatly unequal __and__ homogeneity of variance is violated, the $F$ statistic is too liberal ($p$ values too small) when large sample variances are associated with small group sizes.
Conversely, the $F$ statistic is too conservative when large variances are associated with large group sizes.
A variety of classical test statistics for homogeneity of variance are available, including
Hartley's $F_{max}$ [@Hartley:1950], Cochran's _C_ [@Cochran:1941], and Bartlett's test [@Bartlett:1937],
but these have been found to have terrible statistical properties [@Rogan:1977], which prompted Box's famous quote.
Levene [-@Levene:1960] introduced a different form of test, based on the simple idea that when
variances are equal across groups,
the average _absolute values_ of differences between the observations and group means
will also be equal, i.e., substituting an $L_1$ norm for the $L_2$ norm of variance.
In a one-way design, this is equivalent to a test of group differences in the
means of the auxiliary variable $z_{ij} = | y_{ij} - \bar{y}_i |$.
More robust versions of this test were proposed by @BrownForsythe:1974. These tests
replace the group mean with either the group **median** or a **trimmed mean** in the ANOVA of the absolute deviations,
and these variants are often recommended over Levene's original version based on mean deviations.
See @Conover-etal:1981 for an early review and @Gastwirth-etal:2009 for a general discussion of these tests.
In what follows, we refer to this class of tests as "Levene-type" tests and suggest a
multivariate extension described below (@sec-mlevene).
These deviations from a group central value can be calculated using `heplots::colDevs()`, where the central value can be
a function, such as `mean`, `median`, or an anonymous one like `function(x) mean(x, trim = 0.1)` that trims 10%
from each tail of the distribution. With a response `Y`, Levene's test can then be performed "by hand" as follows:
```{r lm-levene, eval=FALSE}
# Levene's test: ANOVA of the absolute deviations from the group means
Z.mean <- abs( colDevs(Y, group) )
anova(lm(Z.mean ~ group))

# Brown-Forsythe variant: deviations from the group medians
Z.med <- abs( colDevs(Y, group, median) )
anova(lm(Z.med ~ group))
```
The function `car::leveneTest()` does this, so we could examine whether the variances are equal in the Penguin variables, one at a time,
like so:
```{r levene-peng1}
data(peng, package = "heplots")
leveneTest(bill_length ~ species, data=peng)
# ...
leveneTest(body_mass ~ species, data=peng)
```
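By default, `leveneTest()` centers at the group medians (the Brown-Forsythe version). The `center` argument selects a different central value, and further arguments are passed to that function, so the mean and trimmed-mean variants are easy to request:

```{r levene-center}
# Levene's original (mean-centered) and 10% trimmed-mean variants
leveneTest(body_mass ~ species, data = peng, center = mean)
leveneTest(body_mass ~ species, data = peng, center = mean, trim = 0.1)
```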
More conveniently, `heplots::leveneTests()` (note the "s") does this for each of a set of response variables,
specified as a data frame, a model formula, or an `"mlm"` object.
It also formats the results in a more pleasing way:
```{r levene-peng2}
peng.mod <- lm(cbind(bill_length, bill_depth, flipper_length, body_mass) ~ species,
data = peng)
leveneTests(peng.mod)
```
This tells us that the groups do not differ in variance on the first three variables, but they do for `body_mass`.
## Visualizing Levene's test {#sec-mlevene}
To gain some insight into the problem of homogeneity of variance, it helps to see how the situation looks in terms of the data.
For the Penguin data, the simplest approach is to make boxplots of the variables and try to see whether the
**widths** of the
central 50% boxes are roughly the same, as in @fig-peng-boxplots.
However, it is perceptually difficult to focus on differences in the widths of
the boxes within each panel when their centers also differ from group to group.
```{r}
#| label: fig-peng-boxplots
#| code-fold: true
#| code-summary: "See the code"
#| fig-width: 10
#| fig-height: 5
#| out-width: "90%"
#| fig-cap: "Boxplots for the Penguin variables. For assessing homogeneity of variance, we should look for differences in the widths of the central 50% boxes in each panel, rather than differences in central tendency."
source("R/penguin/penguin-colors.R")
col <- peng.colors("dark")
clr <- c(col, gray(.20))
peng_long <- peng |>
pivot_longer(bill_length:body_mass,
names_to = "variable",
values_to = "value")
peng_long |>
ggplot(aes(value, species, fill = species)) +
geom_boxplot() +
facet_wrap(~ variable, scales = 'free_x') +
theme_penguins() +
theme_bw(base_size = 14) +
theme(legend.position = 'none')
```
Instead, you can see more _directly_ what the Levene test assesses by graphing the absolute deviations from the group means or medians. This is another example of the graphical idea that visual comparisons are easier
when you plot the quantities of direct interest.
You can calculate these values as follows:
```{r}
vars <- c("bill_length", "bill_depth", "flipper_length", "body_mass")
pengDevs <- colDevs(peng[, vars], peng$species, median) |>
abs()
```
In a boxplot of the absolute deviations (@fig-peng-devplots), your eye can now focus on the central
value of each group, shown by the median line of each box, because Levene's method tests whether these central values
differ across groups.
```{r}
#| label: fig-peng-devplots
#| code-fold: true
#| code-summary: "See the code"
#| fig-width: 10
#| fig-height: 5
#| out-width: "90%"
#| fig-cap: "Boxplots for absolute differences from group medians for the Penguin data."
# reshape the absolute deviations to long format for plotting
dev_long <- data.frame(species = peng$species, pengDevs) |>
pivot_longer(bill_length:body_mass,
names_to = "variable",
values_to = "value")
dev_long |>
ggplot(aes(value, species, fill = species)) +
geom_boxplot() +
facet_wrap(~ variable, scales = 'free_x') +
xlab("absolute median deviation") +
theme_penguins() +
theme_bw(base_size = 14) +
theme(legend.position = 'none')
```
It is now easy to see that the medians largely align for all the variables except for `body_mass`.
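These same deviations also suggest the multivariate extension of Levene-type tests mentioned earlier: treat the columns of absolute deviations as responses in a MANOVA. A minimal sketch, using the `pengDevs` matrix computed above:

```{r peng-dev-manova}
# Multivariate Levene-type test: MANOVA of the absolute deviations
dev.df  <- data.frame(species = peng$species, pengDevs)
dev.mod <- lm(cbind(bill_length, bill_depth, flipper_length, body_mass) ~ species,
              data = dev.df)
car::Anova(dev.mod)
```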
## Homogeneity of variance in MANOVA {#sec-homogeneity-MANOVA}
In the MANOVA context, the main emphasis, of course, is on differences among mean vectors,
testing
$$
\mathcal{H}_0 : \mathbf{\mu}_1 = \mathbf{\mu}_2 = \cdots = \mathbf{\mu}_g \; .
$$
However, the standard test statistics (Wilks' Lambda, Hotelling-Lawley trace, Pillai-Bartlett trace, Roy's maximum root) rely upon
the analogous assumption that
the within-group covariance **matrices** $\mathbf{\Sigma}_i$ are equal for all groups,
$$
\mathbf{\Sigma}_1 = \mathbf{\Sigma}_2 = \cdots = \mathbf{\Sigma}_g \; .
$$
This is much stronger than in the univariate case, because it also requires that all correlations between pairs of variables be the same for all groups. For two responses, there are three
parameters ($\rho, \sigma_1^2, \sigma_2^2$) assumed equal across all groups; for $p$ responses, there
are $p (p+1) / 2$ assumed equal. For example, with the four penguin size variables, $4 \times 5 / 2 = 10$ parameters are assumed equal across the three species.
<!--
For example, in the case of two responses,
we must assume:
$$
\begin{pmatrix}
\sigma_1^2 & \textsf{sym} \\
\rho \sigma_1 \sigma_2 & \sigma_2^2 \\
\end{pmatrix}_1 =
\begin{pmatrix}
\sigma_1^2 & \textsf{sym} \\
\rho \sigma_1 \sigma_2 & \sigma_2^2 \\
\end{pmatrix}_2 = \dots =
\begin{pmatrix}
\sigma_1^2 & \textsf{sym} \\
\rho \sigma_1 \sigma_2 & \sigma_2^2 \\
\end{pmatrix}
_g
$$
-->
To preview the main example, @fig-peng-covEllipse0 shows data ellipses for the
bill size variables in the `palmerpenguins::penguins` data.
The left panel shows the ellipses centered at the group means; in the right panel all ellipses are centered at the grand mean, which makes differences in their size, shape, and orientation much easier to compare.
```{r}
#| label: fig-peng-covEllipse0
#| fig-align: center
#| fig-height: 5
#| fig-width: 10
#| out-width: "100%"
#| code-fold: true
#| code-summary: "See the code"
#| fig-cap: "Data ellipses for bill length and bill depth in the penguins data, also showing the pooled covariance. Left: As is; right: centered at the grand means for easier comparison."
op <- par(mar = c(4, 4, 1, 1) + .5,
          mfrow = c(1, 2))
covEllipses(cbind(bill_length, bill_depth) ~ species, data=peng,
fill = TRUE,
fill.alpha = 0.1,
lwd = 3,
col = clr)
covEllipses(cbind(bill_length, bill_depth) ~ species, data=peng,
center = TRUE,
fill = c(rep(FALSE,3), TRUE),
fill.alpha = .1,
lwd = 3,
col = clr,
label.pos = c(1:3,0))
par(op)
```
The same display can be shown for all pairs of variables:
```{r}
#| label: fig-peng-covEllipse-pairs
#| fig-align: center
#| out-width: "100%"
#| code-fold: show
#| fig-cap: "All pairwise covariance ellipses for the penguins data."
clr <- c(peng.colors(), "black")
covEllipses(peng[,3:6], peng$species,
variables=1:4,
col = clr,
fill=TRUE,
fill.alpha=.1)
```
The covariance ellipses look quite similar in size, shape, and orientation.
But what does Box's $M$ test (described below) say? As you can see,
it concludes strongly against the null hypothesis of equal covariance matrices.
```{r boxm}
boxM(cbind(bill_length, bill_depth, flipper_length, body_mass) ~ species, data=peng)
```
## Assessing heterogeneity of covariance matrices: Box's M test {#sec-boxM}
@Box:1949 proposed the following likelihood-ratio test (LRT) statistic for testing the
hypothesis of equal covariance matrices,
$$
M = (N - g) \ln \vert \mathbf{S}_p \vert \; - \; \sum_{i=1}^g (n_i - 1) \ln \vert \mathbf{S}_i \vert \; ,
$$ {#eq-boxm}
where $N = \sum n_i$ is the total sample size and
$\mathbf{S}_p = (N-g)^{-1} \sum_{i=1}^g (n_i - 1) \mathbf{S}_i$
is the pooled covariance matrix.
$M$ can thus be thought of as comparing, on the log scale, the determinant of the pooled $\mathbf{S}_p$
to the geometric mean of the determinants of the separate $\mathbf{S}_i$.
In practice, there are various transformations of the value of $M$ to yield a
test statistic with an approximately known distribution [@Timm:75].
Roughly speaking, when each $n_i > 20$, a $\chi^2$ approximation is often used;
otherwise an $F$ approximation is known to be more accurate.
Asymptotically, $M$ has a $\chi^2$ distribution. The $\chi^2$ approximation
due to Box [-@Box:1949; -@Box:1950] is that
$$
X^2 = (1 - c_1) \, M \quad \sim \quad \chi^2_{df}
$$
with $df = (g-1)\, p (p+1)/2$ degrees of freedom, where the bias-correction constant is:
$$
c_1 = \left(
\sum_i \frac{1}{n_i -1}
- \frac{1}{N-g}
\right)
\frac{2p^2 +3p -1}{6 (p+1)(g-1)} \; .
$$
In this form, Bartlett's test for equality of variances in the univariate case
is the special case of Box's M when there is only one response variable,
so Bartlett's test is sometimes used as a univariate follow-up to determine
which response variables show heterogeneity of variance.
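To make @eq-boxm concrete, here is a minimal hand computation of $M$ and its $\chi^2$ approximation for the penguin data. This is a sketch of the formulas above, not the internal code of `boxM()`, but the result can be compared with the `boxM()` output shown earlier:

```{r boxm-by-hand}
# Box's M "by hand": log determinants, pooled covariance, chi-square test
vars <- c("bill_length", "bill_depth", "flipper_length", "body_mass")
g  <- nlevels(peng$species)
p  <- length(vars)
ni <- table(peng$species)
N  <- sum(ni)

# within-group covariance matrices and their log determinants
S_i    <- lapply(split(peng[, vars], peng$species), cov)
logdet <- sapply(S_i, function(S) log(det(S)))

# pooled covariance matrix S_p
S_p <- Reduce(`+`, Map(function(S, n) (n - 1) * S, S_i, as.list(ni))) / (N - g)

# Box's M, the bias correction c1, and the chi-square approximation
M  <- (N - g) * log(det(S_p)) - sum((ni - 1) * logdet)
c1 <- (sum(1 / (ni - 1)) - 1 / (N - g)) *
      (2 * p^2 + 3 * p - 1) / (6 * (p + 1) * (g - 1))
X2 <- (1 - c1) * M
df <- (g - 1) * p * (p + 1) / 2
c(M = M, X2 = X2, df = df, pvalue = pchisq(X2, df, lower.tail = FALSE))
```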
Yet, like its univariate counterpart, Box's test is well known to be highly sensitive to violations
of (multivariate) normality and to the presence of outliers. For example, @TikuBalakrishnan:1984
concluded from simulation studies that the normal-theory LRT provides poor control of Type I
error under even modest departures from normality. @OBrien:1992 proposed some robust alternatives,
and showed that Box's normal theory approximation suffered both in controlling the null size
of the test and in power. @ZhangBoos:1992:BCV also carried out simulation studies with similar
conclusions and used bootstrap methods to obtain corrected critical values.
## Visualizing heterogeneity
The goal of this chapter is to use the background above as a platform for visualizing and testing the heterogeneity of covariance matrices in multivariate designs. Researchers often rely on a single number to decide whether their data have met a particular threshold, but such compression obscures interesting information, particularly when a test concludes that differences exist and one is left to wonder "why?".
Here again, visualization reigns supreme; it is somewhat surprising that this issue has not previously been addressed graphically in any systematic way.
In what follows, we propose three visualization-based approaches to questions of heterogeneity of covariance in MANOVA
designs:
(a) direct visualization of the information in the $\mathbf{S}_i$ and $\mathbf{S}_p$ using _data ellipsoids_ to show size and shape as minimal schematic summaries;
(b) a simple dotplot of the components of Box's M test: the log determinants of the
$\mathbf{S}_i$ together with that of the pooled $\mathbf{S}_p$. Extensions of these simple plots raise the question of whether measures of heterogeneity other than that captured in Box's test might also be useful; and,
(c) the connection between Levene-type tests and an ANOVA (of centered absolute differences) suggests a parallel with a multivariate extension of Levene-type tests and a MANOVA. We explore this with a version of
Hypothesis-Error (HE) plots we have found useful for visualizing mean differences in MANOVA designs.
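As a quick preview of approach (b): the `plot()` method that **heplots** provides for `"boxM"` objects draws just such a dotplot of the log determinants of the $\mathbf{S}_i$ and the pooled $\mathbf{S}_p$. A minimal sketch:

```{r boxm-plot}
# Dotplot of the components of Box's M: log |S_i| for each group and log |S_p|
peng.boxm <- boxM(cbind(bill_length, bill_depth, flipper_length, body_mass) ~ species,
                  data = peng)
plot(peng.boxm)
```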
```{r write-pkgs}
#| echo: false
cat("Writing packages to ", .pkg_file, "\n")
write_pkgs(file = .pkg_file)
```
<!-- ## References -->