work on ch11
friendly committed Dec 19, 2024
1 parent b1e9e25 commit ad3544d
Showing 37 changed files with 551 additions and 7,863 deletions.
2 changes: 1 addition & 1 deletion 03-multivariate_plots.qmd
@@ -580,7 +580,7 @@ as the Choleski factor of $\mathbf{S}$. Slightly abusing notation and taking the
we can write the data ellipsoid as simply:
$$
-\mathcal{E}_c (\bar{\mathbf{y}}, \mathbf{S}) = \bar{\mathbf{y}} \; \oplus \; \sqrt{\mathbf{S}} \period
+\mathcal{E}_c (\bar{\mathbf{y}}, \mathbf{S}) = \bar{\mathbf{y}} \; \oplus \; c\, \sqrt{\mathbf{S}} \period
$$ {#eq-ellE}
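The $\oplus$ notation of @eq-ellE can be made concrete. The following is a minimal sketch, in Python with NumPy purely for illustration (the book's own code is R): it takes $\sqrt{\mathbf{S}}$ as the lower Cholesky factor $\mathbf{L}$, with $\mathbf{S} = \mathbf{L}\mathbf{L}^\top$, and maps the unit circle through $\bar{\mathbf{y}} + c\,\mathbf{L}\,\mathbf{u}(\theta)$.

```python
import numpy as np

def data_ellipse(ybar, S, c=1.0, n_points=100):
    """Boundary points of the 2D data ellipse ybar (+) c * sqrt(S).

    sqrt(S) is taken as the lower Cholesky factor L (S = L L'), so each
    boundary point is ybar + c * L @ u(theta) for a unit vector u(theta).
    """
    ybar = np.asarray(ybar, dtype=float)
    L = np.linalg.cholesky(np.asarray(S, dtype=float))
    theta = np.linspace(0.0, 2.0 * np.pi, n_points)
    circle = np.vstack([np.cos(theta), np.sin(theta)])  # unit circle, 2 x n
    return ybar[:, None] + c * (L @ circle)             # 2 x n_points

# Every boundary point y satisfies (y - ybar)' S^{-1} (y - ybar) = c^2
S = np.array([[2.0, 1.0], [1.0, 2.0]])
pts = data_ellipse([1.0, 2.0], S, c=1.5)
```

This is roughly the construction that the R plotting functions used throughout the book perform internally when drawing a data ellipse.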
When $\mathbf{y}$ is (at least approximately) bivariate normal,
50 changes: 42 additions & 8 deletions 11-mlm-viz.qmd
@@ -104,9 +104,9 @@ shadow of the $\mat{E}$ ellipsoid on any axis (see @fig-galton-ellipse-r).
The $\mat{E}$ ellipsoid is then translated to the overall (grand) means $\bar{\mathbf{y}}$ of the variables plotted, which allows us to show the means for factor levels on the same scale, facilitating interpretation.
In the notation of @eq-ellE, the error ellipsoid is given by
$$
-\mathcal{E}_c (\bar{\mathbf{y}}, \mathbf{E}) = \bar{\mathbf{y}} \oplus \mathbf{E}^{1/2} \comma
+\mathcal{E}_c (\bar{\mathbf{y}}, \mathbf{E}) = \bar{\mathbf{y}} \; \oplus \; c\,\mathbf{E}^{1/2} \comma
$$
-where, for 2D plots $c = \sqrt{2 F_{2, n-2}^{0.68}}$.
+where $c = \sqrt{2 F_{2, n-2}^{0.68}}$ for 2D plots and $c = \sqrt{3 F_{3, n-3}^{0.68}}$ for 3D.
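As an illustration of this scaling constant, here is a small Python sketch using `scipy.stats.f` (an assumption for this example; in the book's R code, `qf()` plays the same role):

```python
from scipy.stats import f

def ellipse_radius(p, n, coverage=0.68):
    """c = sqrt(p * F_{p, n-p}^{coverage}): radius of the data ellipsoid
    covering roughly `coverage` of a p-variate normal sample of size n."""
    return (p * f.ppf(coverage, p, n - p)) ** 0.5
```

For a bivariate sample of $n = 150$ (the size of the iris data) this gives $c$ of roughly 1.5, the 2D analog of the familiar one-standard-deviation 68% rule in 1D; the 3D constant is somewhat larger.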

An ellipsoid representing variation in the means of a factor (or any other term reflected in a general linear hypothesis test, @eq-hmat) in the $\mat{H}$ matrix is simply the data ellipse of the fitted values for that term.
Dividing the hypothesis matrix by the error degrees of freedom, giving
@@ -116,16 +116,50 @@ puts this on the same scale as the \E ellipse.
I refer to this as _effect size scaling_, because it is similar to an effect size index used in
univariate models, e.g., $ES = (\bar{y}_1 - \bar{y}_2) / s_e$ in a two-group, univariate design.
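To make the analogy concrete, here is a toy sketch in Python (the group means, $s_e$, and the $\mat{H}$ matrix entries below are all made up for illustration):

```python
import numpy as np

# Univariate effect size: standardized two-group mean difference
ybar1, ybar2, s_e = 5.0, 4.2, 1.6
ES = (ybar1 - ybar2) / s_e        # = 0.5

# Multivariate analog ("effect size scaling"): divide the hypothesis
# SSP matrix H by the error degrees of freedom, which puts the H
# ellipse on the same scale as the E ellipse
H = np.array([[11.3, -0.6],
              [-0.6,  5.7]])      # hypothetical 2 x 2 H matrix
df_e = 147
H_effect = H / df_e
```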

The geometry of ellipsoids and multivariate tests allows us to go further with a re-scaling of the $\mat{H}$ ellipsoid
that gives a _visual test of significance_ for any term in an MLM, simply by dividing $\mat{H} / \text{df}_e$ further
by the $\alpha$-critical value of the corresponding test statistic.
This is illustrated in ...

```{r}
op <- par(mar = c(4, 4, 1, 1) + .5,
          mfrow = c(1, 2))
col <- c("blue", "darkgreen", "brown")
clr <- c(col, "red")
covEllipses(cbind(Sepal.Length, Sepal.Width) ~ Species, data = iris,
            pooled = TRUE,
            fill = TRUE,
            fill.alpha = 0.1,
            lwd = 3,
            col = clr,
            cex = 1.5, cex.lab = 1.5,
            label.pos = c(3, 1, 3, 0),
            xlim = c(4, 8), ylim = c(2, 4))
heplot(iris.mod, size = "effect",
       cex = 1.5, cex.lab = 1.5,
       fill = TRUE, fill.alpha = c(0.3, 0.1),
       xlim = c(4, 8), ylim = c(2, 4))
par(op)
```


The geometry of ellipsoids and multivariate tests allows us to go further with another re-scaling of the $\mat{H}$ ellipsoid
that gives a _visual test of significance_ for any term in an MLM.
This is done simply by dividing $\mat{H} / \text{df}_e$ further
by the $\alpha$-critical value of the corresponding test statistic to show the strength of evidence against
the null hypothesis.
Among the various multivariate test statistics,
-Roy's maximum root test gives $\mat{H} / (\lambda_\alpha \text{df}_e)$
+Roy's maximum root test, based on the largest eigenvalue $\lambda_1$ of $\mat{H} \mat{E}^{-1}$,
+gives $\mat{H} / (\lambda_\alpha \text{df}_e)$
which has the visual property that the
scaled $\mat{H}$ ellipsoid will protrude _somewhere_ outside the standard $\mat{E}$ ellipsoid if and only if
-Roy's test is significant at significance level $\alpha$. For these data, the HE plot using
-significance scaling is shown in the right panel of \figref{fig:heplot-iris1}.
+Roy's test is significant at significance level $\alpha$. The critical value $\lambda_\alpha$ for Roy's
+test is
+$$
+\lambda_\alpha = \left(\frac{\text{df}_1}{\text{df}_2}\right) \; F_{\text{df}_1, \text{df}_2}^{1-\alpha} \comma
+$$
+where $\text{df}_1 = \max(p, \text{df}_h)$ and $\text{df}_2 = \text{df}_h + \text{df}_e - \text{df}_1$.
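The computation of $\lambda_\alpha$ can be sketched directly from the formula above. This Python sketch uses `scipy.stats.f` standing in for R's `qf()`; the degrees of freedom in the example follow from the iris design (3 species, $n = 150$, $p = 4$ responses):

```python
from scipy.stats import f

def roy_critical(p, df_h, df_e, alpha=0.05):
    """lambda_alpha = (df1 / df2) * F_{df1, df2}^{1 - alpha}, with
    df1 = max(p, df_h) and df2 = df_h + df_e - df1."""
    df1 = max(p, df_h)
    df2 = df_h + df_e - df1
    return (df1 / df2) * f.ppf(1.0 - alpha, df1, df2)

# e.g., iris MANOVA: p = 4 responses, df_h = 2 (3 species), df_e = 147
lam = roy_critical(p=4, df_h=2, df_e=147)
```

Scaling $\mat{H}$ by $1 / (\lambda_\alpha \, \text{df}_e)$ then gives the significance-scaled ellipsoid that protrudes beyond $\mat{E}$ if and only if Roy's test rejects at level $\alpha$.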

+For these data, the HE plot using
+significance scaling is shown in the right panel of \figref{fig:heplot-iris1}.


## Canonical discriminant analysis {#sec-candisc}
30 changes: 24 additions & 6 deletions R/iris/iris-HE.R
@@ -11,16 +11,34 @@ iris.mod <- lm(cbind(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) ~
Species, data=iris)

Anova(iris.mod)

summary(iris.mod)

summary(iris.mod, univariate = TRUE)

# tests for each response

glance(iris.mod)

op <- par(mar = c(4, 4, 1, 1) + .5)
col <- c("blue", "darkgreen", "brown")
clr <- c(col, "red")

op <- par(mar = c(4, 4, 1, 1) + .5,
mfrow = c(1, 2))
covEllipses(cbind(Sepal.Length, Sepal.Width) ~ Species, data=iris,
pooled = TRUE,
fill=TRUE,
fill.alpha = 0.1,
lwd = 3,
col = clr,
cex = 1.5, cex.lab = 1.5,
label.pos = c(3, 1, 3, 0),
xlim = c(4,8), ylim = c(2,4))

heplot(iris.mod, size = "effect",
cex = 1.5, cex.lab = 1.5,
fill=TRUE, fill.alpha=c(0.3,0.1),
xlim = c(4,8), ylim = c(2,4))
par(op)

op <- par(mar = c(4, 4, 1, 1) + .5,
mfrow = c(1, 2))
heplot(iris.mod, size = "effect",
cex = 1.5, cex.lab = 1.5,
fill=TRUE, fill.alpha=c(0.3,0.1),
@@ -29,7 +47,7 @@ text(10, 4.5, expression(paste("Effect size scaling:", bold(H) / df[e])),
pos = 2, cex = 1.2)

heplot(iris.mod, size = "evidence",
-cex = 1.5,
+cex = 1.5, cex.lab = 1.5,
fill=TRUE, fill.alpha=c(0.3,0.1),
xlim = c(2,10), ylim = c(1.4,4.6))
text(10, 4.5, expression(paste("Significance scaling:", bold(H) / (lambda[alpha] * df[e]))),
30 changes: 5 additions & 25 deletions R/penguin/HE-penguins.R
@@ -1,36 +1,16 @@

library(dplyr)
-library(readr)
+#library(readr)
#library(tidyr)
library(car)
library(heplots)
library(candisc)
library(palmerpenguins)


#url <- "https://raw.githubusercontent.com/friendly/penguins/master/data/penguins_size.csv"
#url <- "https://raw.githubusercontent.com/allisonhorst/penguins/master/data/penguins_size.csv"
#
#penguins <-read_csv(url)

# peng <- penguins %>%
# rename(
# bill_length = bill_length_mm,
# bill_depth = bill_depth_mm,
# flipper_length = flipper_length_mm,
# body_mass = body_mass_g
# ) %>%
# mutate(species = as.factor(species),
# island = as.factor(island),
# sex = as.factor(substr(sex,1,1))) %>%
# filter(!is.na(bill_depth),
# !is.na(sex))
#
# str(peng)
# View(peng)
#library(palmerpenguins)


data(peng, package="heplots")

data(peng, package="heplots")
source(here::here("R", "penguin", "penguin-colors.R"))

# vars <- paste(names(peng)[-1], collapse="\n")
# cat(vars)
52 changes: 52 additions & 0 deletions R/penguin/peng-HE.R
@@ -0,0 +1,52 @@
library(dplyr)
library(car)
library(heplots)
library(candisc)

data(peng, package="heplots")
source(here::here("R", "penguin", "penguin-colors.R"))
## MANOVA

contrasts(peng$species) <- matrix(c(1, -1, 0, -1, -1, -2), 3, 2)
contrasts(peng$species)


peng.mod <- lm(cbind(bill_length, bill_depth, flipper_length, body_mass) ~ species, data = peng)
etasq(peng.mod)

col <- peng.colors("dark")
pch <- 15:17

clr <- c(col, "red")

# data ellipses vs HE plot

covEllipses(cbind(bill_length, bill_depth) ~ species, data=peng,
pooled = TRUE,
fill=TRUE,
fill.alpha = 0.1,
lwd = 3,
col = clr,
cex.lab = 1.25,
xlim = c(35, 55), ylim = c(14, 20))

heplot(peng.mod, size = "effect",
fill=TRUE, fill.alpha=0.1,
cex = 1.25, cex.lab = 1.25,
xlim = c(35, 55), ylim = c(14, 20))


# effect vs evidence scaling

heplot(peng.mod, size = "effect",
fill=TRUE, fill.alpha=0.1,
cex = 1.25, cex.lab = 1.25,
xlim = c(0, 80), ylim = c(0, 30))

heplot(peng.mod, size = "evidence",
fill=TRUE, fill.alpha=0.1,
cex = 1.25, cex.lab = 1.25,
xlim = c(0, 80), ylim = c(0, 30))



27 changes: 0 additions & 27 deletions bib/pkgs.txt
@@ -116,30 +116,3 @@ knitr
matlib
patchwork
tidyr
broom
car
carData
dplyr
ggplot2
heplots
knitr
tidyr
broom
candisc
car
carData
dplyr
ggplot2
heplots
knitr
tidyr
broom
candisc
car
carData
corrgram
dplyr
ggplot2
heplots
knitr
tidyr
4 changes: 2 additions & 2 deletions docs/01-intro.html
@@ -378,7 +378,7 @@ <h1 class="title"><span id="sec-introduction" class="quarto-section-identifier">
</section><section id="visualization-is-harder" class="level2" data-number="1.4"><h2 data-number="1.4" class="anchored" data-anchor-id="visualization-is-harder">
<span class="header-section-number">1.4</span> Visualization is harder</h2>
<p>However, with two or more response variables, visualizations for multivariate models are not as simple as they are for their univariate counterparts for understanding the effects of predictors, model parameters, or model diagnostics. Consequently, the results of such studies are often explored and discussed solely in terms of coefficients and significance, and visualizations of the relationships are only provided for one response variable at a time, if at all. This tradition can mask important nuances, and lead researchers to draw erroneous conclusions.</p>
<p>The aim of this book is to describe and illustrate some central methods that we have developed over the last ten years that aid in the understanding and communication of the results of multivariate linear models <span class="citation" data-cites="Friendly-07-manova FriendlyMeyer:2016:DDAR">(<a href="95-references.html#ref-Friendly-07-manova" role="doc-biblioref">Friendly, 2007</a>;<!-- @Friendly-etal:ellipses:2013; --> <a href="95-references.html#ref-FriendlyMeyer:2016:DDAR" role="doc-biblioref">Friendly &amp; Meyer, 2016</a>)</span>. These methods rely on <em>data ellipsoids</em> as simple, minimally sufficient visualizations of variance that can be shown in 2D and 3D plots. As will be demonstrated, the <em>Hypothesis-Error (HE) plot</em> framework applies this idea to the results of multivariate tests of linear hypotheses. </p>
<p>The aim of this book is to describe and illustrate some central methods that we have developed over the last ten years that aid in the understanding and communication of the results of multivariate linear models <span class="citation" data-cites="Friendly-07-manova FriendlyMeyer:2016:DDAR">(<a href="#ref-Friendly-07-manova" role="doc-biblioref">Friendly, 2007</a>;<!-- @Friendly-etal:ellipses:2013; --> <a href="#ref-FriendlyMeyer:2016:DDAR" role="doc-biblioref">Friendly &amp; Meyer, 2016</a>)</span>. These methods rely on <em>data ellipsoids</em> as simple, minimally sufficient visualizations of variance that can be shown in 2D and 3D plots. As will be demonstrated, the <em>Hypothesis-Error (HE) plot</em> framework applies this idea to the results of multivariate tests of linear hypotheses. </p>
<p>Further, in the case where there are more than just a few outcome variables, the important nectar of their relationships to predictors can often be distilled in a multivariate juicer— a <strong>projection</strong> of the multivariate relationships to the predictors in the low-D space that captures most of the flavor. This idea can be applied using <em>canonical correlation plots</em> and with <em>canonical discriminant HE plots</em>. </p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure"><p><img src="images/Cover-GBE.png" class="img-fluid figure-img"></p>
@@ -401,7 +401,7 @@ <h1 class="title"><span id="sec-introduction" class="quarto-section-identifier">
<!-- ## References {.unnumbered} -->


-<div id="refs" class="references csl-bib-body hanging-indent" data-entry-spacing="0" data-line-spacing="2" role="list" style="display: none">
+<div id="refs" class="references csl-bib-body hanging-indent" data-entry-spacing="0" data-line-spacing="2" role="list">
<div id="ref-Friendly-07-manova" class="csl-entry" role="listitem">
Friendly, M. (2007). <span>HE</span> plots for multivariate general linear models. <em>Journal of Computational and Graphical Statistics</em>, <em>16</em>(2), 421–444. <a href="https://doi.org/10.1198/106186007X208407">https://doi.org/10.1198/106186007X208407</a>
</div>