work on ch11
friendly committed Dec 19, 2024
1 parent b1e9e25 commit ad3544d
Showing 37 changed files with 551 additions and 7,863 deletions.
2 changes: 1 addition & 1 deletion 03-multivariate_plots.qmd
@@ -580,7 +580,7 @@ as the Choleski factor of $\mathbf{S}$. Slightly abusing notation and taking the
we can write the data ellipsoid as simply:
$$
-\mathcal{E}_c (\bar{\mathbf{y}}, \mathbf{S}) = \bar{\mathbf{y}} \; \oplus \; \sqrt{\mathbf{S}} \period
+\mathcal{E}_c (\bar{\mathbf{y}}, \mathbf{S}) = \bar{\mathbf{y}} \; \oplus \; c\, \sqrt{\mathbf{S}} \period
$$ {#eq-ellE}
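The $\oplus$ notation of @eq-ellE can be made concrete. The following is a minimal sketch, in Python with NumPy purely for illustration (the book's own code is R): it takes $\sqrt{\mathbf{S}}$ as the lower Cholesky factor $\mathbf{L}$, with $\mathbf{S} = \mathbf{L}\mathbf{L}^\top$, and maps the unit circle through $\bar{\mathbf{y}} + c\,\mathbf{L}\,\mathbf{u}(\theta)$.

```python
import numpy as np

def data_ellipse(ybar, S, c=1.0, n_points=100):
    """Boundary points of the 2D data ellipse ybar (+) c * sqrt(S).

    sqrt(S) is taken as the lower Cholesky factor L (S = L L'), so each
    boundary point is ybar + c * L @ u(theta) for a unit vector u(theta).
    """
    ybar = np.asarray(ybar, dtype=float)
    L = np.linalg.cholesky(np.asarray(S, dtype=float))
    theta = np.linspace(0.0, 2.0 * np.pi, n_points)
    circle = np.vstack([np.cos(theta), np.sin(theta)])  # unit circle, 2 x n
    return ybar[:, None] + c * (L @ circle)             # 2 x n_points

# Every boundary point y satisfies (y - ybar)' S^{-1} (y - ybar) = c^2
S = np.array([[2.0, 1.0], [1.0, 2.0]])
pts = data_ellipse([1.0, 2.0], S, c=1.5)
```

This is roughly the construction that the R plotting functions used throughout the book perform internally when drawing a data ellipse.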
When $\mathbf{y}$ is (at least approximately) bivariate normal,
50 changes: 42 additions & 8 deletions 11-mlm-viz.qmd
@@ -104,9 +104,9 @@ shadow of the $\mat{E}$ ellipsoid on any axis (see @fig-galton-ellipse-r).
The $\mat{E}$ ellipsoid is then translated to the overall (grand) means $\bar{\mathbf{y}}$ of the variables plotted, which allows us to show the means for factor levels on the same scale, facilitating interpretation.
In the notation of @eq-ellE, the error ellipsoid is given by
$$
-\mathcal{E}_c (\bar{\mathbf{y}}, \mathbf{E}) = \bar{\mathbf{y}} \oplus \mathbf{E}^{1/2} \comma
+\mathcal{E}_c (\bar{\mathbf{y}}, \mathbf{E}) = \bar{\mathbf{y}} \; \oplus \; c\,\mathbf{E}^{1/2} \comma
$$
-where, for 2D plots $c = \sqrt{2 F_{2, n-2}^{0.68}}$.
+where $c = \sqrt{2 F_{2, n-2}^{0.68}}$ for 2D plots and $c = \sqrt{3 F_{3, n-3}^{0.68}}$ for 3D.
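As an illustration of this scaling constant, here is a small Python sketch using `scipy.stats.f` (an assumption for this example; in the book's R code, `qf()` plays the same role):

```python
from scipy.stats import f

def ellipse_radius(p, n, coverage=0.68):
    """c = sqrt(p * F_{p, n-p}^{coverage}): radius of the data ellipsoid
    covering roughly `coverage` of a p-variate normal sample of size n."""
    return (p * f.ppf(coverage, p, n - p)) ** 0.5
```

For a bivariate sample of $n = 150$ (the size of the iris data) this gives $c$ of roughly 1.5, the 2D analog of the familiar one-standard-deviation 68% rule in 1D; the 3D constant is somewhat larger.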

An ellipsoid representing variation in the means of a factor (or any other term reflected in a general linear hypothesis test, @eq-hmat) in the $\mat{H}$ matrix is simply the data ellipse of the fitted values for that term.
Dividing the hypothesis matrix by the error degrees of freedom, giving
@@ -116,16 +116,50 @@ puts this on the same scale as the \E ellipse.
I refer to this as _effect size scaling_, because it is similar to an effect size index used in
univariate models, e.g., $ES = (\bar{y}_1 - \bar{y}_2) / s_e$ in a two-group, univariate design.
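To make the analogy concrete, here is a toy sketch in Python (the group means, $s_e$, and the $\mat{H}$ matrix entries below are all made up for illustration):

```python
import numpy as np

# Univariate effect size: standardized two-group mean difference
ybar1, ybar2, s_e = 5.0, 4.2, 1.6
ES = (ybar1 - ybar2) / s_e        # = 0.5

# Multivariate analog ("effect size scaling"): divide the hypothesis
# SSP matrix H by the error degrees of freedom, which puts the H
# ellipse on the same scale as the E ellipse
H = np.array([[11.3, -0.6],
              [-0.6,  5.7]])      # hypothetical 2 x 2 H matrix
df_e = 147
H_effect = H / df_e
```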

The geometry of ellipsoids and multivariate tests allows us to go further with a re-scaling of the $\mat{H}$ ellipsoid
that gives a _visual test of significance_ for any term in an MLM, simply by dividing $\mat{H} / \text{df}_e$ further
by the $\alpha$-critical value of the corresponding test statistic.
This is illustrated in ...

```{r}
op <- par(mar = c(4, 4, 1, 1) + .5,
          mfrow = c(1, 2))
col <- c("blue", "darkgreen", "brown")
clr <- c(col, "red")
covEllipses(cbind(Sepal.Length, Sepal.Width) ~ Species, data = iris,
            pooled = TRUE,
            fill = TRUE,
            fill.alpha = 0.1,
            lwd = 3,
            col = clr,
            cex = 1.5, cex.lab = 1.5,
            label.pos = c(3, 1, 3, 0),
            xlim = c(4, 8), ylim = c(2, 4))
heplot(iris.mod, size = "effect",
       cex = 1.5, cex.lab = 1.5,
       fill = TRUE, fill.alpha = c(0.3, 0.1),
       xlim = c(4, 8), ylim = c(2, 4))
par(op)
```


The geometry of ellipsoids and multivariate tests allows us to go further with another re-scaling of the $\mat{H}$ ellipsoid
that gives a _visual test of significance_ for any term in an MLM.
This is done simply by dividing $\mat{H} / \text{df}_e$ further
by the $\alpha$-critical value of the corresponding test statistic to show the strength of evidence against
the null hypothesis.
Among the various multivariate test statistics,
-Roy's maximum root test gives $\mat{H} / (\lambda_\alpha \text{df}_e)$
+Roy's maximum root test, based on the largest eigenvalue $\lambda_1$ of $\mat{H} \mat{E}^{-1}$,
+gives $\mat{H} / (\lambda_\alpha \text{df}_e)$
which has the visual property that the
scaled $\mat{H}$ ellipsoid will protrude _somewhere_ outside the standard $\mat{E}$ ellipsoid if and only if
-Roy's test is significant at significance level $\alpha$. For these data, the HE plot using
-significance scaling is shown in the right panel of \figref{fig:heplot-iris1}.
+Roy's test is significant at significance level $\alpha$. The critical value $\lambda_\alpha$ for Roy's
+test is
+$$
+\lambda_\alpha = \left(\frac{\text{df}_1}{\text{df}_2}\right) \; F_{\text{df}_1, \text{df}_2}^{1-\alpha} \comma
+$$
+where $\text{df}_1 = \max(p, \text{df}_h)$ and $\text{df}_2 = \text{df}_h + \text{df}_e - \text{df}_1$.
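The computation of $\lambda_\alpha$ can be sketched directly from the formula above. This Python sketch uses `scipy.stats.f` standing in for R's `qf()`; the degrees of freedom in the example follow from the iris design (3 species, $n = 150$, $p = 4$ responses):

```python
from scipy.stats import f

def roy_critical(p, df_h, df_e, alpha=0.05):
    """lambda_alpha = (df1 / df2) * F_{df1, df2}^{1 - alpha}, with
    df1 = max(p, df_h) and df2 = df_h + df_e - df1."""
    df1 = max(p, df_h)
    df2 = df_h + df_e - df1
    return (df1 / df2) * f.ppf(1.0 - alpha, df1, df2)

# e.g., iris MANOVA: p = 4 responses, df_h = 2 (3 species), df_e = 147
lam = roy_critical(p=4, df_h=2, df_e=147)
```

Scaling $\mat{H}$ by $1 / (\lambda_\alpha \, \text{df}_e)$ then gives the significance-scaled ellipsoid that protrudes beyond $\mat{E}$ if and only if Roy's test rejects at level $\alpha$.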

+For these data, the HE plot using
+significance scaling is shown in the right panel of \figref{fig:heplot-iris1}.


## Canonical discriminant analysis {#sec-candisc}
30 changes: 24 additions & 6 deletions R/iris/iris-HE.R
@@ -11,16 +11,34 @@ iris.mod <- lm(cbind(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) ~
Species, data=iris)

Anova(iris.mod)

summary(iris.mod)

summary(iris.mod, univariate = TRUE)

# tests for each response

glance(iris.mod)

op <- par(mar = c(4, 4, 1, 1) + .5)
col <- c("blue", "darkgreen", "brown")
clr <- c(col, "red")

op <- par(mar = c(4, 4, 1, 1) + .5,
mfrow = c(1, 2))
covEllipses(cbind(Sepal.Length, Sepal.Width) ~ Species, data=iris,
pooled = TRUE,
fill=TRUE,
fill.alpha = 0.1,
lwd = 3,
col = clr,
cex = 1.5, cex.lab = 1.5,
label.pos = c(3, 1, 3, 0),
xlim = c(4,8), ylim = c(2,4))

heplot(iris.mod, size = "effect",
cex = 1.5, cex.lab = 1.5,
fill=TRUE, fill.alpha=c(0.3,0.1),
xlim = c(4,8), ylim = c(2,4))
par(op)

op <- par(mar = c(4, 4, 1, 1) + .5,
mfrow = c(1, 2))
heplot(iris.mod, size = "effect",
cex = 1.5, cex.lab = 1.5,
fill=TRUE, fill.alpha=c(0.3,0.1),
@@ -29,7 +47,7 @@ text(10, 4.5, expression(paste("Effect size scaling:", bold(H) / df[e])),
pos = 2, cex = 1.2)

heplot(iris.mod, size = "evidence",
-cex = 1.5,
+cex = 1.5, cex.lab = 1.5,
fill=TRUE, fill.alpha=c(0.3,0.1),
xlim = c(2,10), ylim = c(1.4,4.6))
text(10, 4.5, expression(paste("Significance scaling:", bold(H) / (lambda[alpha] * df[e]))),
30 changes: 5 additions & 25 deletions R/penguin/HE-penguins.R
@@ -1,36 +1,16 @@

library(dplyr)
-library(readr)
+#library(readr)
#library(tidyr)
library(car)
library(heplots)
library(candisc)
library(palmerpenguins)


#url <- "https://raw.githubusercontent.com/friendly/penguins/master/data/penguins_size.csv"
#url <- "https://raw.githubusercontent.com/allisonhorst/penguins/master/data/penguins_size.csv"
#
#penguins <-read_csv(url)

# peng <- penguins %>%
# rename(
# bill_length = bill_length_mm,
# bill_depth = bill_depth_mm,
# flipper_length = flipper_length_mm,
# body_mass = body_mass_g
# ) %>%
# mutate(species = as.factor(species),
# island = as.factor(island),
# sex = as.factor(substr(sex,1,1))) %>%
# filter(!is.na(bill_depth),
# !is.na(sex))
#
# str(peng)
# View(peng)
#library(palmerpenguins)


data(peng, package="heplots")

data(peng, package="heplots")
source(here::here("R", "penguin", "penguin-colors.R"))

# vars <- paste(names(peng)[-1], collapse="\n")
# cat(vars)
52 changes: 52 additions & 0 deletions R/penguin/peng-HE.R
@@ -0,0 +1,52 @@
library(dplyr)
library(car)
library(heplots)
library(candisc)

data(peng, package="heplots")
source(here::here("R", "penguin", "penguin-colors.R"))
## MANOVA

contrasts(peng$species) <- matrix(c(1, -1, 0, -1, -1, -2), 3, 2)
contrasts(peng$species)


peng.mod <- lm(cbind(bill_length, bill_depth, flipper_length, body_mass) ~ species, data = peng)
etasq(peng.mod)

col <- peng.colors("dark")
pch <- 15:17

clr <- c(col, "red")

# data ellipses vs HE plot

covEllipses(cbind(bill_length, bill_depth) ~ species, data=peng,
pooled = TRUE,
fill=TRUE,
fill.alpha = 0.1,
lwd = 3,
col = clr,
cex.lab = 1.25,
xlim = c(35, 55), ylim = c(14, 20))

heplot(peng.mod, size = "effect",
fill=TRUE, fill.alpha=0.1,
cex = 1.25, cex.lab = 1.25,
xlim = c(35, 55), ylim = c(14, 20))


# effect vs evidence scaling

heplot(peng.mod, size = "effect",
fill=TRUE, fill.alpha=0.1,
cex = 1.25, cex.lab = 1.25,
xlim = c(0, 80), ylim = c(0, 30))

heplot(peng.mod, size = "evidence",
fill=TRUE, fill.alpha=0.1,
cex = 1.25, cex.lab = 1.25,
xlim = c(0, 80), ylim = c(0, 30))



27 changes: 0 additions & 27 deletions bib/pkgs.txt
@@ -116,30 +116,3 @@ knitr
matlib
patchwork
tidyr
broom
car
carData
dplyr
ggplot2
heplots
knitr
tidyr
broom
candisc
car
carData
dplyr
ggplot2
heplots
knitr
tidyr
broom
candisc
car
carData
corrgram
dplyr
ggplot2
heplots
knitr
tidyr
4 changes: 2 additions & 2 deletions docs/01-intro.html
@@ -378,7 +378,7 @@ <h1 class="title"><span id="sec-introduction" class="quarto-section-identifier">
</section><section id="visualization-is-harder" class="level2" data-number="1.4"><h2 data-number="1.4" class="anchored" data-anchor-id="visualization-is-harder">
<span class="header-section-number">1.4</span> Visualization is harder</h2>
<p>However, with two or more response variables, visualizations for multivariate models are not as simple as they are for their univariate counterparts for understanding the effects of predictors, model parameters, or model diagnostics. Consequently, the results of such studies are often explored and discussed solely in terms of coefficients and significance, and visualizations of the relationships are only provided for one response variable at a time, if at all. This tradition can mask important nuances, and lead researchers to draw erroneous conclusions.</p>
<p>The aim of this book is to describe and illustrate some central methods that we have developed over the last ten years that aid in the understanding and communication of the results of multivariate linear models <span class="citation" data-cites="Friendly-07-manova FriendlyMeyer:2016:DDAR">(<a href="95-references.html#ref-Friendly-07-manova" role="doc-biblioref">Friendly, 2007</a>;<!-- @Friendly-etal:ellipses:2013; --> <a href="95-references.html#ref-FriendlyMeyer:2016:DDAR" role="doc-biblioref">Friendly &amp; Meyer, 2016</a>)</span>. These methods rely on <em>data ellipsoids</em> as simple, minimally sufficient visualizations of variance that can be shown in 2D and 3D plots. As will be demonstrated, the <em>Hypothesis-Error (HE) plot</em> framework applies this idea to the results of multivariate tests of linear hypotheses. </p>
<p>The aim of this book is to describe and illustrate some central methods that we have developed over the last ten years that aid in the understanding and communication of the results of multivariate linear models <span class="citation" data-cites="Friendly-07-manova FriendlyMeyer:2016:DDAR">(<a href="#ref-Friendly-07-manova" role="doc-biblioref">Friendly, 2007</a>;<!-- @Friendly-etal:ellipses:2013; --> <a href="#ref-FriendlyMeyer:2016:DDAR" role="doc-biblioref">Friendly &amp; Meyer, 2016</a>)</span>. These methods rely on <em>data ellipsoids</em> as simple, minimally sufficient visualizations of variance that can be shown in 2D and 3D plots. As will be demonstrated, the <em>Hypothesis-Error (HE) plot</em> framework applies this idea to the results of multivariate tests of linear hypotheses. </p>
<p>Further, in the case where there are more than just a few outcome variables, the important nectar of their relationships to predictors can often be distilled in a multivariate juicer— a <strong>projection</strong> of the multivariate relationships to the predictors in the low-D space that captures most of the flavor. This idea can be applied using <em>canonical correlation plots</em> and with <em>canonical discriminant HE plots</em>. </p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure"><p><img src="images/Cover-GBE.png" class="img-fluid figure-img"></p>
@@ -401,7 +401,7 @@ <h1 class="title"><span id="sec-introduction" class="quarto-section-identifier">
<!-- ## References {.unnumbered} -->


-<div id="refs" class="references csl-bib-body hanging-indent" data-entry-spacing="0" data-line-spacing="2" role="list" style="display: none">
+<div id="refs" class="references csl-bib-body hanging-indent" data-entry-spacing="0" data-line-spacing="2" role="list">
<div id="ref-Friendly-07-manova" class="csl-entry" role="listitem">
Friendly, M. (2007). <span>HE</span> plots for multivariate general linear models. <em>Journal of Computational and Graphical Statistics</em>, <em>16</em>(2), 421–444. <a href="https://doi.org/10.1198/106186007X208407">https://doi.org/10.1198/106186007X208407</a>
</div>