diff --git a/04-pca-biplot.qmd b/04-pca-biplot.qmd index 682e9f87..ab8db3a7 100644 --- a/04-pca-biplot.qmd +++ b/04-pca-biplot.qmd @@ -174,13 +174,23 @@ data, with points colored by species and the 95% data ellipsoid. This is rotated Because this is a rigid rotation of the cloud of points, the total variability is obviously unchanged. -::: {#fig-pca-animation} -
- -
-Animation of PCA as a rotation in 3D space. The plot shows three variables for the `iris` data, initially
-in data space and its' data ellipsoid, with points colored according to species of the iris flowers. This is rotated smoothly until the first two principal axes are aligned with the horizontal and vertical dimensions.
+
+
+
+
+
+
+
+
+::: {.content-visible unless-format="pdf"}
+```{r}
+#| label: fig-pca-animation
+#| out-width: "100%"
+#| echo: false
+#| fig-cap: "Animation of PCA as a rotation in 3D space. The plot shows three variables for the `iris` data, initially in data space and its data ellipsoid, with points colored according to species of the iris flowers. This is rotated smoothly until the first two principal axes are aligned with the horizontal and vertical dimensions."
+knitr::include_graphics("images/pca-animation1.gif")
+```
:::
@@ -227,12 +237,21 @@ The **FactoMineR** package [@R-FactoMineR] has extensive capabilities for
exploratory analysis of multivariate data (PCA, correspondence analysis, cluster analysis, ...).
Unfortunately, although all of these perform similar calculations, the options for
-analysis and the details of the result they return differ ...
+analysis and the details of the results they return differ.
The important options for analysis include:
-* whether or not the data variables are **centered**, to a mean of 0
-* whether or not the data variables are **scaled**, to a variance of 1.
+* whether or not the data variables are **centered**, to a mean of $\bar{x}_j = 0$
+* whether or not the data variables are **scaled**, to a variance of $\text{Var}(x_j) = 1$.
+
+It nearly always makes sense to center the variables. The choice of
+scaling determines whether the correlation matrix is analyzed, so that
+each variable contributes equally to the total variance to be accounted for,
+or the covariance matrix, where each variable contributes its
+own variance to the total. Analysis of the covariance matrix makes little sense
+when the variables are measured on different scales.[^pca-scales]
+
+[^pca-scales]: For example, if two variables in the analysis are height and weight, changing the unit of height from inches to centimeters would multiply its variance by $2.54^2$; changing weight from pounds to kilograms would divide its variance by $2.2^2$.
#### Example: Crime data {.unnumbered}
@@ -304,6 +323,7 @@ of components to extract a desired proportion of total variance, usually in the
```{r}
#| label: fig-crime-ggscreeplot
+#| fig-height: 4
#| out-width: "100%"
#| fig-cap: "Screeplots for the PCA of the crime data. The left panel shows the traditional version, plotting variance proportions against component number, with linear guideline for the scree rule of thumb. The right panel plots cumulative proportions, showing cutoffs of 80%, 90%."
p1 <- ggscreeplot(crime.pca) +
@@ -352,7 +372,7 @@ crime.pca |>
broom::augment(crime) |> head()
```
-Then, we can use `ggplot()` to plot and pair of components.
+Then, we can use `ggplot()` to plot any pair of components.
To aid interpretation, I label the points by their state abbreviation and color them by `region` of the U.S.
A geometric interpretation of the plot requires an aspect ratio of 1.0 (via `coord_fixed()`)
@@ -387,9 +407,10 @@ and West Virginia. The second component has most of the southern states on the l
and Massachusetts, Rhode Island and Hawaii on the high end. However, interpretation is
easier when we also consider how the various crimes contribute to these dimensions.
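Returning to the centering and scaling choice discussed above, here is a minimal sketch with `prcomp()` contrasting the covariance and correlation forms of the analysis. It assumes the numeric crime-rate columns have been collected into a matrix or data frame `crime_num` (a hypothetical name; the book selects the columns from `crime` directly):

```r
# covariance-matrix PCA: variables are centered but keep their own variances
pca_cov <- prcomp(crime_num, center = TRUE, scale. = FALSE)

# correlation-matrix PCA: each variable is also standardized to variance 1,
# so all variables contribute equally to the total variance
pca_cor <- prcomp(crime_num, center = TRUE, scale. = TRUE)

# the component standard deviations (and hence variance proportions) differ
pca_cov$sdev
pca_cor$sdev
```

With `scale. = TRUE`, the second call corresponds to the correlation-matrix analysis used for `crime.pca` in the text.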
-We could obviously go further and plot other pairs of components,
+When, as here, there
+are more than two components that seem important in the scree plot,
+we could obviously go further and plot other pairs.
-**TODO**: Add plot of PC1 vs. PC3
#### Variable vectors {.unnumbered}
You can extract the variable loadings using either `crime.pca$rotation` or
@@ -543,11 +564,18 @@ $\widehat{\mathbf{X}}$ as the product of two matrices,
$$
\widehat{\mathbf{X}} = (\mathbf{U} \mathbf{\Lambda}^\alpha) (\mathbf{\Lambda}^{1-\alpha} \mathbf{V}') = \mathbf{A} \mathbf{B}'
$$
-
-The choice $\alpha = 1$, assigning the singular values totally to the left factor,
- gives a distance interpretation to the row display and
+This notation uses a little math trick involving a power, $0 \le \alpha \le 1$:
+When $\alpha = 1$, $\mathbf{\Lambda}^\alpha = \mathbf{\Lambda}^1 = \mathbf{\Lambda}$,
+and $\mathbf{\Lambda}^{1-\alpha} = \mathbf{\Lambda}^0 = \mathbf{I}$.
+$\alpha = 1/2$ gives the diagonal matrix $\mathbf{\Lambda}^{1/2}$ whose elements are the square roots of the singular values.
+
+The choice $\alpha = 1$ assigns the singular values totally to the left factor and gives
+a distance interpretation to the row display: distances between the observation points
+in the plot approximate the distances between the corresponding rows of $\widehat{\mathbf{X}}$. Similarly,
$\alpha = 0$ gives a distance interpretation to the column display.
$\alpha = 1/2$ gives a symmetrically scaled biplot.
+In the $\alpha = 0$ scaling, the inner product of two variable vectors approximates their covariance (or their correlation, for standardized variables), and distances between the observation points then approximate their Mahalanobis distances.
When the singular values are assigned totally to the left or to the right factor,
the resultant coordinates are called _principal coordinates_ and the sum of squared coordinates
@@ -560,13 +588,17 @@ values equal to 1.0.
### Biplots in R
-There are a large number of R packages providing biplots, ...
+There are a large number of R packages providing biplots. The most basic, `stats::biplot()`, provides methods for `"prcomp"` and `"princomp"` objects.
+
+Among the others, the **factoextra** package provides `fviz_pca_biplot()` and related `fviz()` functions giving `ggplot2` graphics, and the **adegraphics** package offers a further collection of displays.
-Here, I use the **ggbiplot** package ...
+Here, I use the **ggbiplot** package, which aims to provide a simple interface to biplots within the `ggplot2` framework.
### Example
-A basic biplot, using standardized principal components and labeling the observation by their state abbreviation is shown in @fig-crime-biplot1.
+A basic biplot of the `crime` data, using standardized principal components and labeling the observations by their state abbreviations, is shown in @fig-crime-biplot1.
+The correlation circle indicates that these components are uncorrelated and have
+equal variance in the display.
```{r}
#| label: fig-crime-biplot1
#| out-width: "80%"
@@ -582,10 +614,13 @@ ggbiplot(crime.pca,
theme_minimal(base_size = 14)
```
+In this dataset the states are grouped by region and we saw some differences among regions in the plot (@fig-crime-scores-plot12) of component scores.
+`ggbiplot()` provides options to include a `groups =` variable, used to
+color the observation points and also to draw their data ellipses, facilitating interpretation.
```{r}
#| label: fig-crime-biplot2
#| out-width: "80%"
-#| fig-cap: "Enhanced biplot of the crime data. ..."
+#| fig-cap: "Enhanced biplot of the crime data, grouping the states by region and adding data ellipses."
ggbiplot(crime.pca,
obs.scale = 1, var.scale = 1,
groups = crime$region,
@@ -601,6 +636,48 @@ ggbiplot(crime.pca,
theme(legend.direction = 'horizontal', legend.position = 'top')
```
+This plot provides what is needed to interpret both the nature of the components and the variation of the states in relation to them. Here, the data ellipses for the regions
+provide a visual summary that aids interpretation.
+
+* From the variable vectors, it seems that PC1, having all positive and nearly equal loadings, reflects a total or overall index of crimes. Nevada, California, New York and Florida are highest on this, while North Dakota, South Dakota and West Virginia are lowest.
+
+* The second component, PC2, shows a contrast between crimes against persons (murder, assault, rape) at the top and property crimes (auto theft, larceny) at the bottom. Nearly all the Southern states are high on personal crimes; states in the North East are generally higher
+on property crimes.
+
+* Western states tend to be somewhat higher on overall crime rate, while the North Central states are lower on average. In these states there is not much variation in the relative proportions of personal vs. property crimes.
+
+Moreover, in this biplot you can interpret the value for a particular state on a given crime by considering its projection on the variable vector: the origin corresponds to the mean, positions along the vector have greater than average values on that crime, and positions in the opposite direction have lower than average values. For example, Massachusetts has the highest value on auto theft, but a value less than the mean on murder. Louisiana and South Carolina, on the other hand, are highest in the rate of murder and slightly less than average on auto theft.
+
+These 2D plots account for only 76.5% of the total variance of crimes, so it is useful to also examine the third principal component, which accounts for an additional 10.4%.
+The `choices =` option controls which dimensions are plotted.
+
+```{r}
+#| label: fig-crime-biplot3
+#| out-width: "80%"
+#| fig-cap: "Biplot of dimensions 1 & 3 of the crime data."
+ggbiplot(crime.pca,
+         choices = c(1,3),
+         obs.scale = 1, var.scale = 1,
+         groups = crime$region,
+         labels = crime$st,
+         labels.size = 4,
+         var.factor = 2,
+         ellipse = TRUE, ellipse.level = 0.5, ellipse.alpha = 0.1,
+         circle = TRUE,
+         varname.size = 4,
+         varname.color = "black") +
+  labs(fill = "Region", color = "Region") +
+  theme_minimal(base_size = 14) +
+  theme(legend.direction = 'horizontal', legend.position = 'top')
+```
+
+Dimension 3 in @fig-crime-biplot3 is more subtle. One interpretation is a contrast between
+larceny, which is simple theft, and robbery, which involves stealing something from a person
+and is considered a more serious crime with an element of possible violence.
+In this plot, murder has a relatively short variable vector, so it does not contribute
+very much to differences among the states.
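The variance percentages quoted above can be read directly from the PCA object; a quick check using the `crime.pca` object fit earlier:

```r
# proportion of variance for each component, and the cumulative proportion
summary(crime.pca)

# or computed by hand from the component standard deviations
prop <- crime.pca$sdev^2 / sum(crime.pca$sdev^2)
round(rbind(proportion = prop, cumulative = cumsum(prop)), 3)
```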
+ + ## Elliptical insights: Outlier detection diff --git a/R/crime-ggbiplot.R b/R/crime-ggbiplot.R index a6830e08..762116f2 100644 --- a/R/crime-ggbiplot.R +++ b/R/crime-ggbiplot.R @@ -66,4 +66,19 @@ ggbiplot(crime.pca, theme_minimal(base_size = 14) + theme(legend.direction = 'horizontal', legend.position = 'top') +# PC1 & PC3 +ggbiplot(crime.pca, + choices = c(1,3), + obs.scale = 1, var.scale = 1, + groups = crime$region, + labels = crime$st, + labels.size = 4, + var.factor = 2, + ellipse = TRUE, ellipse.level = 0.5, ellipse.alpha = 0.1, + circle = TRUE, + varname.size = 4, + varname.color = "black") + + labs(fill = "Region", color = "Region") + + theme_minimal(base_size = 14) + + theme(legend.direction = 'horizontal', legend.position = 'top') diff --git a/bib/references.bib b/bib/references.bib index 49127022..e7d0808b 100644 --- a/bib/references.bib +++ b/bib/references.bib @@ -561,7 +561,8 @@ @article{Gabriel:71 Pages = {453--467}, Title = {The Biplot Graphic Display of Matrices with Application to Principal Components Analysis}, Volume = {58}, - Year = {1971} + Year = {1971}, + doi = {10.2307/2334381}, } @incollection{Gabriel:81, @@ -944,6 +945,18 @@ @article{Mardia:1974 } +@Article{McGowan2023, + author = {McGowan, Lucy D’Agostino and Gerke, Travis and Barrett, Malcolm}, + journal = {Journal of Statistics and Data Science Education}, + title = {Causal inference is not just a statistics problem}, + year = {2023}, + issn = {2693-9169}, + month = dec, + pages = {1--9}, + doi = {10.1080/26939169.2023.2276446}, + publisher = {Informa UK Limited}, +} + @incollection{Monette:90, Address = {Beverly Hills, CA}, Author = {Georges Monette}, diff --git a/child/02-anscombe.qmd b/child/02-anscombe.qmd index 2da26b4c..2915262f 100644 --- a/child/02-anscombe.qmd +++ b/child/02-anscombe.qmd @@ -136,7 +136,8 @@ when you look behind the scenes. For example, in the context of causal analysis @Gelman-etal:2023, illustrated sets of four graphs, within each of which all four represent the same average (latent) causal effect but with -much different patterns of individual effects. +much different patterns of individual effects; @McGowan2023 provide another illustration +with four seemingly identical data sets each generated by a different causal mechanism. As an example of machine learning models, @Biecek-etal:2023, introduced the "Rashamon Quartet", a synthetic dataset for which four models from different classes (linear model, regression tree, random forest, neural network) diff --git a/docs/02-getting_started.html b/docs/02-getting_started.html index 32ce84f6..5145ecdd 100644 --- a/docs/02-getting_started.html +++ b/docs/02-getting_started.html @@ -394,7 +394,7 @@

-

The essential idea of a statistical “quartet” is to illustrate four quite different datasets or circumstances that seem superficially the same, but yet are paradoxically very different when you look behind the scenes. For example, in the context of causal analysis Gelman, Hullman, and Kennedy (2023), illustrated sets of four graphs, within each of which all four represent the same average (latent) causal effect but with much different patterns of individual effects. As an example of machine learning models, Biecek et al. (2023), introduced the “Rashamon Quartet”, a synthetic dataset for which four models from different classes (linear model, regression tree, random forest, neural network) have practically identical predictive performance. In all cases, the paradox is solved when their visualization reveals the distinct ways of understanding structure in the data. The quartets package contains these and other variations on this theme.

+

The essential idea of a statistical “quartet” is to illustrate four quite different datasets or circumstances that seem superficially the same, but yet are paradoxically very different when you look behind the scenes. For example, in the context of causal analysis Gelman, Hullman, and Kennedy (2023), illustrated sets of four graphs, within each of which all four represent the same average (latent) causal effect but with much different patterns of individual effects; McGowan, Gerke, and Barrett (2023) provide another illustration with four seemingly identical data sets each generated by a different causal mechanism. As an example of machine learning models, Biecek et al. (2023), introduced the “Rashamon Quartet”, a synthetic dataset for which four models from different classes (linear model, regression tree, random forest, neural network) have practically identical predictive performance. In all cases, the paradox is solved when their visualization reveals the distinct ways of understanding structure in the data. The quartets package contains these and other variations on this theme.

@@ -547,6 +547,9 @@

Matejka, Justin, and George Fitzmaurice. 2017. “Same Stats, Different Graphs.” In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems. ACM. https://doi.org/10.1145/3025453.3025912. +
+McGowan, Lucy D’Agostino, Travis Gerke, and Malcolm Barrett. 2023. “Causal Inference Is Not Just a Statistics Problem.” Journal of Statistics and Data Science Education, December, 1–9. https://doi.org/10.1080/26939169.2023.2276446. +
Pearson, Karl. 1896. “Contributions to the Mathematical Theory of Evolution—III, Regression, Heredity and Panmixia.” Philosophical Transactions of the Royal Society of London, A, 187: 253–318.
diff --git a/docs/04-pca-biplot.html b/docs/04-pca-biplot.html index 11711cd4..a0ece077 100644 --- a/docs/04-pca-biplot.html +++ b/docs/04-pca-biplot.html @@ -373,12 +373,20 @@

the total variation of the points in data space, \(\text{Var}(x) + \text{Var}(y)\), being unchanged by rotation, was equally well expressed as the total variation \(\text{Var}(PC1) + \text{Var}(PC2)\) of the scores on what are now called the principal component axes.

It would have appealed to Pearson (and also to A Square) to see these observations demonstrated in a 3D video. Figure 4.4 shows a 3D plot of the variables Sepal.Length, Sepal.Width and Petal.Length in Edgar Anderson’s iris data, with points colored by species and the 95% data ellipsoid. This is rotated smoothly by interpolation until the first two principal axes, PC1 and PC2 are aligned with the horizontal and vertical dimensions. Because this is a rigid rotation of the cloud of points, the total variability is obviously unchanged.

+ + + + + + + +
+
-
- +

+
Figure 4.4: Animation of PCA as a rotation in 3D space. The plot shows three variables for the iris data, initially in data space and its data ellipsoid, with points colored according to species of the iris flowers. This is rotated smoothly until the first two principal axes are aligned with the horizontal and vertical dimensions.
+
-
Figure 4.4: Animation of PCA as a rotation in 3D space. The plot shows three variables for the iris data, initially in data space and its’ data ellipsoid, with points colored according to species of the iris flowers. This is rotated smoothly until the first two principal axes are aligned with the horizontal and vertical dimensions.

4.2.1 PCA by springs

@@ -400,12 +408,14 @@

4.2.3 Finding principal components

In R, principal components analysis is most easily carried out using stats::prcomp() or stats::princomp() or similar functions in other packages such as FactoMineR::PCA(). The FactoMineR package (Husson et al. 2023) has extensive capabilities for exploratory analysis of multivariate data (PCA, correspondence analysis, cluster analysis, …).

-

Unfortunately, although all of these performing similar calculations, the options for analysis and the details of the result they return differ …

+

Unfortunately, although all of these perform similar calculations, the options for analysis and the details of the results they return differ.

The important options for analysis include:

    -
  • whether or not the data variables are centered, to a mean of 0
  • -
  • whether or not the data variables are scaled, to a variance of 1.
  • +
  • whether or not the data variables are centered, to a mean of \(\bar{x}_j =0\) +
  • +
  • whether or not the data variables are scaled, to a variance of \(\text{Var}(x_j) =1\).
+

It nearly always makes sense to center the variables. The choice of scaling determines whether the correlation matrix is analyzed, so that each variable contributes equally to the total variance to be accounted for, or the covariance matrix, where each variable contributes its own variance to the total. Analysis of the covariance matrix makes little sense when the variables are measured on different scales.2

Example: Crime data

The dataset crime, analysed in Section 3.2.2, showed all positive correlations among the rates of various crimes in the corrgram, Figure 3.27. What can we see from a principal components analysis? Is it possible that a few dimensions can account for most of the juice in this data?

In this example, you can easily find the PCA solution using prcomp() in a single line in base-R. You need to specify the numeric variables to analyze by their columns in the data frame. The most important option here is scale. = TRUE

@@ -519,7 +529,7 @@

#> # .fittedPC4 <dbl>, .fittedPC5 <dbl>, .fittedPC6 <dbl>, #> # .fittedPC7 <dbl>

-

Then, we can use ggplot() to plot and pair of components. To aid interpretation, I label the points by their state abbreviation and color them by region of the U.S.. A geometric interpretation of the plot requires an aspect ratio of 1.0 (via coord_fixed()) so that a unit distance on the horizontal axis is the same length as a unit distance on the vertical. To demonstrate that the components are uncorrelated, I also added their data ellipse.

+

Then, we can use ggplot() to plot any pair of components. To aid interpretation, I label the points by their state abbreviation and color them by region of the U.S. A geometric interpretation of the plot requires an aspect ratio of 1.0 (via coord_fixed()) so that a unit distance on the horizontal axis is the same length as a unit distance on the vertical. To demonstrate that the components are uncorrelated, I also added their data ellipse.

crime.pca |>
   broom::augment(crime) |> # add original dataset back in
@@ -541,8 +551,8 @@ 

To interpret such plots, it is useful to consider the observations that are high and low on each of the axes, as well as other information, such as region here, and ask how these differ on the crime statistics. The first component, PC1, contrasts Nevada and California with North Dakota, South Dakota and West Virginia. The second component has most of the southern states on the low end and Massachusetts, Rhode Island and Hawaii on the high end. However, interpretation is easier when we also consider how the various crimes contribute to these dimensions.

-

We could obviously go further and plot other pairs of components,

-

TODO: Add plot of PC1 vs. PC3 #### Variable vectors {.unnumbered}

+

When, as here, there are more than two components that seem important in the scree plot, we could obviously go further and plot other pairs.

+

Variable vectors

You can extract the variable loadings using either crime.pca$rotation or purrr::pluck("rotation"), similar to what I did with the scores.

crime.pca |> purrr::pluck("rotation")
@@ -662,16 +672,17 @@ 

The factor, \(\alpha\) allows the variances of the components to be apportioned between the row points and column vectors, with different interpretations, by representing the approximation \(\widehat{\mathbf{X}}\) as the product of two matrices,

\[ \widehat{\mathbf{X}} = (\mathbf{U} \mathbf{\Lambda}^\alpha) (\mathbf{\Lambda}^{1-\alpha} \mathbf{V}') = \mathbf{A} \mathbf{B}' -\]

-

The choice \(\alpha = 1\), assigning the singular values totally to the left factor, gives a distance interpretation to the row display and \(\alpha = 0\) gives a distance interpretation to the column display. \(\alpha = 1/2\) gives a symmetrically scaled biplot.

+\]
This notation uses a little math trick involving a power, \(0 \le \alpha \le 1\): When \(\alpha = 1\), \(\mathbf{\Lambda}^\alpha = \mathbf{\Lambda}^1 = \mathbf{\Lambda}\), and \(\mathbf{\Lambda}^{1-\alpha} = \mathbf{\Lambda}^0 = \mathbf{I}\). \(\alpha = 1/2\) gives the diagonal matrix \(\mathbf{\Lambda}^{1/2}\) whose elements are the square roots of the singular values.

+

The choice \(\alpha = 1\) assigns the singular values totally to the left factor and gives a distance interpretation to the row display: distances between the observation points approximate the distances between the corresponding rows of \(\widehat{\mathbf{X}}\). Similarly, \(\alpha = 0\) gives a distance interpretation to the column display; in that scaling the inner product of two variable vectors approximates their covariance (or their correlation, for standardized variables), and distances between the observation points then approximate their Mahalanobis distances. \(\alpha = 1/2\) gives a symmetrically scaled biplot.

When the singular values are assigned totally to the left or to the right factor, the resultant coordinates are called principal coordinates and the sum of squared coordinates on each dimension equals the corresponding singular value. The other matrix, to which no part of the singular values is assigned, contains the so-called standard coordinates, which have sum of squared values equal to 1.0.
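As a concrete sketch of the \(\alpha\) scaling just described (purely illustrative; it assumes the centered, standardized crime rates are in a matrix built from a hypothetical object `crime_num`), the two biplot factors can be formed directly from the SVD:

```r
X  <- scale(crime_num)      # centered, standardized data (hypothetical object)
sv <- svd(X)                # X = U diag(d) V'

alpha <- 1                  # try 0, 1/2, or 1
A <- sv$u %*% diag(sv$d^alpha)         # left factor: observation coordinates
B <- sv$v %*% diag(sv$d^(1 - alpha))   # right factor: variable coordinates

# A %*% t(B) recovers X exactly when all components are kept;
# using only the first two columns of A and B gives the rank-2
# approximation that a 2D biplot displays
max(abs(X - A %*% t(B)))
```

Whatever value of \(\alpha\) is chosen, the product \(\mathbf{A}\mathbf{B}'\) is the same; only the relative scaling of the observation points and variable vectors changes.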

4.3.2 Biplots in R

-

There are a large number of R packages providing biplots, …

-

Here, I use the ggbiplot package …

+

There are a large number of R packages providing biplots. The most basic, stats::biplot(), provides methods for "prcomp" and "princomp" objects.
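For instance, a minimal base-graphics version for the crime PCA needs only the default "prcomp" method (a sketch; `crime.pca` is the object fit earlier, and the graphical settings are arbitrary):

```r
# base-R biplot of the first two components of the crime PCA
biplot(crime.pca, choices = 1:2, scale = 1, cex = 0.6)
```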

+

Among the others, the factoextra package provides fviz_pca_biplot() and related fviz() functions giving ggplot2 graphics, and the adegraphics package offers a further collection of displays.

+

Here, I use the ggbiplot package, which aims to provide a simple interface to biplots within the ggplot2 framework.

4.3.3 Example

-

A basic biplot, using standardized principal components and labeling the observation by their state abbreviation is shown in Figure 4.9.

+

A basic biplot of the crime data, using standardized principal components and labeling the observations by their state abbreviations, is shown in Figure 4.9. The correlation circle indicates that these components are uncorrelated and have equal variance in the display.

crime.pca <- reflect(crime.pca) # reflect the axes
 
@@ -689,6 +700,7 @@ 

+

In this dataset the states are grouped by region and we saw some differences among regions in the plot (Figure 4.7) of component scores. ggbiplot() provides options to include a groups = variable, used to color the observation points and also to draw their data ellipses, facilitating interpretation.

ggbiplot(crime.pca,
    obs.scale = 1, var.scale = 1,
@@ -706,34 +718,65 @@ 

-
Figure 4.10: Enhanced biplot of the crime data. …
+
Figure 4.10: Enhanced biplot of the crime data, grouping the states by region and adding data ellipses.
+
+

+
+

This plot provides what is needed to interpret both the nature of the components and the variation of the states in relation to them. Here, the data ellipses for the regions provide a visual summary that aids interpretation.

+
    +
  • From the variable vectors, it seems that PC1, having all positive and nearly equal loadings, reflects a total or overall index of crimes. Nevada, California, New York and Florida are highest on this, while North Dakota, South Dakota and West Virginia are lowest.

  • +
  • The second component, PC2, shows a contrast between crimes against persons (murder, assault, rape) at the top and property crimes (auto theft, larceny) at the bottom. Nearly all the Southern states are high on personal crimes; states in the North East are generally higher on property crimes.

  • +
  • Western states tend to be somewhat higher on overall crime rate, while North Central are lower on average. In these states there is not much variation in the relative proportions of personal vs. property crimes.

  • +
+

Moreover, in this biplot you can interpret the value for a particular state on a given crime by considering its projection on the variable vector: the origin corresponds to the mean, positions along the vector have greater than average values on that crime, and positions in the opposite direction have lower than average values. For example, Massachusetts has the highest value on auto theft, but a value less than the mean on murder. Louisiana and South Carolina, on the other hand, are highest in the rate of murder and slightly less than average on auto theft.

+

These 2D plots account for only 76.5% of the total variance of crimes, so it is useful to also examine the third principal component, which accounts for an additional 10.4%. The choices = option controls which dimensions are plotted.

+
+
ggbiplot(crime.pca,
+         choices = c(1,3),
+         obs.scale = 1, var.scale = 1,
+         groups = crime$region,
+         labels = crime$st,
+         labels.size = 4,
+         var.factor = 2,
+         ellipse = TRUE, ellipse.level = 0.5, ellipse.alpha = 0.1,
+         circle = TRUE,
+         varname.size = 4,
+         varname.color = "black") +
+  labs(fill = "Region", color = "Region") +
+  theme_minimal(base_size = 14) +
+  theme(legend.direction = 'horizontal', legend.position = 'top')
+
+
+

+
Figure 4.11: Biplot of dimensions 1 & 3 of the crime data.
+

Dimension 3 in Figure 4.11 is more subtle. One interpretation is a contrast between larceny, which is simple theft, and robbery, which involves stealing something from a person and is considered a more serious crime with an element of possible violence. In this plot, murder has a relatively short variable vector, so it does not contribute very much to differences among the states.

4.4 Elliptical insights: Outlier detection

The data ellipse (Section 3.1.4), or ellipsoid in more than 2D is fundamental in regression. But also, as Pearson showed, it is key to understanding principal components analysis, where the principal component directions are simply the axes of the ellipsoid of the data. As such, observations that are unusual in data space may not stand out in univariate views of the variables, but will stand out in principal component space, usually on the smallest dimension.

As an illustration, I created a dataset of \(n = 100\) observations with a linear relation, \(y = x + \mathcal{N}(0, 1)\) and then added two discrepant points at (1.5, -1.5), (-1.5, 1.5).

-
set.seed(123345)
+
set.seed(123345)
 x <- c(rnorm(100),             1.5, -1.5)
 y <- c(x[1:100] + rnorm(100), -1.5, 1.5)
-

When these are plotted with a data ellipse in Figure 4.11 (left), you can see the discrepant points labeled 101 and 102, but they do not stand out as unusual on either \(x\) or \(y\). The transformation to from data space to principal components space, shown in Figure 4.11 (right), is simply a rotation of \((x, y)\) to a space whose coordinate axes are the major and minor axes of the data ellipse, \((PC_1, PC_2)\). In this view, the additional points appear a univariate outliers on the smallest dimension, \(PC_2\).

+

When these are plotted with a data ellipse in Figure 4.12 (left), you can see the discrepant points labeled 101 and 102, but they do not stand out as unusual on either \(x\) or \(y\). The transformation from data space to principal components space, shown in Figure 4.12 (right), is simply a rotation of \((x, y)\) to a space whose coordinate axes are the major and minor axes of the data ellipse, \((PC_1, PC_2)\). In this view, the additional points appear as univariate outliers on the smallest dimension, \(PC_2\).
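A minimal sketch (assuming the `x` and `y` vectors defined in the code block above) of how the principal component scores expose the two added points:

```r
# rotate (x, y) into principal component space
pcs <- prcomp(cbind(x, y))$x

# the two discrepant points (rows 101 and 102) are typically the most
# extreme observations on the small dimension, PC2
order(abs(pcs[, "PC2"]), decreasing = TRUE)[1:2]
```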

-
Figure 4.11: Outlier demonstration: The left panel shows the original data and highlights the two discrepant points, which do not appear to be unusual on either x or y. The right panel shows the data rotated to principal components, where the labeled points stand out on the smallest PCA dimension.
+
Figure 4.12: Outlier demonstration: The left panel shows the original data and highlights the two discrepant points, which do not appear to be unusual on either x or y. The right panel shows the data rotated to principal components, where the labeled points stand out on the smallest PCA dimension.
-

To see this more clearly, Figure 4.12 shows an animation of the rotation from data space to PCA space. This uses heplots::interpPlot()

+

To see this more clearly, Figure 4.13 shows an animation of the rotation from data space to PCA space. This uses heplots::interpPlot()

-
Figure 4.12: Animation of rotation from data space to PCA space.
+
Figure 4.13: Animation of rotation from data space to PCA space.
@@ -754,7 +797,7 @@

A History of Data Visualization and Graphic Communication. Cambridge, MA: Harvard University Press. https://doi.org/10.4159/9780674259034.

-Gabriel, K. R. 1971. “The Biplot Graphic Display of Matrices with Application to Principal Components Analysis.” Biometrics 58 (3): 453–67. +Gabriel, K. R. 1971. “The Biplot Graphic Display of Matrices with Application to Principal Components Analysis.” Biometrika 58 (3): 453–67. https://doi.org/10.2307/2334381.
———. 1981. “Biplot Display of Multivariate Matrices for Inspection of Data and Diagnosis.” In Interpreting Multivariate Data, edited by V. Barnett, 147–73. London: John Wiley; Sons. @@ -787,6 +830,7 @@


1. This is Euler’s (1758) formula, which states that any convex polyhedron must obey the formula \(V + F - E = 2\) where \(V\) is the number of vertexes (corners), \(F\) is the number of faces and \(E\) is the number of edges. For example, a tetrahedron or pyramid has \((V, F, E) = (4, 4, 6)\) and a cube has \((V, F, E) = (8, 6, 12)\). Stated in words, for all solid bodies confined by planes, the sum of the number of vertexes and the number of faces is two less than the number of edges.↩︎

  2. +
3. For example, if two variables in the analysis are height and weight, changing the unit of height from inches to centimeters would multiply its variance by \(2.54^2\); changing weight from pounds to kilograms would divide its variance by \(2.2^2\).↩︎