TODO: Simple PCA example using workers data
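A minimal sketch of what such an example could look like (the workers data frame and its variables are assumed here, not taken from the text): keep the numeric columns, standardize them, and run prcomp().

workers.pca <- workers |>
  dplyr::select(where(is.numeric)) |>   # keep only the numeric variables (assumed)
  prcomp(scale. = TRUE)                 # standardize: PCA of the correlation matrix
summary(workers.pca)                    # variance accounted for by each component
biplot(workers.pca)                     # quick look at scores and variable vectors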
4.2.2 Mathematics and geometry of PCA
As the ideas of principal components developed, there was a happy marriage of Galton’s geometrical intuition and Pearson’s mathematical analysis. The best men at the wedding were ellipses and higher-dimensional ellipsoids. The bridesmaids were eigenvectors, pointing in as many different directions as space would allow, each sized according to their associated eigenvalues. Attending the wedding were the ghosts of uncles, Leonhard Euler, Jean-Louis Lagrange, Augustin-Louis Cauchy and others who had earlier discovered the mathematical properties of ellipses and quadratic forms in relation to problems in physics.
The key idea in the statistical application was that, for a set of variables \(\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_p\), the \(p \times p\) covariance matrix \(\mathbf{S}\) could be expressed exactly as a matrix product involving a matrix \(\mathbf{V}\), whose columns are eigenvectors (\(\mathbf{v}_i\)), and a diagonal matrix \(\mathbf{\Lambda}\), whose diagonal elements (\(\lambda_i\)) are the corresponding eigenvalues.
To explain this, it is helpful to use a bit of matrix math:
\[\begin{align*}
\mathbf{S}_{p \times p} & = \mathbf{V}_{p \times p} \quad\quad \mathbf{\Lambda}_{p \times p} \quad\quad \mathbf{V}_{p \times p}^T \\
 & = \left( \mathbf{v}_1, \, \mathbf{v}_2, \, \dots, \, \mathbf{v}_p \right)
 \begin{pmatrix}
 \lambda_1 & & & \\
 & \lambda_2 & & \\
 & & \ddots & \\
 & & & \lambda_p
 \end{pmatrix}
 \begin{pmatrix}
 \mathbf{v}_1^T \\
 \mathbf{v}_2^T \\
 \vdots \\
 \mathbf{v}_p^T
 \end{pmatrix} \\
 & = \lambda_1 \mathbf{v}_1 \mathbf{v}_1^T + \lambda_2 \mathbf{v}_2 \mathbf{v}_2^T + \cdots + \lambda_p \mathbf{v}_p \mathbf{v}_p^T
\end{align*}\]
In this equation,

* The last line follows because \(\mathbf{\Lambda}\) is a diagonal matrix, so \(\mathbf{S}\) is expressed as a sum of outer products of each \(\mathbf{v}_i\) with itself.
* The columns of \(\mathbf{V}\) are the eigenvectors of \(\mathbf{S}\). They are orthogonal and of unit length, so \(\mathbf{V}^T \mathbf{V} = \mathbf{I}\), and thus they represent orthogonal (uncorrelated) directions in data space.
* The columns \(\mathbf{v}_i\) are the weights applied to the variables to produce the scores on the principal components. For example, the first principal component is the weighted sum:
\[
\text{PC}_1 = v_{11} \mathbf{x}_1 + v_{12} \mathbf{x}_2 + \cdots + v_{1p} \mathbf{x}_p
\]
* The eigenvalues, \(\lambda_1, \lambda_2, \dots, \lambda_p\), are the variances of the components, because \(\mathbf{v}_i^T \,\mathbf{S}\, \mathbf{v}_i = \lambda_i\).

It is usually the case that the variables \(\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_p\) are linearly independent, which means that none of them is an exact linear combination of the others. In this case, all eigenvalues \(\lambda_i\) are positive and the covariance matrix \(\mathbf{S}\) is said to have rank \(p\).
Here is the key fact: if the eigenvalues are arranged in decreasing order, so that \(\lambda_1 > \lambda_2 > \dots > \lambda_p\), then the first \(d\) components give a \(d\)-dimensional approximation to \(\mathbf{S}\) that accounts for \(\sum_{i=1}^{d} \lambda_i \big/ \sum_{i=1}^{p} \lambda_i\) of the total variance.
For the case of two variables, \(\mathbf{x}_1\) and \(\mathbf{x}_2\), Figure 4.6 shows the transformation from data space to component space. The eigenvectors, \(\mathbf{v}_1, \mathbf{v}_2\), are the major and minor axes of the data ellipse, whose lengths are the square roots \(\sqrt{\lambda_1}, \sqrt{\lambda_2}\) of the eigenvalues.
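These facts can be checked numerically in R. The following is a small sketch using simulated two-variable data (the simulation settings are arbitrary choices; any multivariate data matrix would serve):

set.seed(42)
X <- MASS::mvrnorm(n = 200, mu = c(0, 0),
                   Sigma = matrix(c(1, 0.6, 0.6, 1), nrow = 2))
S <- cov(X)                             # p x p covariance matrix
e <- eigen(S)                           # eigenvectors and eigenvalues of S
V <- e$vectors
Lambda <- diag(e$values)
all.equal(S, V %*% Lambda %*% t(V))     # S = V Lambda V^T, up to rounding
zapsmall(crossprod(V))                  # V^T V = I: orthogonal, unit-length columns
scores <- scale(X, center = TRUE, scale = FALSE) %*% V  # principal component scores
diag(cov(scores))                       # variances of the components = eigenvalues
cumsum(e$values) / sum(e$values)        # variance accounted for by the first d components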
4.2.3 Finding principal components
In R, principal components analysis is most easily carried out using stats::prcomp() or stats::princomp(), or similar functions in other packages such as FactoMineR::PCA(). The FactoMineR package (Husson et al., 2017, 2023) has extensive capabilities for exploratory analysis of multivariate data (PCA, correspondence analysis, cluster analysis).
A particular strength of FactoMineR for PCA is that it allows the inclusion of supplementary variables (which can be categorical or quantitative) and supplementary points for individuals. These are not used in the analysis, but are projected into the plots to facilitate interpretation. For example, in the analysis of the crime data described below, it would be useful to have measures of other characteristics of the U.S. states, such as poverty and average level of education.
Unfortunately, although all of these functions perform similar calculations, the options for analysis and the details of the results they return differ.
The important options for analysis include:
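The loadings examined below are the rotation component of a prcomp() result. A call of roughly this form would produce a crime.pca object like the one used here (a sketch: the column selection and the choice of a standardized analysis are assumptions, not necessarily the exact code used for the output below):

crime.pca <- crime |>
  dplyr::select(murder, rape, robbery, assault, burglary, larceny, auto) |>
  prcomp(scale. = TRUE)   # standardize the seven crime rates: analyze the correlation matrix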
crime.pca |> purrr::pluck("rotation")
#> PC1 PC2 PC3 PC4 PC5 PC6 PC7
#> murder -0.300 -0.6292 -0.1782 0.2321 0.5381 0.2591 0.2676
#> rape -0.432 -0.1694 0.2442 -0.0622 0.1885 -0.7733 -0.2965
#> robbery -0.397 0.0422 -0.4959 0.5580 -0.5200 -0.1144 -0.0039
#> assault -0.397 -0.3435 0.0695 -0.6298 -0.5067 0.1724 0.1917
#> burglary -0.440 0.2033 0.2099 0.0576 0.1010 0.5360 -0.6481
#> larceny -0.357 0.4023 0.5392 0.2349 0.0301 0.0394 0.6017
#> auto -0.295 0.5024 -0.5684 -0.4192 0.3698 -0.0573 0.1470
But note something important in this output: all of the weights for the first component are negative. In PCA, the directions of the eigenvectors are completely arbitrary, in the sense that the vector \(-\mathbf{v}_i\) gives the same linear combination as \(\mathbf{v}_i\), but with its sign reversed. For interpretation, it is useful (and usually recommended) to reflect the loadings to a positive orientation by multiplying them by -1.
To reflect the PCA loadings and get them into a convenient format for plotting with ggplot(), it is necessary to do a bit of processing, including making the row.names() into an explicit variable for the purpose of labeling.
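One possible sketch of that processing, assuming the crime.pca object above (the reflection rule and the quick plot here are illustrative choices, not the only way to do it):

library(ggplot2)
flip <- ifelse(colMeans(crime.pca$rotation) < 0, -1, 1)   # -1 for components with mostly negative loadings
crime.load <- sweep(crime.pca$rotation, 2, flip, "*") |>  # reflect those components
  as.data.frame() |>
  tibble::rownames_to_column(var = "variable")            # row names -> explicit variable for labeling

ggplot(crime.load, aes(x = PC1, y = PC2, label = variable)) +
  geom_point() +
  geom_text(nudge_y = 0.03)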
But wait, there is something else to be seen in Figure 4.15. Can you see one cell that doesn’t fit with the rest?
Yes, the correlation of number of forward gears (gear) and number of carburetors (carb) in the upper left and lower right corners is moderately positive (0.27), while all the others in their off-diagonal blocks are negative.
4.5 Application: Eigenfaces

4.6 Elliptical insights: Outlier detection
The data ellipse (Section 3.1.4), or ellipsoid in more than 2D, is fundamental in regression. But also, as Pearson showed, it is key to understanding principal components analysis, where the principal component directions are simply the axes of the ellipsoid of the data. As such, observations that are unusual in data space may not stand out in univariate views of the variables, but will stand out in principal component space, usually on the smallest dimension.
As an illustration, I created a dataset of \(n = 100\) observations with a linear relation, \(y = x + \mathcal{N}(0, 1)\), and then added two discrepant points at (1.5, -1.5) and (-1.5, 1.5).
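A sketch of how such data could be generated (the seed and the distribution of x here are arbitrary choices, not necessarily those used for the figures):

set.seed(1)
x <- rnorm(100)
y <- x + rnorm(100)                     # linear relation with N(0, 1) noise
sim <- rbind(data.frame(x, y),
             data.frame(x = c(1.5, -1.5), y = c(-1.5, 1.5)))  # two discrepant points
sim.pca <- prcomp(sim)
tail(sim.pca$x, 2)                      # the added points score extremely on the smallest component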
References
Husson, F., Le, S., & Pagès, J. (2017). Exploratory multivariate analysis by example using R. Chapman & Hall. https://doi.org/10.1201/b21874

Husson, F., Josse, J., Le, S., & Mazet, J. (2023). FactoMineR: Multivariate exploratory data analysis and data mining. http://factominer.free.fr
Pearson, K. (1896). Contributions to the mathematical theory of evolution—III, regression, heredity and panmixia. Philosophical Transactions of the Royal Society of London, 187, 253–318.
library(car)
library(VisCollin)
library(genridge)
library(MASS)
library(dplyr)
library(factoextra)
#> [1] 2.070 0.911 0.809 0.367 0.245 0.189
#>
#> Rotation (n x k) = (6 x 6):
#> PC1 PC2 PC3 PC4 PC5 PC6
#> cylinder -0.454 -0.1869 0.168 -0.659 -0.2711 -0.4725
#> engine -0.467 -0.1628 0.134 -0.193 -0.0109 0.8364
#> horse -0.462 -0.0177 -0.123 0.620 -0.6123 -0.1067
#> weight -0.444 -0.2598 0.278 0.350 0.6860 -0.2539
#> accel 0.330 -0.2098 0.865 0.143 -0.2774 0.0337
#> year 0.237 -0.9092 -0.335 0.025 -0.0624 0.0142
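A rotation matrix like the one above would come from a call of roughly this form (a sketch: the cars data frame name and the column selection are inferred from the row labels, and the standardized analysis is an assumption):

cars.pca <- cars |>
  dplyr::select(cylinder, engine, horse, weight, accel, year) |>
  prcomp(scale. = TRUE)
cars.pca    # prints the standard deviations and the rotation (loadings) matrix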