edits to ch 3,6,7,8; test \pkg{}

friendly · Oct 27, 2024 · 6a3bf58
1 parent 94e4918
commit 6a3bf58
Showing 17 changed files with 105 additions and 70 deletions.
diff --git a/03-multivariate_plots.qmd b/03-multivariate_plots.qmd
@@ -1988,6 +1988,7 @@ peng |>
 ```
 
 <!-- Cheating here because ggpcp plots take so long -->
+
 ```{r}
 #| label: fig-peng-ggpcp1
 #| echo: false

diff --git a/03-multivariate_plots_cache/html/fig-peng-ggpcp1-code_4ec619b19658fcdfd14143fad743db48.RData b/03-multivariate_plots_cache/html/fig-peng-ggpcp1-code_4ec619b19658fcdfd14143fad743db48.RData
diff --git a/03-multivariate_plots_cache/html/fig-peng-ggpcp1-code_4ec619b19658fcdfd14143fad743db48.rdb b/03-multivariate_plots_cache/html/fig-peng-ggpcp1-code_4ec619b19658fcdfd14143fad743db48.rdb
diff --git a/03-multivariate_plots_cache/html/fig-peng-ggpcp1-code_4ec619b19658fcdfd14143fad743db48.rdx b/03-multivariate_plots_cache/html/fig-peng-ggpcp1-code_4ec619b19658fcdfd14143fad743db48.rdx
diff --git a/06-linear_models-plots.qmd b/06-linear_models-plots.qmd
@@ -630,6 +630,8 @@ and professors.
 
 ```{r}
 #| label: fig-coffee-spm
+#| fig-width: 7
+#| fig-height: 6
 #| out-width: "100%"
 #| fig-cap: "Scatterplot matrix showing pairwise relations among `Heart` ($y$), `Coffee` consumption ($x_1$) and `Stress` ($x_2$), with linear regression lines and 68% data ellipses for the bivariate relations"
 data(coffee, package="matlib")

diff --git a/07-lin-mod-topics.qmd b/07-lin-mod-topics.qmd
@@ -68,7 +68,7 @@ knitr::include_graphics("images/dual-points-lines.png")
 
 This is illustrated in @fig-dual-points-lines. The left panel shows three lines in data space,
 which can be expressed as linear equations in $\mathbf{z} = (x, y)$ of the form 
-$\mathbf{A} \mathbf{z} = \mathbf{d}$. 
+$\mathbf{A} \mathbf{z} = \mathbf{d}$. `matlib::showEqn(A, d)` prints these as equations in $x$ and $y$.
 
 ```{r}
 A <- matrix(c( 1, 1, 0,
@@ -223,8 +223,8 @@ knitr::include_graphics("images/coffee-data-beta-both.png")
 ```
 
 Thus, the `r colorize("blue")` ellipse in @fig-coffee-data-beta-both (right) is the
-ellipse of **joint** 95% coverage, using the factor $\sqrt{2 F^{.95}_{2, \nu}}$
-and covering the true values of ($\beta_{\mathrm{Stress}}, \beta_{\mathrm{Coffee}}$)
+ellipse of **joint** 95% coverage, using the factor $\sqrt{2 F^{.95}_{2, \nu}}$,
+which covers the true values of ($\beta_{\mathrm{Stress}}, \beta_{\mathrm{Coffee}}$)
 in 95% of samples.  Moreover:
 
 *  Any _joint_ hypothesis (e.g., $\mathcal{H}_0:\beta_{\mathrm{Stress}}=0, \beta_{\mathrm{Coffee}}=0$)

diff --git a/08-collinearity-ridge.qmd b/08-collinearity-ridge.qmd
@@ -10,13 +10,16 @@ knitr::opts_chunk$set(fig.path = "figs/ch08/")
 # Collinearity & Ridge Regression {#sec-collin}
 
 
+In univariate multiple regression models, we usually hope to have high correlations between the outcome $y$ and each of the
+predictors, $\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \dots]$, but high correlations _among_ the predictors can cause problems
+in estimating and testing their effects. Exactly the same problems can exist in multivariate response models,
+because they involve only the relations among the predictor variables.
+
 > Some of my collinearity diagnostics have large values, or small values, or whatever they are not supposed to be
 > * What is bad?
 > * If bad, what can I do about it?
 
-In univariate multiple regression models, we usually hope to have high correlations between the outcome $y$ and each of the
-predictors, $x_1, x_2, \dots$, but high correlations _among_ the predictors can cause problems
-in estimating and testing their effects. The quote above shows the a typical quandary of some researchers in trying
+The quote above shows a typical quandary of some researchers in trying
 to understand these problems and take steps to resolve them.
 This chapter illustrates the problems of collinearity,
 describes diagnostic measures to assess its effects, 
@@ -43,7 +46,7 @@ library(patchwork)
 ## What is collinearity?
 
 The chapter quote above is not untypical of researchers who have read standard treatments of linear models
-(e.g, @Graybill1961;@Hocking2013)
+(e.g., @Graybill1961; @Hocking2013)
 and yet are still confused about what collinearity is, how to find its sources and how to correct them.
 In @FriendlyKwan:2009, we liken this problem to that of the reader of 
 Martin Hansford's
@@ -146,7 +149,7 @@ and various true correlations between $x_1$ and $x_2$, $\rho_{12} = (0, 0.8, 0.9
 [^1]: This example is adapted from one by John Fox (2022), [Collinearity Diagnostics](https://socialsciences.mcmaster.ca/jfox/Courses/SORA-TABA/slides-collinearity.pdf)
 
 ::: {.column-margin}
-Working file: `R/collin-data-beta.R`
+R file: `R/collin-data-beta.R`
 :::
 
 First, we use `MASS::mvrnorm()` to construct a list of data frames `XY` with specified values
@@ -450,8 +453,8 @@ to encode other information.
 For collinearity diagnostics, these show: 
 
 * the condition indices,
-using using _squares_ whose background color is red for condition indices > 10,
-green for values > 5 and green otherwise, reflecting danger, warning and OK respectively.
+using _squares_ whose background color is `r colorize("red")` for condition indices > 10,
+`r colorize("brown")` for values > 5 and `r colorize("green")` otherwise, reflecting danger, warning and OK respectively.
 The value of the condition index is encoded within this using a white square whose side is proportional to the value
 (up to some maximum value, `cond.max` that fills the cell).
 
@@ -469,12 +472,11 @@ large variance proportions implicating two or more predictors.
 #| label: fig-cars-tableplot
 #| fig-keep: "last"
 #| out-width: 90%
-#| fig.cap: "Tableplot of condition indices and variance proportions for the Cars data. In column 1, the square   
-#|     symbols are scaled relative
-#|     to a maximum condition index of 30. In the remaining columns, variance
-#|     proportions (times 100) are shown as circles
+#| fig.cap: "Tableplot of condition indices and variance proportions for the Cars data. In column 1, the square symbols are scaled relative to a maximum condition index of 30. 
+#|     In the remaining columns, variance proportions (times 100) are shown as circles
 #|     scaled relative to a maximum of 100."
-tableplot(cd, title = "Tableplot of cars data", cond.max = 30 )
+tableplot(cd, title = "Tableplot of cars data", 
+          cond.max = 30 )
 ```
 
 
@@ -522,7 +524,7 @@ scores for the cars.
 #|  The projections of the variable vectors on the coordinate axes are proportional to
 #|  their variance proportions. To reduce graphic clutter, only the most outlying observations in predictor
 #|  space are identified by case labels.
-#|  An extreme outlier (case 20) appears in the lower left corner."
+#|  An extreme outlier (case 20) appears in the lower right corner."
 cars.pca$rotation <- -2.5 * cars.pca$rotation    # reflect & scale the variable vectors
 
 ggp <- fviz_pca_biplot(

diff --git a/R/common.R b/R/common.R
@@ -98,6 +98,14 @@ legend_inside <- function(position) {
 # Extra stuff
 # ------------
 
+# Inline expressions of the form `r Rexpr("expr")` to give "expr = value", e.g., 
+# `r Rexpr("cor(x, y)")` giving "cor(x, y) = 0.53" (rounded to `digits`)
+
+Rexpr = function(expr, digits = 3) {
+  value <- eval(parse(text=expr)) |> round(digits)
+  paste0(expr, " = ", value)
+}
+
 #' colorize text: 
 # use inline as `r colorize(text, color)` to print `text` in a given `color`
 # can also be used to color a color name, as in `r colorize("red")`
@@ -164,12 +172,28 @@ $\\newcommand*{\\diag}[1]{\\ensuremath{\\mathrm{diag}\\, #1}}$
 # TODO: add styles (color, font); do it differently for PDF output
 #   Want to be able to use bold (**...**), italic (_ ... _) or bold-italic (*** ... ***)
 
-pkg <- function(package, cite=FALSE, color="brown") {
-  pkgname <- if(is.null(color)) package else colorize(package, color)
-  ref <- paste0("`", package, "`")
+# See: Demonstration of how to use other fonts in an Rmarkdown document 
+#      https://gist.github.com/richarddmorey/27e74bcbf28190d150d266ae141f5117
+
+# attributes for displaying the package name
+pkgname_font = "bold" # or: plain, ital, boldital
+pkgname_face = "mono"
+pkgname_color = "brown"
+
+pkg <- function(package, cite=FALSE) {
+  pkgname <- dplyr::case_when(
+    pkgname_font == "ital"      ~ paste0("_", package, "_"),
+    pkgname_font == "bold"      ~ paste0("**", package, "**"),
+    pkgname_font == "boldital"  ~ paste0("***", package, "***"),
+    .default = package
+  )
+#  pkgname <- if(is.null(color)) package else colorize(package, color)
+  ref <- pkgname
+  if (!is.null(pkgname_color)) ref <- colorize(pkgname, pkgname_color)
   if (cite) ref <- paste0(ref, " [@R-", package, "]")
   if (knitr::is_latex_output()) {
-    ref <- paste0(ref, "\\index{`",package, "`}\\index{package!`", package, "`}")
+    ref <- paste0(ref, "\\index{`", package, "`}",
+                       "\\index{package!`", package, "`}")
   }
 
   ref
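
The `Rexpr()` helper added to `R/common.R` above can be tried outside the book with a short sketch like the following. This is an illustration under stated assumptions, not the commit's code: the expression is passed as a quoted string (because `parse(text = )` needs text), the result is joined with `paste0()` so that only single spaces surround the `=`, and the vectors `x` and `y` are hypothetical data invented for the example.

```r
# Sketch only: a standalone variant of Rexpr() for illustration.
# Assumption: `expr` is a character string such as "cor(x, y)".
Rexpr <- function(expr, digits = 3) {
  # Evaluate the text as R code, then round the numeric result
  value <- round(eval(parse(text = expr)), digits)
  # paste0() avoids the extra spaces that paste() would add around " = "
  paste0(expr, " = ", value)
}

# Hypothetical data, defined at top level so the evaluated expression can find them
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

print(Rexpr("cor(x, y)"))             # "cor(x, y) = 0.775"
print(Rexpr("mean(y)", digits = 1))   # "mean(y) = 4"
```

In a `.qmd` file the same call would appear inline between backticks, so the expression and its value are always printed together and cannot drift apart when the data change.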