edits to ch 3,6,7,8; test \pkg{}
friendly committed Oct 27, 2024
1 parent 94e4918 commit 6a3bf58
Showing 17 changed files with 105 additions and 70 deletions.
1 change: 1 addition & 0 deletions 03-multivariate_plots.qmd
@@ -1988,6 +1988,7 @@ peng |>
```
<!-- CHEATING HERE because ggpcp plots take so long -->
```{r}
#| label: fig-peng-ggpcp1
#| echo: false
2 changes: 2 additions & 0 deletions 06-linear_models-plots.qmd
@@ -630,6 +630,8 @@ and professors.

```{r}
#| label: fig-coffee-spm
#| fig-width: 7
#| fig-height: 6
#| out-width: "100%"
#| fig-cap: "Scatterplot matrix showing pairwise relations among `Heart` ($y$), `Coffee` consumption ($x_1$) and `Stress` ($x_2$), with linear regression lines and 68% data ellipses for the bivariate relations"
data(coffee, package="matlib")
6 changes: 3 additions & 3 deletions 07-lin-mod-topics.qmd
@@ -68,7 +68,7 @@ knitr::include_graphics("images/dual-points-lines.png")

This is illustrated in @fig-dual-points-lines. The left panel shows three lines in data space,
which can be expressed as linear equations in $\mathbf{z} = (x, y)$ of the form
$\mathbf{A} \mathbf{z} = \mathbf{d}$.
$\mathbf{A} \mathbf{z} = \mathbf{d}$. `matlib::showEqn(A, d)` prints these as equations in $x$ and $y$.

```{r}
A <- matrix(c( 1, 1, 0,
@@ -223,8 +223,8 @@ knitr::include_graphics("images/coffee-data-beta-both.png")
```

Thus, the `r colorize("blue")` ellipse in @fig-coffee-data-beta-both (right) is the
ellipse of **joint** 95% coverage, using the factor $\sqrt{2 F^{.95}_{2, \nu}}$
and covering the true values of ($\beta_{\mathrm{Stress}}, \beta_{\mathrm{Coffee}}$)
ellipse of **joint** 95% coverage, using the factor $\sqrt{2 F^{.95}_{2, \nu}}$,
which covers the true values of ($\beta_{\mathrm{Stress}}, \beta_{\mathrm{Coffee}}$)
in 95% of samples. Moreover:

* Any _joint_ hypothesis (e.g., $\mathcal{H}_0:\beta_{\mathrm{Stress}}=0, \beta_{\mathrm{Coffee}}=0$)
28 changes: 15 additions & 13 deletions 08-collinearity-ridge.qmd
@@ -10,13 +10,16 @@ knitr::opts_chunk$set(fig.path = "figs/ch08/")
# Collinearity & Ridge Regression {#sec-collin}


In univariate multiple regression models, we usually hope to have high correlations between the outcome $y$ and each of the
predictors, $\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \dots]$, but high correlations _among_ the predictors can cause problems
in estimating and testing their effects. Exactly the same problems can exist in multivariate response models,
because they involve only the relations among the predictor variables.

> Some of my collinearity diagnostics have large values, or small values, or whatever they are not supposed to be
> * What is bad?
> * If bad, what can I do about it?
In univariate multiple regression models, we usually hope to have high correlations between the outcome $y$ and each of the
predictors, $x_1, x_2, \dots$, but high correlations _among_ the predictors can cause problems
in estimating and testing their effects. The quote above shows a typical quandary of some researchers in trying
The quote above shows a typical quandary of some researchers in trying
to understand these problems and take steps to resolve them.
This chapter illustrates the problems of collinearity,
describes diagnostic measures to assess its effects,
@@ -43,7 +46,7 @@ library(patchwork)
## What is collinearity?

The chapter quote above is not untypical of researchers who have read standard treatments of linear models
(e.g., @Graybill1961;@Hocking2013)
(e.g., @Graybill1961; @Hocking2013)
and yet are still confused about what collinearity is, how to find its sources and how to correct them.
In @FriendlyKwan:2009, we liken this problem to that of the reader of
Martin Hansford's
@@ -146,7 +149,7 @@ and various true correlations between $x_1$ and $x_2$, $\rho_{12} = (0, 0.8, 0.9
[^1]: This example is adapted from one by John Fox (2022), [Collinearity Diagnostics](https://socialsciences.mcmaster.ca/jfox/Courses/SORA-TABA/slides-collinearity.pdf)

::: {.column-margin}
Working file: `R/collin-data-beta.R`
R file: `R/collin-data-beta.R`
:::

First, we use `MASS::mvrnorm()` to construct a list of data frames `XY` with specified values
@@ -450,8 +453,8 @@ to encode other information.
For collinearity diagnostics, these show:
* the condition indices,
using using _squares_ whose background color is red for condition indices > 10,
green for values > 5 and green otherwise, reflecting danger, warning and OK respectively.
using _squares_ whose background color is `r colorize("red")` for condition indices > 10,
`r colorize("brown")` for values > 5 and `r colorize("green")` otherwise, reflecting danger, warning and OK respectively.
The value of the condition index is encoded within this using a white square whose side is proportional to the value
(up to some maximum value, `cond.max` that fills the cell).
@@ -469,12 +472,11 @@ large variance proportions implicating two or more predictors.
#| label: fig-cars-tableplot
#| fig-keep: "last"
#| out-width: 90%
#| fig.cap: "Tableplot of condition indices and variance proportions for the Cars data. In column 1, the square
#| symbols are scaled relative
#| to a maximum condition index of 30. In the remaining columns, variance
#| proportions (times 100) are shown as circles
#| fig.cap: "Tableplot of condition indices and variance proportions for the Cars data. In column 1, the square symbols are scaled relative to a maximum condition index of 30.
#| In the remaining columns, variance proportions (times 100) are shown as circles
#| scaled relative to a maximum of 100."
tableplot(cd, title = "Tableplot of cars data", cond.max = 30 )
tableplot(cd, title = "Tableplot of cars data",
cond.max = 30 )
```
@@ -522,7 +524,7 @@ scores for the cars.
#| The projections of the variable vectors on the coordinate axes are proportional to
#| their variance proportions. To reduce graphic clutter, only the most outlying observations in predictor
#| space are identified by case labels.
#| An extreme outlier (case 20) appears in the lower left corner."
#| An extreme outlier (case 20) appears in the lower right corner."
cars.pca$rotation <- -2.5 * cars.pca$rotation # reflect & scale the variable vectors
ggp <- fviz_pca_biplot(
32 changes: 28 additions & 4 deletions R/common.R
@@ -98,6 +98,14 @@ legend_inside <- function(position) {
# Extra stuff
# ------------

# Inline expressions of the form `r Rexpr("expr")` to give "expr = value", e.g.,
# `r Rexpr("cor(x, y)")` giving "cor(x, y) = 0.53" (but rounded).
# `expr` must be a character string, because it is parsed with parse(text = ).

Rexpr = function(expr, digits = 3) {
value <- eval(parse(text=expr)) |> round(digits)
paste0(expr, " = ", value)
}
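
A minimal usage sketch for the `Rexpr()` helper, assuming it is called with the expression as a character string (it relies on `parse(text = ...)`). The copy below is named `Rexpr_demo`, a hypothetical name used only so the sketch is self-contained and does not clash with the version in `R/common.R`. It uses `paste0()` so that the result has a single space on each side of `=`.

```r
# Standalone illustration; `Rexpr_demo` is a hypothetical name, not part of R/common.R.
Rexpr_demo <- function(expr, digits = 3) {
  # `expr` is a character string: parse it, evaluate it, then round the result
  value <- eval(parse(text = expr)) |> round(digits)
  # paste0() adds no separator of its own, so " = " is the only spacing inserted
  paste0(expr, " = ", value)
}

Rexpr_demo("sqrt(2)")               # "sqrt(2) = 1.414"
Rexpr_demo("sqrt(2)", digits = 1)   # "sqrt(2) = 1.4"
```

In a Quarto document the equivalent inline call would be written with the expression in quotes. Because the string is evaluated inside the function, variables named `expr`, `digits` or `value` in the calling document would be shadowed by the function's own.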

#' colorize text:
# use inline as `r colorize(text, color)` to print `text` in a given `color`
# can also be used to color a color name, as in `r colorize("red")`
@@ -164,12 +172,28 @@ $\\newcommand*{\\diag}[1]{\\ensuremath{\\mathrm{diag}\\, #1}}$
# TODO: add styles (color, font); do it differently for PDF output
# Want to be able to use bold (**...**), italic (_ ... _) or bold-italic (*** ... ***)

pkg <- function(package, cite=FALSE, color="brown") {
pkgname <- if(is.null(color)) package else colorize(package, color)
ref <- paste0("`", package, "`")
# See: Demonstration of how to use other fonts in an Rmarkdown document
# https://gist.github.com/richarddmorey/27e74bcbf28190d150d266ae141f5117

# attributes for displaying the package name
pkgname_font = "bold" # or: plain, ital, boldital
pkgname_face = "mono"
pkgname_color ="brown"

pkg <- function(package, cite=FALSE) {
pkgname <- dplyr::case_when(
pkgname_font == "ital" ~ paste0("_", package, "_"),
pkgname_font == "bold" ~ paste0("**", package, "**"),
pkgname_font == "boldital" ~ paste0("***", package, "***"),
.default = package
)
# pkgname <- if(is.null(color)) package else colorize(package, color)
ref <- pkgname
if (!is.null(pkgname_color)) ref <- colorize(pkgname, pkgname_color)
if (cite) ref <- paste0(ref, " [@R-", package, "]")
if (knitr::is_latex_output()) {
ref <- paste0(ref, "\\index{`",package, "`}\\index{package!`", package, "`}")
ref <- paste0(ref, "\\index{`", package, "`}",
"\\index{package!`", package, "`}")
}

ref