Skip to content

Commit

Permalink
fix duncan-av-infl plot
Browse files Browse the repository at this point in the history
  • Loading branch information
friendly committed Jul 20, 2024
1 parent eb62cd1 commit 7fdd1d3
Show file tree
Hide file tree
Showing 7 changed files with 83 additions and 18 deletions.
16 changes: 10 additions & 6 deletions R/Duncan/Duncan-reg.R
Original file line number Diff line number Diff line change
Expand Up @@ -71,8 +71,6 @@ avPlots(duncan.mod,
# conductor wc 76 34 38
# RR.engineer bc 81 28 67

# delete minister, conductor
duncan.mod2 <- update(duncan.mod, subset = - whichNames(c("minister", "conductor"), Duncan))

avPlots(duncan.mod2,
ellipse = list(levels = 0.68, fill = TRUE, fill.alpha = 0.1),
Expand All @@ -84,6 +82,10 @@ avPlots(duncan.mod2,

#' ## show the leverage of the unusual points

# delete minister, conductor
duncan.mod2 <- update(duncan.mod,
subset = - whichNames(c("minister", "conductor"), Duncan))

par(mar = c(4, 5, 4, 1)+.1,
mfrow = c(1,2))
res <- avPlot(duncan.mod, "income",
Expand All @@ -105,8 +107,9 @@ with(info, {
angle = 12, length = .18, lwd = 2, col = "darkgreen")
})

# remove the unusual points
update(fit, data = Duncan[-big, ]) |> abline(col = "red", lwd=2)
# line w/o the unusual points
bs <- coef(duncan.mod2)["income"]
abline(a=0, b=bs, col = "red", lwd=2)


# same for education
Expand All @@ -128,8 +131,9 @@ with(info, {
angle = 12, length = .18, lwd = 2, col = "darkgreen")
})

# remove the unusual points
update(fit, data = Duncan[-big, ]) |> abline(col = "red", lwd=2)
# line w/o the unusual points
bs <- coef(duncan.mod2)["education"]
abline(a=0, b=bs, col = "red", lwd=2)

par(op)

Expand Down
43 changes: 39 additions & 4 deletions child/06-leverage.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -418,7 +418,9 @@ can see directly the how individual cases become individually or jointly influen
The Duncan data provides a particularly instructive example of this. @fig-duncan-av-influence
shows the AV plots for both income and education in the model `duncan.mod`, with some annotations
added. I want to focus here on the _joint_ influence of the occupations minister and conductor
which were seen to be the most influential in @fig-duncan-infl.
which were seen to be the most influential in @fig-duncan-infl. The `r colorize("green")`
vertical lines show their residuals in each panel and the `r colorize("red")` show the regression
lines when these two observations are deleted.

The basic AV plots are produced using the call to `avPlots()` below.
To avoid clutter, I use the argument `id = list(method = "mahal", n=3)` so that
Expand All @@ -438,11 +440,44 @@ avPlots(duncan.mod,

```{r}
#| label: fig-duncan-av-influence
#| echo: false
#| out-width: "100%"
#| fig-cap: Added variable plots for the Duncan model, highlighting the impact of the observations for min ister and conductor in each plot. The red line in each panel shows the partial regression line omitting these observations.
#| fig-cap: Added variable plots for the Duncan model, highlighting the impact of the observations for min ister and conductor in each plot. The red line in each panel shows the regression line omitting these observations.
knitr::include_graphics("images/duncan-av-influence.png")
```

The two cases---minister and conductor--- are the most highly influential, and ...
The two cases---minister and conductor--- are the most highly influential, but as we can see in
@fig-duncan-av-influence their influence combines because they are at opposite sides of the horizontal axis and their residuals are of opposite signs They act together to decrease the slope for income and increase that for education.

```{r}
#| label: duncan-av-infl-code
#| eval: false
#| code-fold: true
#| code-summary: Code for income AV plot
res <- avPlot(duncan.mod, "income",
ellipse = list(levels = 0.68),
id = list(method = "mahal", n=3),
pch = 16,
cex.lab = 1.5) |>
as.data.frame()
fit <- lm(prestige ~ income, data = res)
info <- cbind(res, fitted = fitted(fit),
resids = residuals(fit),
hat = hatvalues(fit),
cookd = cooks.distance(fit))
# noteworthy points in this plot
big <- which(info$cookd > .20)
with(info, {
arrows(income[big], fitted[big], income[big], prestige[big],
angle = 12, length = .18, lwd = 2, col = "darkgreen")
})
# line w/o the unusual points
duncan.mod2 <- update(duncan.mod,
subset = - whichNames(c("minister", "conductor"), Duncan))
bs <- coef(duncan.mod2)["income"]
abline(a=0, b=bs, col = "red", lwd=2)
```

**TODO**: The regression lines omitting these cases aren't correct. Fix this.
36 changes: 31 additions & 5 deletions docs/06-linear_models-plots.html
Original file line number Diff line number Diff line change
Expand Up @@ -2081,7 +2081,7 @@ <h1 class="title"><span id="sec-linear-models-plots" class="quarto-section-ident
</ul></section><section id="influence-in-added-variable-plots" class="level3" data-number="6.6.4"><h3 data-number="6.6.4" class="anchored" data-anchor-id="influence-in-added-variable-plots">
<span class="header-section-number">6.6.4</span> Influence in added-variable plots</h3>
<p>The properties of added-variable plots discussed in <a href="#sec-avplots" class="quarto-xref"><span>Section 6.4</span></a> make them also useful for understanding why cases are influential because they control for other predictors in each plot, and therefore show the <em>partial</em> contributions of each observation to hat values and residuals. As a consequence, we can see directly the how individual cases become individually or jointly influential.</p>
<p>The Duncan data provides a particularly instructive example of this. <a href="#fig-duncan-av-influence" class="quarto-xref">Figure&nbsp;<span>6.27</span></a> shows the AV plots for both income and education in the model <code>duncan.mod</code>, with some annotations added. I want to focus here on the <em>joint</em> influence of the occupations minister and conductor which were seen to be the most influential in <a href="#fig-duncan-infl" class="quarto-xref">Figure&nbsp;<span>6.26</span></a>.</p>
<p>The Duncan data provides a particularly instructive example of this. <a href="#fig-duncan-av-influence" class="quarto-xref">Figure&nbsp;<span>6.27</span></a> shows the AV plots for both income and education in the model <code>duncan.mod</code>, with some annotations added. I want to focus here on the <em>joint</em> influence of the occupations minister and conductor which were seen to be the most influential in <a href="#fig-duncan-infl" class="quarto-xref">Figure&nbsp;<span>6.26</span></a>. The <span style="color: green;">green</span> vertical lines show their residuals in each panel and the <span style="color: red;">red</span> show the regression lines when these two observations are deleted.</p>
<p>The basic AV plots are produced using the call to <code><a href="https://rdrr.io/pkg/car/man/avPlots.html">avPlots()</a></code> below. To avoid clutter, I use the argument <code>id = list(method = "mahal", n=3)</code> so that only the three points with the greatest Mahalanobis distances from the centroid in each plot are labeled. These are the cases with the largest leverage seen in <a href="#fig-duncan-infl" class="quarto-xref">Figure&nbsp;<span>6.26</span></a>.</p>
<div class="cell" data-layout-align="center">
<div class="sourceCode" id="cb57" data-source-line-numbers="nil" data-code-line-numbers="nil"><pre class="downlit sourceCode r code-with-copy"><code class="sourceCode R"><span><span class="fu"><a href="https://rdrr.io/pkg/car/man/avPlots.html">avPlots</a></span><span class="op">(</span><span class="va">duncan.mod</span>,</span>
Expand All @@ -2091,20 +2091,46 @@ <h1 class="title"><span id="sec-linear-models-plots" class="quarto-section-ident
<span> cex.lab <span class="op">=</span> <span class="fl">1.5</span><span class="op">)</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
</div>
<div class="cell" data-layout-align="center">
<div class="sourceCode" id="cb58" data-source-line-numbers="nil" data-code-line-numbers="nil"><pre class="downlit sourceCode r code-with-copy"><code class="sourceCode R"><span><span class="fu">knitr</span><span class="fu">::</span><span class="fu"><a href="https://rdrr.io/pkg/knitr/man/include_graphics.html">include_graphics</a></span><span class="op">(</span><span class="st">"images/duncan-av-influence.png"</span><span class="op">)</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output-display">
<div id="fig-duncan-av-influence" class="quarto-float quarto-figure quarto-figure-center anchored" data-fig-align="center">
<figure class="quarto-float quarto-float-fig figure"><div aria-describedby="fig-duncan-av-influence-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="images/duncan-av-influence.png" class="img-fluid quarto-figure quarto-figure-center figure-img" style="width:100.0%">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-duncan-av-influence-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;6.27: Added variable plots for the Duncan model, highlighting the impact of the observations for min ister and conductor in each plot. The red line in each panel shows the partial regression line omitting these observations.
Figure&nbsp;6.27: Added variable plots for the Duncan model, highlighting the impact of the observations for min ister and conductor in each plot. The red line in each panel shows the regression line omitting these observations.
</figcaption></figure>
</div>
</div>
</div>
<p>The two cases—minister and conductor— are the most highly influential, and …</p>
<p><strong>TODO</strong>: The regression lines omitting these cases aren’t correct. Fix this.</p>
<p>The two cases—minister and conductor— are the most highly influential, but as we can see in <a href="#fig-duncan-av-influence" class="quarto-xref">Figure&nbsp;<span>6.27</span></a> their influence combines because they are at opposite sides of the horizontal axis and their residuals are of opposite signs They act together to decrease the slope for income and increase that for education.</p>
<div class="cell" data-layout-align="center">
<details class="code-fold"><summary>Code for income AV plot</summary><div class="sourceCode" id="cb58" data-source-line-numbers="nil" data-code-line-numbers="nil"><pre class="downlit sourceCode r code-with-copy"><code class="sourceCode R"><span><span class="va">res</span> <span class="op">&lt;-</span> <span class="fu"><a href="https://rdrr.io/pkg/car/man/avPlots.html">avPlot</a></span><span class="op">(</span><span class="va">duncan.mod</span>, <span class="st">"income"</span>,</span>
<span> ellipse <span class="op">=</span> <span class="fu"><a href="https://rdrr.io/r/base/list.html">list</a></span><span class="op">(</span>levels <span class="op">=</span> <span class="fl">0.68</span><span class="op">)</span>,</span>
<span> id <span class="op">=</span> <span class="fu"><a href="https://rdrr.io/r/base/list.html">list</a></span><span class="op">(</span>method <span class="op">=</span> <span class="st">"mahal"</span>, n<span class="op">=</span><span class="fl">3</span><span class="op">)</span>,</span>
<span> pch <span class="op">=</span> <span class="fl">16</span>,</span>
<span> cex.lab <span class="op">=</span> <span class="fl">1.5</span><span class="op">)</span> <span class="op">|&gt;</span></span>
<span> <span class="fu"><a href="https://rdrr.io/r/base/as.data.frame.html">as.data.frame</a></span><span class="op">(</span><span class="op">)</span></span>
<span><span class="va">fit</span> <span class="op">&lt;-</span> <span class="fu"><a href="https://rdrr.io/r/stats/lm.html">lm</a></span><span class="op">(</span><span class="va">prestige</span> <span class="op">~</span> <span class="va">income</span>, data <span class="op">=</span> <span class="va">res</span><span class="op">)</span></span>
<span><span class="va">info</span> <span class="op">&lt;-</span> <span class="fu"><a href="https://rdrr.io/r/base/cbind.html">cbind</a></span><span class="op">(</span><span class="va">res</span>, fitted <span class="op">=</span> <span class="fu"><a href="https://rdrr.io/r/stats/fitted.values.html">fitted</a></span><span class="op">(</span><span class="va">fit</span><span class="op">)</span>, </span>
<span> resids <span class="op">=</span> <span class="fu"><a href="https://rdrr.io/r/stats/residuals.html">residuals</a></span><span class="op">(</span><span class="va">fit</span><span class="op">)</span>,</span>
<span> hat <span class="op">=</span> <span class="fu"><a href="https://rdrr.io/r/stats/influence.measures.html">hatvalues</a></span><span class="op">(</span><span class="va">fit</span><span class="op">)</span>,</span>
<span> cookd <span class="op">=</span> <span class="fu"><a href="https://rdrr.io/r/stats/influence.measures.html">cooks.distance</a></span><span class="op">(</span><span class="va">fit</span><span class="op">)</span><span class="op">)</span></span>
<span></span>
<span><span class="co"># noteworthy points in this plot</span></span>
<span><span class="va">big</span> <span class="op">&lt;-</span> <span class="fu"><a href="https://rdrr.io/r/base/which.html">which</a></span><span class="op">(</span><span class="va">info</span><span class="op">$</span><span class="va">cookd</span> <span class="op">&gt;</span> <span class="fl">.20</span><span class="op">)</span></span>
<span><span class="fu"><a href="https://rdrr.io/r/base/with.html">with</a></span><span class="op">(</span><span class="va">info</span>, <span class="op">{</span></span>
<span> <span class="fu"><a href="https://rdrr.io/r/graphics/arrows.html">arrows</a></span><span class="op">(</span><span class="va">income</span><span class="op">[</span><span class="va">big</span><span class="op">]</span>, <span class="va">fitted</span><span class="op">[</span><span class="va">big</span><span class="op">]</span>, <span class="va">income</span><span class="op">[</span><span class="va">big</span><span class="op">]</span>, <span class="va">prestige</span><span class="op">[</span><span class="va">big</span><span class="op">]</span>, </span>
<span> angle <span class="op">=</span> <span class="fl">12</span>, length <span class="op">=</span> <span class="fl">.18</span>, lwd <span class="op">=</span> <span class="fl">2</span>, col <span class="op">=</span> <span class="st">"darkgreen"</span><span class="op">)</span></span>
<span> <span class="op">}</span><span class="op">)</span></span>
<span></span>
<span><span class="co"># line w/o the unusual points</span></span>
<span><span class="va">duncan.mod2</span> <span class="op">&lt;-</span> <span class="fu"><a href="https://rdrr.io/r/stats/update.html">update</a></span><span class="op">(</span><span class="va">duncan.mod</span>, </span>
<span> subset <span class="op">=</span> <span class="op">-</span> <span class="fu"><a href="https://rdrr.io/pkg/car/man/which.names.html">whichNames</a></span><span class="op">(</span><span class="fu"><a href="https://rdrr.io/r/base/c.html">c</a></span><span class="op">(</span><span class="st">"minister"</span>, <span class="st">"conductor"</span><span class="op">)</span>, <span class="va">Duncan</span><span class="op">)</span><span class="op">)</span></span>
<span></span>
<span><span class="va">bs</span> <span class="op">&lt;-</span> <span class="fu"><a href="https://rdrr.io/r/stats/coef.html">coef</a></span><span class="op">(</span><span class="va">duncan.mod2</span><span class="op">)</span><span class="op">[</span><span class="st">"income"</span><span class="op">]</span></span>
<span><span class="fu"><a href="https://rdrr.io/r/graphics/abline.html">abline</a></span><span class="op">(</span>a<span class="op">=</span><span class="fl">0</span>, b<span class="op">=</span><span class="va">bs</span>, col <span class="op">=</span> <span class="st">"red"</span>, lwd<span class="op">=</span><span class="fl">2</span><span class="op">)</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
</details>
</div>
<p><strong>Package summary</strong></p>
<div class="cell" data-layout-align="center">
<pre data-code-line-numbers=""><code>#&gt; Writing packages to C:/R/Projects/Vis-MLM-book/bib/pkgs.txt
Expand Down
Binary file modified docs/images/duncan-av-influence.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
4 changes: 2 additions & 2 deletions docs/index.html
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@
<meta name="generator" content="quarto-1.5.53">
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
<meta name="author" content="Michael Friendly">
<meta name="dcterms.date" content="2024-07-19">
<meta name="dcterms.date" content="2024-07-20">
<title>Visualizing Multivariate Data and Models in R</title>
<style>
code{white-space: pre-wrap;}
Expand Down Expand Up @@ -179,7 +179,7 @@ <h1 class="title">Visualizing Multivariate Data and Models in R</h1>
<div>
<div class="quarto-title-meta-heading">Published</div>
<div class="quarto-title-meta-contents">
<p class="date">July 19, 2024</p>
<p class="date">July 20, 2024</p>
</div>
</div>

Expand Down
2 changes: 1 addition & 1 deletion docs/search.json

Large diffs are not rendered by default.

Binary file modified images/duncan-av-influence.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.

0 comments on commit 7fdd1d3

Please sign in to comment.