Skip to content

Commit

Permalink
Ch03 variable ordering in crime data
Browse files Browse the repository at this point in the history
  • Loading branch information
friendly committed Oct 25, 2024
1 parent 097e2d5 commit 6386ae1
Show file tree
Hide file tree
Showing 10 changed files with 557 additions and 209 deletions.
94 changes: 72 additions & 22 deletions 03-multivariate_plots.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -1487,7 +1487,7 @@ the nonparametric smooth doesn't differ much from the linear regression
line. Exceptions to this appear mainly in the columns for `robbery` and
`auto` (auto theft).
### Corrgrams {#sec-corrgram}
## Corrgrams {#sec-corrgram}
What if you want to summarize the data even further, for example to show
only the value of the correlation for each pair of variables? A
Expand All @@ -1502,7 +1502,9 @@ ordering the levels of factors and variables in graphic displays to make
important features most apparent. For variables, this means that we can
arrange the variables in a matrix-like display in such a way as to make
the pattern of relationships easiest to see. Methods to achieve this
include using principal components and cluster analysis to put the most
include using principal components
and cluster analysis
to put the most
related variables together as described in @sec-pca-biplot.
```{r}
Expand All @@ -1521,38 +1523,42 @@ panels as illustrated in @fig-corrgram-renderings. For example, a
corrgram similar to @fig-crime-spm can be produced as follows (not shown
here):
```{r}
```{r corrgram}
#| eval: false
crime |>
select(where(is.numeric)) |>
corrgram(lower.panel = panel.ellipse,
upper.panel = panel.ellipse,
diag.panel = panel.density)
crime.cor <- crime |>
dplyr::select(where(is.numeric)) |>
cor()
corrgram(crime.cor,
lower.panel = panel.ellipse,
upper.panel = panel.ellipse,
diag.panel = panel.density)
```
`corrplot::corrplot()` provides the rendering methods
With the **corrplot** package, `corrplot()` provides the rendering methods
`c("circle", "square", "ellipse", "number", "shade", "color", "pie")`,
but only one can be used at a time. The function
`corrplot::corrplot.mixed()` allows different options to be selected for
the lower and upper triangles. The iconic shape is colored with a
`corrplot.mixed()` allows different options to be selected for
the lower and upper triangles. The iconic rendering shape is colored with a
gradient in relation to the correlation value.
For comparison, @fig-crime-corrplot uses ellipses below the diagonal and
filled pie charts below the diagonal using a gradient of the fill color
in both cases.
```{r}
#| label: fig-crime-corrplot
#| fig-width: 8
#| fig-height: 8
#| out-width: "100%"
#| fig-cap: "Corrplot of the `crime` data, showing the correlation between each pair of variables with an ellipse (lower) and a pie chart symbol (upper), all shaded in proportion to the correlation value, also shown numerically."
crime |>
select(where(is.numeric)) |>
cor() |>
corrplot.mixed(
lower = "ellipse",
upper = "pie",
tl.col = "black",
tl.srt = 0,
addCoef.col = "black",
addCoefasPercent = TRUE)
#| fig-cap: "Mixed corrplot of the `crime` data, showing the correlation between each pair of variables with an ellipse (lower) and a pie chart symbol (upper), all shaded in proportion to the correlation value, also shown numerically."
corrplot.mixed(crime.cor,
lower = "ellipse",
upper = "pie",
tl.col = "black",
tl.srt = 0,
tl.cex = 1.25,
addCoef.col = "black",
addCoefasPercent = TRUE)
```
The combination of renderings shown in @fig-crime-corrplot is
Expand All @@ -1562,6 +1568,50 @@ the values for murder with larceny and auto theft in row 1, columns 6-7
with those in column 1, rows 6-7, where the former are easier to
distinguish. The shading color adds another visual cue.
The variables in @fig-crime-spm and @fig-crime-corrplot are arranged
by their order in the dataset, which is not often the most useful.
A better idea is to arrange the variables so that the most highly
correlated variables are adjacent.
A general method described in @sec-var-order orders the variables according to the
angles of the eigenvectors from a principal components analysis
around a unit circle. `corrMatOrder()` provides several methods
(`order = c("AOE", "FPC", "hclust", "alphabet")`),
of which this is
`order = "AOE"`. Using this in `corrplot()` produces @fig-crime-corrplot-AOE.
```{r corMatOrder}
ord <- corrMatOrder(crime.cor, order = "AOE")
rownames(crime.cor)[ord]
```
<!-- other orders: FPC (PC1), hclust (with various clustering methods), alphabet, original -->
<!-- calculated with corrMatOrder() -->
```{r}
#| label: fig-crime-corrplot-AOE
#| fig-width: 8
#| fig-height: 8
#| out-width: "100%"
#| fig-cap: "Corrplot of the `crime` data with the variables reordered according to the angles of variable eigenvectors."
corrplot.mixed(crime.cor,
order = "AOE",
lower = "ellipse",
upper = "ellipse",
tl.col = "black",
tl.srt = 0,
tl.cex = 1.25,
addCoef.col = "black",
addCoefasPercent = TRUE)
```
In this case, where the correlations among the crime variables are all positive, the effect of
variable re-ordering is subtle, but ...
@fig-mtcars-corrplot-varorder and @fig-mtcars-corrplot-pcaorder provide a more dramatic example.
Variations of corrgrams are worthy replacements for a numeric table of
correlations, which are often presented in publications only for
archival value. Including the numeric value (rounded here, for
Expand Down
81 changes: 54 additions & 27 deletions R/crime/crime-correlation.R
Original file line number Diff line number Diff line change
Expand Up @@ -8,37 +8,64 @@ library(dplyr)

data(crime, package="ggbiplot")

crime.cor <- crime |>
dplyr::select(where(is.numeric)) |>
cor()

# similar to Fig 3.24
crime |>
select(where(is.numeric)) |>
corrgram(lower.panel = panel.ellipse,
upper.panel = panel.ellipse,
diag.panel = panel.density)
corrgram(crime.cor,
lower.panel = panel.ellipse,
upper.panel = panel.ellipse,
diag.panel = panel.density)

# show representation of ellipse and correlation value
crime |>
select(where(is.numeric)) |>
cor() |>
corrplot(diag = FALSE,
method = "ellipse",
tl.col = "black",
tl.srt = 0,
addCoef.col = "black",
addCoefasPercent = TRUE)

crime |>
select(where(is.numeric)) |>
cor() |>
corrplot.mixed(
lower = "ellipse",
upper = "pie",
tl.col = "black",
tl.srt = 0,
tl.cex = 1.3,
addCoef.col = "black",
addCoefasPercent = TRUE)


corrplot(crime.cor,
diag = FALSE,
method = "ellipse",
tl.col = "black",
tl.srt = 0,
addCoef.col = "black",
addCoefasPercent = TRUE)

# use correlation ordering ("AOE")
corrplot(crime.cor,
diag = FALSE,
order = "AOE",
method = "ellipse",
tl.col = "black",
tl.srt = 0,
addCoef.col = "black",
addCoefasPercent = TRUE)


corrplot.mixed(crime.cor,
lower = "ellipse",
upper = "pie",
tl.col = "black",
tl.srt = 0,
tl.cex = 1.25,
addCoef.col = "black",
addCoefasPercent = TRUE)

corrplot.mixed(crime.cor,
order = "AOE", #"FPC",
lower = "ellipse",
upper = "ellipse",
tl.col = "black",
tl.srt = 0,
tl.cex = 1.25,
addCoef.col = "black",
addCoefasPercent = TRUE)

ord <- corrMatOrder(crime.cor, order = "AOE")
rownames(crime.cor)[ord]

library(seriation)

ord <- seriate(crime.cor, method = "PCA_angle")
# what's the order ?
permute(crime.cor, ord) |> rownames()



Expand Down
41 changes: 23 additions & 18 deletions R/crime/crime-network.R
Original file line number Diff line number Diff line change
Expand Up @@ -8,73 +8,78 @@ library(qgraph)
#'
data(crime, package = "ggbiplot")

corrmat <- crime |>
crime.cor <- crime |>
dplyr::select(where(is.numeric)) |>
cor()

# ### "association graph": network of correlations
qgraph(corrmat,
qgraph(crime.cor,
title = "Crime data, correlations", title.cex = 1.25,
graph = "cor",
minimum = "sig", sampleSize = nrow(crime),
minimum = "sig", sampleSize = nrow(crime), alpha = 0.01,
color = grey(.9), vsize = 12,
labels = rownames(corrmat),
labels = rownames(crime.cor),
posCol = "blue")


# ### "concentration graph": network of partial correlations
# Correlations between variables that cannot be explained by other variables in the network

qgraph(corrmat,
qgraph(crime.cor,
title = "Crime data, partial correlations", title.cex = 1.25,
graph = "pcor",
minimum = "sig", sampleSize = nrow(crime),
minimum = "sig", sampleSize = nrow(crime), alpha = 0.01,
color = grey(.9), vsize = 14,
labels = rownames(corrmat),
labels = rownames(crime.cor),
edge.labels = TRUE, edge.label.cex = 1.7,
posCol = "blue")


#' ### variable ordering: reorder variables by PC1 & PC2 angles
library(seriation)

ord <- seriate(corrmat, method = "PCA_angle")
ord <- seriate(crime.cor, method = "PCA_angle")
# what's the order ?
permute(corrmat, ord) |> rownames()
permute(crime.cor, ord) |> rownames()

qgraph(permute(corrmat, ord),
qgraph(permute(crime.cor, ord),
title = "Crime data, correlations", title.cex = 1.25,
graph = "cor",
minimum = "sig", sampleSize = nrow(crime),
minimum = "sig", sampleSize = nrow(crime), alpha = 0.01,
color = grey(.9), vsize = 12,
labels = rownames(permute(corrmat, ord)),
labels = rownames(permute(crime.cor, ord)),
edge.labels = TRUE, edge.label.cex = 1.3,
posCol = "blue")

#' to understand the partial correlations, make scatterplots of the residuals from the
#' models where each x_i, x_j are predicted by all others. I've never seen such a plot,
#' but could be done by modifying AVplot
#'




#' `mtcars` data
#' ## `mtcars` data
#' Try the same things for the mtcars data
data(mtcars)

corrmat <- cor(mtcars)
cars.cor <- cor(mtcars)

qgraph(corrmat,
qgraph(cars.cor,
graph = "cor",
minimum = "sig", sampleSize = nrow(mtcars),
color = grey(.9), vsize = 12,
labels = names(mtcars),
# edge.labels = TRUE, edge.label.cex = 1.3,
posCol = "blue")

qgraph(corrmat,
qgraph(cars.cor,
graph = "pcor",
minimum = "sig", sampleSize = nrow(mtcars),
color = grey(.9), vsize = 12,
labels = names(mtcars),
edge.labels = TRUE, edge.label.cex = 1.3,
posCol = "blue")

# same, for crime data



Expand Down
52 changes: 52 additions & 0 deletions R/pvPlot.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,52 @@
# Partial variables plot

# To understand the partial correlations, make scatterplots of the residuals from the
# models where each x_i, x_j are predicted by all others.

# see also: pairs version
# https://stackoverflow.com/questions/35591033/plot-scatterplot-matrix-with-partial-correlation-coefficients-in-r

pvPlot <- function(X, vars = 1:2,
col = "black",
pch = 16,
cex = par("cex"),
axes = TRUE,
...) {
nv <- ncol(X)
nr <- nrow(X)
v1 <- vars[1]
v2 <- vars[2]
all <- if(is.numeric(vars)) seq_along(nv) else names(X)
others <- setdiff(all, vars)
res <- X[, vars]
res[, 1] <- lsfit(X[, others], X[, v1])$residuals
res[, 2] <- lsfit(X[, others], X[, v2])$residuals
plot(res,
col = col, pch = pch, cex = cex, ...)
if (axes)
abline(h = 0, v = 0, col = "gray")
invisible(res)
}

if(FALSE) {
data(crime, package = "ggbiplot")
res <- crime |>
tibble::column_to_rownames("st") |>
dplyr::select(where(is.numeric)) |>
pvPlot(vars = c("burglary", "larceny"))

head(res)
car::scatterplot(larceny ~ burglary, data = res,
xlab = "burglary residual",
ylab = "larceny residual",
pch = 16, col = "black",
smooth = FALSE, boxplots = FALSE,
grid = FALSE,
id = list(n=5))
abline(h = 0, v = 0, col = "gray")
text(-600, 1300,
label = paste("partial r =",
round(cor(res[,1], res[,2]), 3)),
pos = 4)

}
Loading

0 comments on commit 6386ae1

Please sign in to comment.