-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
5f0e146
commit 3ee1d7e
Showing
1 changed file
with
341 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,341 @@ | ||
--- | ||
title: "Correlation" | ||
subtitle: "Statistics With R" | ||
author: "R Workshop" | ||
output: | ||
prettydoc::html_pretty: | ||
theme: cayman | ||
highlight: github | ||
--- | ||
|
||
```{r setup, include=FALSE} | ||
knitr::opts_chunk$set(echo = TRUE) | ||
require(prettydoc) | ||
require(tidyverse) | ||
require(janitor) | ||
``` | ||
|
||
Broom | ||
============ | ||
|
||
#### Convert Statistical Analysis Objects into Tidy Data Frames | ||
|
||
Convert statistical analysis objects from R into tidy data frames, so that they can more easily be combined, reshaped and otherwise processed with tools like 'dplyr', 'tidyr' and 'ggplot2'. | ||
|
||
The package provides three S3 generics: | ||
* **``tidy``**: summarizes a model's statistical findings such as coefficients of a regression; | ||
* **``augment``**: adds columns to the original data such as predictions, residuals and cluster assignments | ||
* **``glance``**: which provides a one-row summary of model-level statistics. | ||
|
||
|
||
|
||
```R | ||
# Run once, then comment out | ||
install.packages("broom") | ||
|
||
library(magrittr) | ||
|
||
library(broom) | ||
``` | ||
|
||
Installing package into '/home/nbcommon/R' | ||
(as 'lib' is unspecified) | ||
|
||
|
||
### Fit a Linear Model | ||
* Just use a simple example using inbuilt data sets | ||
* Save it as `myModel` | ||
|
||
|
||
```R | ||
lm(mpg ~ wt + cyl, data=mtcars) | ||
``` | ||
|
||
|
||
|
||
Call: | ||
lm(formula = mpg ~ wt + cyl, data = mtcars) | ||
|
||
Coefficients: | ||
(Intercept) wt cyl | ||
39.686 -3.191 -1.508 | ||
|
||
|
||
|
||
|
||
```R | ||
myModel <- lm(mpg ~ wt + cyl, data=mtcars) | ||
summary(myModel) | ||
``` | ||
|
||
|
||
|
||
Call: | ||
lm(formula = mpg ~ wt + cyl, data = mtcars) | ||
|
||
Residuals: | ||
Min 1Q Median 3Q Max | ||
-4.2893 -1.5512 -0.4684 1.5743 6.1004 | ||
|
||
Coefficients: | ||
Estimate Std. Error t value Pr(>|t|) | ||
(Intercept) 39.6863 1.7150 23.141 < 2e-16 *** | ||
wt -3.1910 0.7569 -4.216 0.000222 *** | ||
cyl -1.5078 0.4147 -3.636 0.001064 ** | ||
--- | ||
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 | ||
|
||
Residual standard error: 2.568 on 29 degrees of freedom | ||
Multiple R-squared: 0.8302, Adjusted R-squared: 0.8185 | ||
F-statistic: 70.91 on 2 and 29 DF, p-value: 6.809e-12 | ||
|
||
|
||
|
||
#### 1. The `tidy()` function | ||
|
||
**``tidy()``** constructs a data frame that summarizes the model's statistical findings. This includes coefficients and p-values for each term in a regression, per-cluster information in clustering applications, or per-test information for multtest functions. | ||
|
||
|
||
|
||
```R | ||
tidy(myModel) | ||
``` | ||
|
||
|
||
<table> | ||
<thead><tr><th>term</th><th>estimate</th><th>std.error</th><th>statistic</th><th>p.value</th></tr></thead> | ||
<tbody> | ||
<tr><td>(Intercept) </td><td>39.686261 </td><td>1.7149840 </td><td>23.140893 </td><td>3.043182e-20</td></tr> | ||
<tr><td>wt </td><td>-3.190972 </td><td>0.7569065 </td><td>-4.215808 </td><td>2.220200e-04</td></tr> | ||
<tr><td>cyl </td><td>-1.507795 </td><td>0.4146883 </td><td>-3.635972 </td><td>1.064282e-03</td></tr> | ||
</tbody> | ||
</table> | ||
|
||
|
||
|
||
|
||
```R | ||
myTidyModel <- tidy(myModel) | ||
``` | ||
|
||
|
||
```R | ||
class(myTidyModel) | ||
``` | ||
|
||
|
||
'data.frame' | ||
|
||
|
||
|
||
```R | ||
names(myTidyModel) | ||
``` | ||
|
||
|
||
<ol class="list-inline"> | ||
<li>'term'</li> | ||
<li>'estimate'</li> | ||
<li>'std.error'</li> | ||
<li>'statistic'</li> | ||
<li>'p.value'</li> | ||
</ol> | ||
|
||
|
||
|
||
|
||
```R | ||
myTidyModel$p.value %>% round(4) | ||
``` | ||
|
||
|
||
<ol class="list-inline"> | ||
<li>0</li> | ||
<li>2e-04</li> | ||
<li>0.0011</li> | ||
</ol> | ||
|
||
|
||
|
||
#### 2. The `augment()` function | ||
|
||
**``augment()``** add columns to the original data that was modeled. | ||
This includes predictions, residuals, and cluster assignments. | ||
|
||
|
||
|
||
```R | ||
myModel <- lm(mpg ~ wt+cyl, mtcars) | ||
augment(myModel) | ||
|
||
``` | ||
|
||
|
||
<table> | ||
<thead><tr><th>.rownames</th><th>mpg</th><th>wt</th><th>cyl</th><th>.fitted</th><th>.se.fit</th><th>.resid</th><th>.hat</th><th>.sigma</th><th>.cooksd</th><th>.std.resid</th></tr></thead> | ||
<tbody> | ||
<tr><td>Mazda RX4 </td><td>21.0 </td><td>2.620 </td><td>6 </td><td>22.27914 </td><td>0.6011667 </td><td>-1.27914467 </td><td>0.05482311 </td><td>2.601105 </td><td>0.0050772590 </td><td>-0.51244825 </td></tr> | ||
<tr><td>Mazda RX4 Wag </td><td>21.0 </td><td>2.875 </td><td>6 </td><td>21.46545 </td><td>0.4976294 </td><td>-0.46544677 </td><td>0.03756521 </td><td>2.611423 </td><td>0.0004442585 </td><td>-0.18478693 </td></tr> | ||
<tr><td>Datsun 710 </td><td>22.8 </td><td>2.320 </td><td>4 </td><td>26.25203 </td><td>0.7252444 </td><td>-3.45202624 </td><td>0.07978891 </td><td>2.522911 </td><td>0.0567764620 </td><td>-1.40157794 </td></tr> | ||
<tr><td>Hornet 4 Drive </td><td>21.4 </td><td>3.215 </td><td>6 </td><td>20.38052 </td><td>0.4602669 </td><td> 1.01948376 </td><td>0.03213611 </td><td>2.605613 </td><td>0.0018029260 </td><td> 0.40360828 </td></tr> | ||
<tr><td>Hornet Sportabout </td><td>18.7 </td><td>3.440 </td><td>8 </td><td>16.64696 </td><td>0.7752706 </td><td> 2.05304242 </td><td>0.09117599 </td><td>2.581072 </td><td>0.0235271472 </td><td> 0.83877393 </td></tr> | ||
<tr><td>Valiant </td><td>18.1 </td><td>3.460 </td><td>6 </td><td>19.59873 </td><td>0.5178496 </td><td>-1.49872807 </td><td>0.04068001 </td><td>2.596911 </td><td>0.0050205614 </td><td>-0.59597493 </td></tr> | ||
<tr><td>Duster 360 </td><td>14.3 </td><td>3.570 </td><td>8 </td><td>16.23213 </td><td>0.7267482 </td><td>-1.93213120 </td><td>0.08012014 </td><td>2.585079 </td><td>0.0178733213 </td><td>-0.78461743 </td></tr> | ||
<tr><td>Merc 240D </td><td>24.4 </td><td>3.190 </td><td>4 </td><td>23.47588 </td><td>1.0000172 </td><td> 0.92411952 </td><td>0.15170109 </td><td>2.606073 </td><td>0.0091033181 </td><td> 0.39078743 </td></tr> | ||
<tr><td>Merc 230 </td><td>22.8 </td><td>3.150 </td><td>4 </td><td>23.60352 </td><td>0.9793969 </td><td>-0.80351937 </td><td>0.14550945 </td><td>2.607793 </td><td>0.0065061176 </td><td>-0.33855530 </td></tr> | ||
<tr><td>Merc 280 </td><td>19.2 </td><td>3.440 </td><td>6 </td><td>19.66255 </td><td>0.5108741 </td><td>-0.46254751 </td><td>0.03959146 </td><td>2.611439 </td><td>0.0004643600 </td><td>-0.18382951 </td></tr> | ||
<tr><td>Merc 280C </td><td>17.8 </td><td>3.440 </td><td>6 </td><td>19.66255 </td><td>0.5108741 </td><td>-1.86254751 </td><td>0.03959146 </td><td>2.588159 </td><td>0.0075293380 </td><td>-0.74022926 </td></tr> | ||
<tr><td>Merc 450SE </td><td>16.4 </td><td>4.070 </td><td>8 </td><td>14.63665 </td><td>0.6544576 </td><td> 1.76335487 </td><td>0.06497359 </td><td>2.590136 </td><td>0.0116847953 </td><td> 0.71025562 </td></tr> | ||
<tr><td>Merc 450SL </td><td>17.3 </td><td>3.730 </td><td>8 </td><td>15.72158 </td><td>0.6819424 </td><td> 1.57842434 </td><td>0.07054547 </td><td>2.594578 </td><td>0.0102875723 </td><td> 0.63767089 </td></tr> | ||
<tr><td>Merc 450SLC </td><td>15.2 </td><td>3.780 </td><td>8 </td><td>15.56203 </td><td>0.6718159 </td><td>-0.36202705 </td><td>0.06846591 </td><td>2.612000 </td><td>0.0005228914 </td><td>-0.14609271 </td></tr> | ||
<tr><td>Cadillac Fleetwood </td><td>10.4 </td><td>5.250 </td><td>8 </td><td>10.87130 </td><td>1.1525645 </td><td>-0.47129800 </td><td>0.20151356 </td><td>2.611060 </td><td>0.0035498738 </td><td>-0.20542284 </td></tr> | ||
<tr><td>Lincoln Continental</td><td>10.4 </td><td>5.424 </td><td>8 </td><td>10.31607 </td><td>1.2633704 </td><td> 0.08393115 </td><td>0.24212252 </td><td>2.612898 </td><td>0.0001501537 </td><td> 0.03755005 </td></tr> | ||
<tr><td>Chrysler Imperial </td><td>14.7 </td><td>5.345 </td><td>8 </td><td>10.56816 </td><td>1.2125441 </td><td> 4.13184435 </td><td>0.22303287 </td><td>2.458216 </td><td>0.3189363624 </td><td> 1.82570047 </td></tr> | ||
<tr><td>Fiat 128 </td><td>32.4 </td><td>2.200 </td><td>4 </td><td>26.63494 </td><td>0.7270859 </td><td> 5.76505710 </td><td>0.08019462 </td><td>2.353101 </td><td>0.1592990291 </td><td> 2.34122168 </td></tr> | ||
<tr><td>Honda Civic </td><td>30.4 </td><td>1.615 </td><td>4 </td><td>28.50166 </td><td>0.8820281 </td><td> 1.89833840 </td><td>0.11801538 </td><td>2.584888 </td><td>0.0276449872 </td><td> 0.78728146 </td></tr> | ||
<tr><td>Toyota Corolla </td><td>33.9 </td><td>1.835 </td><td>4 </td><td>27.79965 </td><td>0.7988791 </td><td> 6.10035227 </td><td>0.09681350 </td><td>2.314308 </td><td>0.2233281268 </td><td> 2.50007531 </td></tr> | ||
<tr><td>Toyota Corona </td><td>21.5 </td><td>2.465 </td><td>4 </td><td>25.78934 </td><td>0.7380797 </td><td>-4.28933528 </td><td>0.08263810 </td><td>2.472103 </td><td>0.0913548207 </td><td>-1.74424120 </td></tr> | ||
<tr><td>Dodge Challenger </td><td>15.5 </td><td>3.520 </td><td>8 </td><td>16.39168 </td><td>0.7442464 </td><td>-0.89167980 </td><td>0.08402476 </td><td>2.607023 </td><td>0.0040263378 </td><td>-0.36287242 </td></tr> | ||
<tr><td>AMC Javelin </td><td>15.2 </td><td>3.435 </td><td>8 </td><td>16.66291 </td><td>0.7773252 </td><td>-1.46291244 </td><td>0.09165987 </td><td>2.596811 </td><td>0.0120218543 </td><td>-0.59783451 </td></tr> | ||
<tr><td>Camaro Z28 </td><td>13.3 </td><td>3.840 </td><td>8 </td><td>15.37057 </td><td>0.6623197 </td><td>-2.07056872 </td><td>0.06654404 </td><td>2.581383 </td><td>0.0165559199 </td><td>-0.83469849 </td></tr> | ||
<tr><td>Pontiac Firebird </td><td>19.2 </td><td>3.845 </td><td>8 </td><td>15.35461 </td><td>0.6616629 </td><td> 3.84538614 </td><td>0.06641213 </td><td>2.502378 </td><td>0.0569730451 </td><td> 1.55006265 </td></tr> | ||
<tr><td>Fiat X1-9 </td><td>27.3 </td><td>1.935 </td><td>4 </td><td>27.48055 </td><td>0.7700721 </td><td>-0.18055052 </td><td>0.08995733 </td><td>2.612717 </td><td>0.0001790454 </td><td>-0.07371481 </td></tr> | ||
<tr><td>Porsche 914-2 </td><td>26.0 </td><td>2.140 </td><td>4 </td><td>26.82640 </td><td>0.7322422 </td><td>-0.82640123 </td><td>0.08133608 </td><td>2.607877 </td><td>0.0033281614 </td><td>-0.33581456 </td></tr> | ||
<tr><td>Lotus Europa </td><td>30.4 </td><td>1.513 </td><td>4 </td><td>28.82714 </td><td>0.9282190 </td><td> 1.57285924 </td><td>0.13069974 </td><td>2.593440 </td><td>0.0216355209 </td><td> 0.65704006 </td></tr> | ||
<tr><td>Ford Pantera L </td><td>15.8 </td><td>3.170 </td><td>8 </td><td>17.50852 </td><td>0.9023791 </td><td>-1.70852005 </td><td>0.12352416 </td><td>2.590102 </td><td>0.0237336584 </td><td>-0.71078295 </td></tr> | ||
<tr><td>Ferrari Dino </td><td>19.7 </td><td>2.770 </td><td>6 </td><td>21.80050 </td><td>0.5342815 </td><td>-2.10049885 </td><td>0.04330261 </td><td>2.581252 </td><td>0.0105550987 </td><td>-0.83641545 </td></tr> | ||
<tr><td>Maserati Bora </td><td>15.0 </td><td>3.570 </td><td>8 </td><td>16.23213 </td><td>0.7267482 </td><td>-1.23213120 </td><td>0.08012014 </td><td>2.601659 </td><td>0.0072685192 </td><td>-0.50035506 </td></tr> | ||
<tr><td>Volvo 142E </td><td>21.4 </td><td>2.780 </td><td>4 </td><td>24.78418 </td><td>0.8176667 </td><td>-3.38417906 </td><td>0.10142065 </td><td>2.524357 </td><td>0.0727399065 </td><td>-1.39047125 </td></tr> | ||
</tbody> | ||
</table> | ||
|
||
|
||
|
||
#### 3. The ``glance()`` function | ||
|
||
**``glance()``** construct a concise one-row summary of the model. This typically contains values such as R2, adjusted R2, and residual standard error that are computed once for the entire model. | ||
|
||
|
||
|
||
```R | ||
glance(myModel) | ||
|
||
|
||
``` | ||
|
||
|
||
<table> | ||
<thead><tr><th>p.value</th><th>df</th><th>logLik</th></tr></thead> | ||
<tbody> | ||
<tr><td>6.808955e-12</td><td>3 </td><td>-74.00503 </td></tr> | ||
</tbody> | ||
</table> | ||
|
||
|
||
|
||
## K-Means | ||
|
||
|
||
```R | ||
kmeans(iris[,1:4],3 ) | ||
``` | ||
|
||
|
||
K-means clustering with 3 clusters of sizes 96, 33, 21 | ||
|
||
Cluster means: | ||
Sepal.Length Sepal.Width Petal.Length Petal.Width | ||
1 6.314583 2.895833 4.973958 1.7031250 | ||
2 5.175758 3.624242 1.472727 0.2727273 | ||
3 4.738095 2.904762 1.790476 0.3523810 | ||
|
||
Clustering vector: | ||
[1] 2 3 3 3 2 2 2 2 3 3 2 2 3 3 2 2 2 2 2 2 2 2 2 2 3 3 2 2 2 3 3 2 2 2 3 2 2 | ||
[38] 2 3 2 2 3 3 2 2 3 2 3 2 2 1 1 1 1 1 1 1 3 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 | ||
[75] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 | ||
[112] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 | ||
[149] 1 1 | ||
|
||
Within cluster sum of squares by cluster: | ||
[1] 118.651875 6.432121 17.669524 | ||
(between_SS / total_SS = 79.0 %) | ||
|
||
Available components: | ||
|
||
[1] "cluster" "centers" "totss" "withinss" "tot.withinss" | ||
[6] "betweenss" "size" "iter" "ifault" | ||
|
||
|
||
|
||
```R | ||
KMmodel <- kmeans(iris[,1:4],3 ) | ||
summary(KMmodel) | ||
``` | ||
|
||
|
||
Length Class Mode | ||
cluster 150 -none- numeric | ||
centers 12 -none- numeric | ||
totss 1 -none- numeric | ||
withinss 3 -none- numeric | ||
tot.withinss 1 -none- numeric | ||
betweenss 1 -none- numeric | ||
size 3 -none- numeric | ||
iter 1 -none- numeric | ||
ifault 1 -none- numeric | ||
|
||
|
||
|
||
```R | ||
tidy(KMmodel) | ||
``` | ||
|
||
|
||
<table> | ||
<thead><tr><th>x1</th><th>x2</th><th>x3</th><th>x4</th><th>size</th><th>withinss</th><th>cluster</th></tr></thead> | ||
<tbody> | ||
<tr><td>5.006000</td><td>3.428000</td><td>1.462000</td><td>0.246000</td><td>50 </td><td>15.15100</td><td>1 </td></tr> | ||
<tr><td>5.901613</td><td>2.748387</td><td>4.393548</td><td>1.433871</td><td>62 </td><td>39.82097</td><td>2 </td></tr> | ||
<tr><td>6.850000</td><td>3.073684</td><td>5.742105</td><td>2.071053</td><td>38 </td><td>23.87947</td><td>3 </td></tr> | ||
</tbody> | ||
</table> | ||
|
||
|
||
|
||
|
||
```R | ||
# augment(Model,Data) | ||
``` | ||
|
||
|
||
```R | ||
augment(KMmodel,iris[,1:4]) %>%head() | ||
``` | ||
|
||
|
||
<table> | ||
<thead><tr><th>Sepal.Length</th><th>Sepal.Width</th><th>Petal.Length</th><th>Petal.Width</th><th>.cluster</th></tr></thead> | ||
<tbody> | ||
<tr><td>5.1</td><td>3.5</td><td>1.4</td><td>0.2</td><td>1 </td></tr> | ||
<tr><td>4.9</td><td>3.0</td><td>1.4</td><td>0.2</td><td>1 </td></tr> | ||
<tr><td>4.7</td><td>3.2</td><td>1.3</td><td>0.2</td><td>1 </td></tr> | ||
<tr><td>4.6</td><td>3.1</td><td>1.5</td><td>0.2</td><td>1 </td></tr> | ||
<tr><td>5.0</td><td>3.6</td><td>1.4</td><td>0.2</td><td>1 </td></tr> | ||
<tr><td>5.4</td><td>3.9</td><td>1.7</td><td>0.4</td><td>1 </td></tr> | ||
</tbody> | ||
</table> | ||
|
||
|
||
|
||
|
||
```R | ||
glance(KMmodel) | ||
``` | ||
|
||
|
||
<table> | ||
<thead><tr><th>totss</th><th>tot.withinss</th><th>betweenss</th><th>iter</th></tr></thead> | ||
<tbody> | ||
<tr><td>681.3706</td><td>78.85144</td><td>602.5192</td><td>2 </td></tr> | ||
</tbody> | ||
</table> |