Skip to content

Commit

Permalink
Create Broom_R_Package.Rmd
Browse files Browse the repository at this point in the history
  • Loading branch information
DragonflyStats authored Nov 28, 2023
1 parent 5f0e146 commit 3ee1d7e
Showing 1 changed file with 341 additions and 0 deletions.
341 changes: 341 additions & 0 deletions preparation/Broom_R_Package.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,341 @@
---
title: "Correlation"
subtitle: "Statistics With R"
author: "R Workshop"
output:
prettydoc::html_pretty:
theme: cayman
highlight: github
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
require(prettydoc)
require(tidyverse)
require(janitor)
```

Broom
============

#### Convert Statistical Analysis Objects into Tidy Data Frames

Convert statistical analysis objects from R into tidy data frames, so that they can more easily be combined, reshaped and otherwise processed with tools like 'dplyr', 'tidyr' and 'ggplot2'.

The package provides three S3 generics:
* **``tidy``**: summarizes a model's statistical findings such as coefficients of a regression;
* **``augment``**: adds columns to the original data such as predictions, residuals and cluster assignments
* **``glance``**: which provides a one-row summary of model-level statistics.



```R
# Run once, then comment out
install.packages("broom")

library(magrittr)

library(broom)
```

Installing package into '/home/nbcommon/R'
(as 'lib' is unspecified)


### Fit a Linear Model
* Just use a simple example using inbuilt data sets
* Save it as `myModel`


```R
lm(mpg ~ wt + cyl, data=mtcars)
```



Call:
lm(formula = mpg ~ wt + cyl, data = mtcars)

Coefficients:
(Intercept) wt cyl
39.686 -3.191 -1.508




```R
myModel <- lm(mpg ~ wt + cyl, data=mtcars)
summary(myModel)
```



Call:
lm(formula = mpg ~ wt + cyl, data = mtcars)

Residuals:
Min 1Q Median 3Q Max
-4.2893 -1.5512 -0.4684 1.5743 6.1004

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 39.6863 1.7150 23.141 < 2e-16 ***
wt -3.1910 0.7569 -4.216 0.000222 ***
cyl -1.5078 0.4147 -3.636 0.001064 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.568 on 29 degrees of freedom
Multiple R-squared: 0.8302, Adjusted R-squared: 0.8185
F-statistic: 70.91 on 2 and 29 DF, p-value: 6.809e-12



#### 1. The `tidy()` function

**``tidy()``** constructs a data frame that summarizes the model's statistical findings. This includes coefficients and p-values for each term in a regression, per-cluster information in clustering applications, or per-test information for multtest functions.



```R
tidy(myModel)
```


<table>
<thead><tr><th>term</th><th>estimate</th><th>std.error</th><th>statistic</th><th>p.value</th></tr></thead>
<tbody>
<tr><td>(Intercept) </td><td>39.686261 </td><td>1.7149840 </td><td>23.140893 </td><td>3.043182e-20</td></tr>
<tr><td>wt </td><td>-3.190972 </td><td>0.7569065 </td><td>-4.215808 </td><td>2.220200e-04</td></tr>
<tr><td>cyl </td><td>-1.507795 </td><td>0.4146883 </td><td>-3.635972 </td><td>1.064282e-03</td></tr>
</tbody>
</table>




```R
myTidyModel <- tidy(myModel)
```


```R
class(myTidyModel)
```


'data.frame'



```R
names(myTidyModel)
```


<ol class="list-inline">
<li>'term'</li>
<li>'estimate'</li>
<li>'std.error'</li>
<li>'statistic'</li>
<li>'p.value'</li>
</ol>




```R
myTidyModel$p.value %>% round(4)
```


<ol class="list-inline">
<li>0</li>
<li>2e-04</li>
<li>0.0011</li>
</ol>



#### 2. The `augment()` function

**``augment()``** add columns to the original data that was modeled.
This includes predictions, residuals, and cluster assignments.



```R
myModel <- lm(mpg ~ wt+cyl, mtcars)
augment(myModel)

```


<table>
<thead><tr><th>.rownames</th><th>mpg</th><th>wt</th><th>cyl</th><th>.fitted</th><th>.se.fit</th><th>.resid</th><th>.hat</th><th>.sigma</th><th>.cooksd</th><th>.std.resid</th></tr></thead>
<tbody>
<tr><td>Mazda RX4 </td><td>21.0 </td><td>2.620 </td><td>6 </td><td>22.27914 </td><td>0.6011667 </td><td>-1.27914467 </td><td>0.05482311 </td><td>2.601105 </td><td>0.0050772590 </td><td>-0.51244825 </td></tr>
<tr><td>Mazda RX4 Wag </td><td>21.0 </td><td>2.875 </td><td>6 </td><td>21.46545 </td><td>0.4976294 </td><td>-0.46544677 </td><td>0.03756521 </td><td>2.611423 </td><td>0.0004442585 </td><td>-0.18478693 </td></tr>
<tr><td>Datsun 710 </td><td>22.8 </td><td>2.320 </td><td>4 </td><td>26.25203 </td><td>0.7252444 </td><td>-3.45202624 </td><td>0.07978891 </td><td>2.522911 </td><td>0.0567764620 </td><td>-1.40157794 </td></tr>
<tr><td>Hornet 4 Drive </td><td>21.4 </td><td>3.215 </td><td>6 </td><td>20.38052 </td><td>0.4602669 </td><td> 1.01948376 </td><td>0.03213611 </td><td>2.605613 </td><td>0.0018029260 </td><td> 0.40360828 </td></tr>
<tr><td>Hornet Sportabout </td><td>18.7 </td><td>3.440 </td><td>8 </td><td>16.64696 </td><td>0.7752706 </td><td> 2.05304242 </td><td>0.09117599 </td><td>2.581072 </td><td>0.0235271472 </td><td> 0.83877393 </td></tr>
<tr><td>Valiant </td><td>18.1 </td><td>3.460 </td><td>6 </td><td>19.59873 </td><td>0.5178496 </td><td>-1.49872807 </td><td>0.04068001 </td><td>2.596911 </td><td>0.0050205614 </td><td>-0.59597493 </td></tr>
<tr><td>Duster 360 </td><td>14.3 </td><td>3.570 </td><td>8 </td><td>16.23213 </td><td>0.7267482 </td><td>-1.93213120 </td><td>0.08012014 </td><td>2.585079 </td><td>0.0178733213 </td><td>-0.78461743 </td></tr>
<tr><td>Merc 240D </td><td>24.4 </td><td>3.190 </td><td>4 </td><td>23.47588 </td><td>1.0000172 </td><td> 0.92411952 </td><td>0.15170109 </td><td>2.606073 </td><td>0.0091033181 </td><td> 0.39078743 </td></tr>
<tr><td>Merc 230 </td><td>22.8 </td><td>3.150 </td><td>4 </td><td>23.60352 </td><td>0.9793969 </td><td>-0.80351937 </td><td>0.14550945 </td><td>2.607793 </td><td>0.0065061176 </td><td>-0.33855530 </td></tr>
<tr><td>Merc 280 </td><td>19.2 </td><td>3.440 </td><td>6 </td><td>19.66255 </td><td>0.5108741 </td><td>-0.46254751 </td><td>0.03959146 </td><td>2.611439 </td><td>0.0004643600 </td><td>-0.18382951 </td></tr>
<tr><td>Merc 280C </td><td>17.8 </td><td>3.440 </td><td>6 </td><td>19.66255 </td><td>0.5108741 </td><td>-1.86254751 </td><td>0.03959146 </td><td>2.588159 </td><td>0.0075293380 </td><td>-0.74022926 </td></tr>
<tr><td>Merc 450SE </td><td>16.4 </td><td>4.070 </td><td>8 </td><td>14.63665 </td><td>0.6544576 </td><td> 1.76335487 </td><td>0.06497359 </td><td>2.590136 </td><td>0.0116847953 </td><td> 0.71025562 </td></tr>
<tr><td>Merc 450SL </td><td>17.3 </td><td>3.730 </td><td>8 </td><td>15.72158 </td><td>0.6819424 </td><td> 1.57842434 </td><td>0.07054547 </td><td>2.594578 </td><td>0.0102875723 </td><td> 0.63767089 </td></tr>
<tr><td>Merc 450SLC </td><td>15.2 </td><td>3.780 </td><td>8 </td><td>15.56203 </td><td>0.6718159 </td><td>-0.36202705 </td><td>0.06846591 </td><td>2.612000 </td><td>0.0005228914 </td><td>-0.14609271 </td></tr>
<tr><td>Cadillac Fleetwood </td><td>10.4 </td><td>5.250 </td><td>8 </td><td>10.87130 </td><td>1.1525645 </td><td>-0.47129800 </td><td>0.20151356 </td><td>2.611060 </td><td>0.0035498738 </td><td>-0.20542284 </td></tr>
<tr><td>Lincoln Continental</td><td>10.4 </td><td>5.424 </td><td>8 </td><td>10.31607 </td><td>1.2633704 </td><td> 0.08393115 </td><td>0.24212252 </td><td>2.612898 </td><td>0.0001501537 </td><td> 0.03755005 </td></tr>
<tr><td>Chrysler Imperial </td><td>14.7 </td><td>5.345 </td><td>8 </td><td>10.56816 </td><td>1.2125441 </td><td> 4.13184435 </td><td>0.22303287 </td><td>2.458216 </td><td>0.3189363624 </td><td> 1.82570047 </td></tr>
<tr><td>Fiat 128 </td><td>32.4 </td><td>2.200 </td><td>4 </td><td>26.63494 </td><td>0.7270859 </td><td> 5.76505710 </td><td>0.08019462 </td><td>2.353101 </td><td>0.1592990291 </td><td> 2.34122168 </td></tr>
<tr><td>Honda Civic </td><td>30.4 </td><td>1.615 </td><td>4 </td><td>28.50166 </td><td>0.8820281 </td><td> 1.89833840 </td><td>0.11801538 </td><td>2.584888 </td><td>0.0276449872 </td><td> 0.78728146 </td></tr>
<tr><td>Toyota Corolla </td><td>33.9 </td><td>1.835 </td><td>4 </td><td>27.79965 </td><td>0.7988791 </td><td> 6.10035227 </td><td>0.09681350 </td><td>2.314308 </td><td>0.2233281268 </td><td> 2.50007531 </td></tr>
<tr><td>Toyota Corona </td><td>21.5 </td><td>2.465 </td><td>4 </td><td>25.78934 </td><td>0.7380797 </td><td>-4.28933528 </td><td>0.08263810 </td><td>2.472103 </td><td>0.0913548207 </td><td>-1.74424120 </td></tr>
<tr><td>Dodge Challenger </td><td>15.5 </td><td>3.520 </td><td>8 </td><td>16.39168 </td><td>0.7442464 </td><td>-0.89167980 </td><td>0.08402476 </td><td>2.607023 </td><td>0.0040263378 </td><td>-0.36287242 </td></tr>
<tr><td>AMC Javelin </td><td>15.2 </td><td>3.435 </td><td>8 </td><td>16.66291 </td><td>0.7773252 </td><td>-1.46291244 </td><td>0.09165987 </td><td>2.596811 </td><td>0.0120218543 </td><td>-0.59783451 </td></tr>
<tr><td>Camaro Z28 </td><td>13.3 </td><td>3.840 </td><td>8 </td><td>15.37057 </td><td>0.6623197 </td><td>-2.07056872 </td><td>0.06654404 </td><td>2.581383 </td><td>0.0165559199 </td><td>-0.83469849 </td></tr>
<tr><td>Pontiac Firebird </td><td>19.2 </td><td>3.845 </td><td>8 </td><td>15.35461 </td><td>0.6616629 </td><td> 3.84538614 </td><td>0.06641213 </td><td>2.502378 </td><td>0.0569730451 </td><td> 1.55006265 </td></tr>
<tr><td>Fiat X1-9 </td><td>27.3 </td><td>1.935 </td><td>4 </td><td>27.48055 </td><td>0.7700721 </td><td>-0.18055052 </td><td>0.08995733 </td><td>2.612717 </td><td>0.0001790454 </td><td>-0.07371481 </td></tr>
<tr><td>Porsche 914-2 </td><td>26.0 </td><td>2.140 </td><td>4 </td><td>26.82640 </td><td>0.7322422 </td><td>-0.82640123 </td><td>0.08133608 </td><td>2.607877 </td><td>0.0033281614 </td><td>-0.33581456 </td></tr>
<tr><td>Lotus Europa </td><td>30.4 </td><td>1.513 </td><td>4 </td><td>28.82714 </td><td>0.9282190 </td><td> 1.57285924 </td><td>0.13069974 </td><td>2.593440 </td><td>0.0216355209 </td><td> 0.65704006 </td></tr>
<tr><td>Ford Pantera L </td><td>15.8 </td><td>3.170 </td><td>8 </td><td>17.50852 </td><td>0.9023791 </td><td>-1.70852005 </td><td>0.12352416 </td><td>2.590102 </td><td>0.0237336584 </td><td>-0.71078295 </td></tr>
<tr><td>Ferrari Dino </td><td>19.7 </td><td>2.770 </td><td>6 </td><td>21.80050 </td><td>0.5342815 </td><td>-2.10049885 </td><td>0.04330261 </td><td>2.581252 </td><td>0.0105550987 </td><td>-0.83641545 </td></tr>
<tr><td>Maserati Bora </td><td>15.0 </td><td>3.570 </td><td>8 </td><td>16.23213 </td><td>0.7267482 </td><td>-1.23213120 </td><td>0.08012014 </td><td>2.601659 </td><td>0.0072685192 </td><td>-0.50035506 </td></tr>
<tr><td>Volvo 142E </td><td>21.4 </td><td>2.780 </td><td>4 </td><td>24.78418 </td><td>0.8176667 </td><td>-3.38417906 </td><td>0.10142065 </td><td>2.524357 </td><td>0.0727399065 </td><td>-1.39047125 </td></tr>
</tbody>
</table>



#### 3. The ``glance()`` function

**``glance()``** construct a concise one-row summary of the model. This typically contains values such as R2, adjusted R2, and residual standard error that are computed once for the entire model.



```R
glance(myModel)


```


<table>
<thead><tr><th>p.value</th><th>df</th><th>logLik</th></tr></thead>
<tbody>
<tr><td>6.808955e-12</td><td>3 </td><td>-74.00503 </td></tr>
</tbody>
</table>



## K-Means


```R
kmeans(iris[,1:4],3 )
```


K-means clustering with 3 clusters of sizes 96, 33, 21

Cluster means:
Sepal.Length Sepal.Width Petal.Length Petal.Width
1 6.314583 2.895833 4.973958 1.7031250
2 5.175758 3.624242 1.472727 0.2727273
3 4.738095 2.904762 1.790476 0.3523810

Clustering vector:
[1] 2 3 3 3 2 2 2 2 3 3 2 2 3 3 2 2 2 2 2 2 2 2 2 2 3 3 2 2 2 3 3 2 2 2 3 2 2
[38] 2 3 2 2 3 3 2 2 3 2 3 2 2 1 1 1 1 1 1 1 3 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1
[75] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1
[112] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[149] 1 1

Within cluster sum of squares by cluster:
[1] 118.651875 6.432121 17.669524
(between_SS / total_SS = 79.0 %)

Available components:

[1] "cluster" "centers" "totss" "withinss" "tot.withinss"
[6] "betweenss" "size" "iter" "ifault"



```R
KMmodel <- kmeans(iris[,1:4],3 )
summary(KMmodel)
```


Length Class Mode
cluster 150 -none- numeric
centers 12 -none- numeric
totss 1 -none- numeric
withinss 3 -none- numeric
tot.withinss 1 -none- numeric
betweenss 1 -none- numeric
size 3 -none- numeric
iter 1 -none- numeric
ifault 1 -none- numeric



```R
tidy(KMmodel)
```


<table>
<thead><tr><th>x1</th><th>x2</th><th>x3</th><th>x4</th><th>size</th><th>withinss</th><th>cluster</th></tr></thead>
<tbody>
<tr><td>5.006000</td><td>3.428000</td><td>1.462000</td><td>0.246000</td><td>50 </td><td>15.15100</td><td>1 </td></tr>
<tr><td>5.901613</td><td>2.748387</td><td>4.393548</td><td>1.433871</td><td>62 </td><td>39.82097</td><td>2 </td></tr>
<tr><td>6.850000</td><td>3.073684</td><td>5.742105</td><td>2.071053</td><td>38 </td><td>23.87947</td><td>3 </td></tr>
</tbody>
</table>




```R
# augment(Model,Data)
```


```R
augment(KMmodel,iris[,1:4]) %>%head()
```


<table>
<thead><tr><th>Sepal.Length</th><th>Sepal.Width</th><th>Petal.Length</th><th>Petal.Width</th><th>.cluster</th></tr></thead>
<tbody>
<tr><td>5.1</td><td>3.5</td><td>1.4</td><td>0.2</td><td>1 </td></tr>
<tr><td>4.9</td><td>3.0</td><td>1.4</td><td>0.2</td><td>1 </td></tr>
<tr><td>4.7</td><td>3.2</td><td>1.3</td><td>0.2</td><td>1 </td></tr>
<tr><td>4.6</td><td>3.1</td><td>1.5</td><td>0.2</td><td>1 </td></tr>
<tr><td>5.0</td><td>3.6</td><td>1.4</td><td>0.2</td><td>1 </td></tr>
<tr><td>5.4</td><td>3.9</td><td>1.7</td><td>0.4</td><td>1 </td></tr>
</tbody>
</table>




```R
glance(KMmodel)
```


<table>
<thead><tr><th>totss</th><th>tot.withinss</th><th>betweenss</th><th>iter</th></tr></thead>
<tbody>
<tr><td>681.3706</td><td>78.85144</td><td>602.5192</td><td>2 </td></tr>
</tbody>
</table>

0 comments on commit 3ee1d7e

Please sign in to comment.