-
Notifications
You must be signed in to change notification settings - Fork 9
/
README.Rmd
124 lines (87 loc) · 3.43 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
---
output: github_document
editor_options:
chunk_output_type: console
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "90%",
fig.align = "center"
)
library(ggplot2)
theme_set(theme_minimal(base_size = 5) + theme(legend.position = "bottom"))
if(require(showtext)){
sysfonts::font_add_google("IBM Plex Sans", "plex")
showtext::showtext_auto()
}
```
# klassets
<!-- badges: start -->
[![R-CMD-check](https://github.com/jbkunst/klassets/workflows/R-CMD-check/badge.svg)](https://github.com/jbkunst/klassets/actions)
[![Github stars](https://img.shields.io/github/stars/jbkunst/klassets.svg?style=social&label=Github)](https://github.com/jbkunst/klassets)
[![R-CMD-check](https://github.com/jbkunst/klassets/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/jbkunst/klassets/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->
The `{klassets}` package is a collection of functions to simulate data sets to:
- Teach how some Statistics Models and Machine Learning algorithms works.
- Illustrate certain some particular events such as heteroskedasticity or the Simpson's paradox.
- Compare the predictions between models, for example logistic regression vs decision tree vs $k$-Nearest Neighbours.
```{r, echo=FALSE}
knitr::include_graphics("man/figures/animation_quasi_anscombre.gif")
```
# Some examples
## Don't forget to visualize the data
```{r}
library(klassets)
set.seed(123)
df <- sim_quasianscombe_set_1(beta0 = 3, beta1 = 0.5)
plot(df) +
ggplot2::labs(subtitle = "Very similar to the given parameters (3 and 0.5)")
```
```{r}
library(patchwork)
df2 <- sim_quasianscombe_set_2(df, fun = sin)
df6 <- sim_quasianscombe_set_6(df, groups = 2, b1_factor = -1)
plot(df2) + plot(df6)
```
## Compare models in a classifications task
```{r}
df <- sim_response_xy(relationship = function(x, y) sin(x*pi) > sin(y*pi))
df
plot(df)
```
You can fit different models and see how the predictions are made.
```{r, out.width = "100%"}
plot(fit_logistic_regression(df, order = 4)) +
plot(fit_classification_tree(df)) +
plot(fit_classification_random_forest(df)) +
plot(fit_knn(df)) +
plot_layout(guides = "collect")
```
## How $K$-means works
Another example of what can be done with `{klassets}`.
```{r, echo=FALSE}
knitr::include_graphics("man/figures/animation_kmeans_iterations.gif")
```
## Where to start
You can check:
- `vignette("Quasi-Anscombe-data-sets")` to know more about `sim_quasianscombe_set*` functions family.
- `vignette("Binary-classification")`/`vignette("Regression")` to see classifiers/regression models/methods.
- `vignette("Clustering")` to see clustering functions.
- `vignette("MNIST")` to work with this data set to compare models and check
some variable importance metrics.
## Installation
You can install the development version of klassets from [GitHub](https://github.com/) with:
``` r
# install.packages("remotes")
remotes::install_github("jbkunst/klassets")
```
## Extra Info(?!)
**Why the name Klassets?** Just a weird merge for Class/Klass and sets.
Some inspiration and similar ideas:
- https://jumpingrivers.github.io/datasauRus/
- https://eliocamp.github.io/metamer/
- http://www.econometricsbysimulation.com/2019/03/the-importance-of-graphing-your-data.html This is almost the same, but the approach it's different.