---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# dcmodifydb
<!-- badges: start -->
[![CRAN status](https://www.r-pkg.org/badges/version/dcmodifydb)](https://CRAN.R-project.org/package=dcmodifydb)
[![R-CMD-check](https://github.com/data-cleaning/dcmodifydb/workflows/R-CMD-check/badge.svg)](https://github.com/data-cleaning/dcmodifydb/actions)
[![Downloads](https://cranlogs.r-pkg.org/badges/dcmodifydb)](https://cran.r-project.org/package=dcmodifydb)
[![Codecov test coverage](https://codecov.io/gh/data-cleaning/dcmodifydb/branch/main/graph/badge.svg)](https://codecov.io/gh/data-cleaning/dcmodifydb?branch=main)
<!-- badges: end -->
The goal of dcmodifydb is to apply modification rules specified with `dcmodify`
to a database table, allowing for documented, reproducible data cleaning adjustments
in a database.

`dcmodify` separates **intent** from **execution**: a user specifies the _what_, _why_ and _how_ of an automatic data change and uses `dcmodifydb` to execute it on a `tbl` database table.
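
To make the split between intent and execution concrete, here is a minimal sketch that states a rule with `dcmodify` and runs it on an in-memory data frame. The rule and the data are hypothetical and chosen only for illustration; they are not taken from this package's example files.

``` r
library(dcmodify)

# Hypothetical rule (an assumption for this sketch): cap implausible ages at 130.
m <- modifier(if (age > 130) age <- 130)

# Small in-memory data set, for illustration only.
income <- data.frame(
  age    = c(11, 150, 25, -10),
  income = c(2000, 300, 2000, 2000)
)

# dcmodify executes the rule on a data.frame;
# dcmodifydb targets a `tbl` backed by a database instead.
cleaned <- modify(income, m)
cleaned
```

The rule itself does not mention where the data lives, which is what allows the same modifier to be executed against a database table.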
## Installation
<!-- You can install the released version of dcmodifydb from [CRAN](https://CRAN.R-project.org) with: -->
<!-- ``` r -->
<!-- install.packages("dcmodifydb") -->
<!-- ``` -->
The development version from [GitHub](https://github.com/) can be installed with:
``` r
# install.packages("devtools")
devtools::install_github("data-cleaning/dcmodifydb")
```
## Example
```{r, code = readLines("./example/modify.R")}
```
### Documented rules
```{r}
library(DBI)
library(dcmodify)
library(dcmodifydb)
con <- dbConnect(RSQLite::SQLite())
```
The modification rules can be stored in a YAML file, here `example.yml`:
```yaml
```{r, results="asis", child="example/example.yml"}
```
```
Let's load the rules and apply them to a data set:
```{r, eval = FALSE}
m <- modifier(.file = "example.yml")
```
```{r, echo = FALSE, eval = TRUE}
m <- modifier(.file = "example/example.yml")
```
```{r}
print(m)
```
```{r}
# setup the data
"age, income
11, 2000
150, 300
25, 2000
-10, 2000
" -> csv
income <- read.csv(text = csv, strip.white = TRUE)
dbWriteTable(con, "income", income)
tbl_income <- dplyr::tbl(con, "income")
# this is the table in the database
tbl_income
# and now after modification
modify(tbl_income, m, copy = FALSE)
```
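
One way to check what `modify()` did with `copy = FALSE` is to read the table back through `DBI`. The following is a self-contained sketch under stated assumptions: it uses its own connection (`con_demo`) and a hypothetical rule, not the rules from `example.yml`, and it assumes that `copy = FALSE` updates the table in place.

``` r
library(DBI)
library(dcmodify)
library(dcmodifydb)

# Separate in-memory SQLite connection, used only for this sketch.
con_demo <- dbConnect(RSQLite::SQLite())
dbWriteTable(
  con_demo, "income",
  data.frame(age = c(11, 150, 25, -10), income = c(2000, 300, 2000, 2000))
)

# Hypothetical rule (an assumption for this sketch): cap ages at 130.
m_demo <- modifier(if (age > 130) age <- 130)

# Apply the rule to the database table, as in the example above.
tbl_demo <- dplyr::tbl(con_demo, "income")
modify(tbl_demo, m_demo, copy = FALSE)

# Read the table back to inspect the stored values.
after <- dbReadTable(con_demo, "income")
after

dbDisconnect(con_demo)
```

Reading the table back with `dbReadTable` rather than printing the `tbl` makes it explicit that the inspected values come from the database itself.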
The generated SQL can be written to a file with `dump_sql`:
```{r, eval=FALSE}
dump_sql(m, tbl_income, file = "modify.sql")
```
modify.sql:
```sql
```{r, echo=FALSE, results='asis'}
dump_sql(m, tbl_income)
```
```
```{r}
dbDisconnect(con)
```
Note: modification rules can be written to YAML with `as_yaml` and `export_yaml`:
```{r, eval = FALSE}
dcmodify::export_yaml(m, "cleaning_steps.yml")
```