forked from sfirke/janitor
-
Notifications
You must be signed in to change notification settings - Fork 0
/
planning.Rmd
53 lines (38 loc) · 1.8 KB
/
planning.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
---
title: "High-level package planning"
date: "`r Sys.Date()`"
output:
rmarkdown::github_document:
toc: true
toc_depth: 3
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "README-"
)
options(width = 110)
```
This page is for planning the janitor package, at a high level. More-finite questions and ideas can be handled via GitHub issues. This is for say, articulating what the package does or doesn't do, and how it should be organized. If it turns out we need a more discussion- and comment-friendly format, we can move to Google Docs, but let's try commenting and editing here.
## Package vision
### Purpose
Provide a framework and associated functions for checking and cleaning dirty data. There are two kinds of checks: interactive checks, like `tabyl`, and programmatic checks that say, confirm in production that some variables contain no duplicate records, or contain no missing values.
#### In scope:
#### Out of scope:
* A thorough set of alerting functions for assertive data cleaning in production (instead, recommend assertr or other package)
### Questions to resolve
1. Where should the organizing framework go - vignette? Bookdown? Main page?
### Priorities
#### Big items
1. Establish organizing framework for the package
1. Write up documentation/vignette showing check/clean iterative cycle
1. Create cannonical dirty data frame
2. Figure out new names for functions to fit schema, and rename them
3. Redo vignette
4. Redo homepage
2. New function: fuzzy dupes
3. New function family: bindability issues
#### Smaller items
1. `get_dupes` should have an option for returning with or without Ns as a column (it's so much faster without)