Quickly and easily label your data using a codebook saved as a YAML text file.
remotes::install_github("nt-williams/codebreak")
or
# Install 'codebreak' from the 'nt-williams' R-universe
install.packages('codebreak', repos = 'https://nt-williams.r-universe.dev')
some_data <- data.frame(
  x = c(1, 2, 5, 3, 4, 1),
  y = c(0, 1, 1, 0, 1, 9),
  z = c(5.2, 3.1, 5.6, 8.9, 9.0, 7.2),
  w = c(1, 1, 0, 1, 1, 1)
)
Codebooks are written as YAML text files and saved in the project directory (or anywhere else) as codebook.yml (or under any other file name).
x:
  label: Variable X # include meaningful variable descriptions
  cb:
    1: These # convert variable codes to labels
    2: Are
    3: Random
    4: Character
    5: Labels
"y":
  label: Variable Y
  cb: &binary # reduce repetition with anchors
    0: "No"
    1: "Yes"
    9: null # account for coded missing values
z:
  label: Variable Z
w:
  label: Variable W
  cb: *binary
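To see what the `&binary` anchor and `*binary` alias expand to, the codebook text can be parsed directly. The sketch below assumes the yaml package is installed; whether codebreak reads the file this way internally is an assumption, not something this README states. The quotes around "No" and "Yes" matter because YAML 1.1 parsers can otherwise read them as booleans.

```r
library(yaml)  # assumption: the 'yaml' package is installed

# A trimmed copy of the codebook above, kept inline so the sketch is self-contained.
codebook_text <- '
"y":
  label: Variable Y
  cb: &binary
    0: "No"
    1: "Yes"
    9: null
w:
  label: Variable W
  cb: *binary
'

parsed <- yaml::yaml.load(codebook_text)

# The alias under `w` resolves to the same mapping that the anchor marks under `y`.
same_mapping <- identical(parsed$y$cb, parsed$w$cb)
same_mapping

# Codes become names in the parsed list, so they are looked up as strings.
parsed$w$cb[["1"]]
```

Because the alias is resolved by the YAML parser, a change to the anchored mapping applies to every variable that references it.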
Import and apply the codebook to the data:
cb <- codebreak::Codebook$new(system.file("codebook.yml", package = "codebreak"))
cb
#> codebook: /Users/nicholaswilliams/Library/R/arm64/4.4/library/codebreak/codebook.yml
#>
#> decode data with `<obj>$decode()`
#> label data with `<obj>$label()`
cb$decode(some_data)
#>           x    y   z   w
#> 1     These   No 5.2 Yes
#> 2       Are  Yes 3.1 Yes
#> 3    Labels  Yes 5.6  No
#> 4    Random   No 8.9 Yes
#> 5 Character  Yes 9.0 Yes
#> 6     These <NA> 7.2 Yes
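Conceptually, decoding is a lookup from each code to its label. The base R sketch below illustrates the idea for the `y` column only; it is not codebreak's implementation, and `y_codes` is a hypothetical lookup table written by hand to mirror the `cb` entries in the codebook.

```r
# Hypothetical lookup table: names are the codes, values are the labels.
# Code 9 maps to NA, mirroring `9: null` in the codebook.
y_codes <- c("0" = "No", "1" = "Yes", "9" = NA)

y <- c(0, 1, 1, 0, 1, 9)

# Index the lookup table by the codes, converted to character so they match the names.
decoded <- unname(y_codes[as.character(y)])
decoded
```

The last value comes back as `NA`, which matches the `<NA>` shown for row 6 of the decoded output.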
Rename columns based on the codebook labels:
cb$label(some_data)
#>   Variable X Variable Y Variable Z Variable W
#> 1          1          0        5.2          1
#> 2          2          1        3.1          1
#> 3          5          1        5.6          0
#> 4          3          0        8.9          1
#> 5          4          1        9.0          1
#> 6          1          9        7.2          1
Apply the codebook and rename columns:
cb$decode(some_data, label = TRUE)
#>   Variable X Variable Y Variable Z Variable W
#> 1      These         No        5.2        Yes
#> 2        Are        Yes        3.1        Yes
#> 3     Labels        Yes        5.6         No
#> 4     Random         No        8.9        Yes
#> 5  Character        Yes        9.0        Yes
#> 6      These       <NA>        7.2        Yes
decode() and label() can instead return data with the codebook applied through the labelled package by setting as_labelled = TRUE.
some_data <- tibble::as_tibble(some_data)
cb$decode(some_data, as_labelled = TRUE)
#> # A tibble: 6 × 4
#>   x             y             z w
#>   <dbl+lbl>     <dbl+lbl> <dbl> <dbl+lbl>
#> 1 1 [These]     0 [No]      5.2 1 [Yes]
#> 2 2 [Are]       1 [Yes]     3.1 1 [Yes]
#> 3 5 [Labels]    1 [Yes]     5.6 0 [No]
#> 4 3 [Random]    0 [No]      8.9 1 [Yes]
#> 5 4 [Character] 1 [Yes]     9   1 [Yes]
#> 6 1 [These]     NA          7.2 1 [Yes]
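Labelled columns keep the original codes and carry the labels as metadata, so they can be inspected or converted later. The sketch below builds a small labelled vector by hand, assuming the haven and labelled packages are installed; it does not call codebreak, and the vector `x` is a hypothetical stand-in for a column returned with as_labelled = TRUE.

```r
library(labelled)  # assumption: 'labelled' (and its dependency 'haven') are installed

# Hypothetical stand-in for a labelled column: numeric codes plus value labels.
x <- haven::labelled(
  c(1, 2, 5),
  labels = c(These = 1, Are = 2, Labels = 5)
)

# Inspect the code-to-label mapping stored on the vector.
labels_on_x <- labelled::val_labels(x)
labels_on_x

# Convert to a factor when the labels themselves are needed, e.g. for plotting.
x_factor <- labelled::to_factor(x)
as.character(x_factor)
```

Keeping the codes until the last step can be useful when downstream code still filters or joins on the numeric values.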