dplyr::case_when() within dplyr::mutate() loses qualtRics variable labels #323
Comments
So, what I "know" I've mostly cobbled together from working with this package over the past couple years + my own use of it for data cleaning. My interaction with it doesn't overlap with the original owner. But basically, I think the situation was/is:
Now, even if I'm right about the above, I kind of think that a lot of that might be viewed differently today. IMO the current, post- Meanwhile, if you want to preserve things in your own workflows you're going to need some options. The obvious option is to convert to the "labelled" class, though there are other approaches:
require(tidyverse, quietly = TRUE)
require(haven, quietly = TRUE)
require(sjlabelled, quietly = TRUE)
# Function for converting to the labelled class:
make_labelled <-
  \(x){
    haven::labelled(x = x,
                    label = attr(x, "label"),
                    labels = attr(x, "labels")
    )
  }
# Example data frame:
test <-
  tibble(
    a = sample(c(1,2,50), 15, replace = TRUE) |>
      structure(label = "a label"),
  )
test |> get_label()
#> a
#> "a label"
test <-
  test |>
  mutate(
    # This approach loses label/labels attributes:
    a_conv =
      a |>
      case_match(50 ~ 3, .default = a),
    # But you can convert first:
    a_lab =
      a |>
      make_labelled(),
    # then the standard dplyr tools will preserve attributes (if used properly):
    a_lab_conv =
      a_lab |>
      case_match(50 ~ 3, .default = a_lab),
    # If you want to preserve attributes but don't want to end up with
    # labelled vars, you can do it in place (this requires magrittr's %>%):
    a_conv2 =
      a %>%
      make_labelled() %>%
      case_match(.x = ., 50 ~ 3, .default = .) |>
      sjlabelled::unlabel(),
    # or, an even simpler manual approach:
    a_conv3 =
      a |>
      case_match(50 ~ 3, .default = a) |>
      structure(label = attr(a, "label"))
  )
# Some give you a "labelled" class, some don't:
test |> purrr::map(class)
#> $a
#> [1] "numeric"
#>
#> $a_conv
#> [1] "numeric"
#>
#> $a_lab
#> [1] "haven_labelled" "vctrs_vctr" "double"
#>
#> $a_lab_conv
#> [1] "haven_labelled" "vctrs_vctr" "double"
#>
#> $a_conv2
#> [1] "numeric"
#>
#> $a_conv3
#> [1] "numeric"
# But all of them except a_conv preserve the "label" attribute:
test |> sjlabelled::get_label()
#>          a     a_conv      a_lab a_lab_conv    a_conv2    a_conv3
#>  "a label"         ""  "a label"  "a label"  "a label"  "a label"
Created on 2023-08-08 with reprex v2.0.2
This is spot-on IMO; the decisions around sjlabelled were made quite a long time ago, before some newer and better options existed. I do think these attributes are worth revisiting so folks have data that works better with current tools. I would be open to avoiding these kinds of attributes altogether in favor of nicer tools for dealing with the labels and other metadata, but if that would be too much of a change, we can think through how this should be updated, maybe using haven's infrastructure instead of sjlabelled.
What would you say are the current newer & better options? On its face I don't love the label/labels attribute approach either, but I'm not up-to-date on what alternatives might be emerging as best practice. I will say that one case for sticking with the attribute-centric approach is |
Ah sorry, I may not have been clear.
Thanks for the workaround @jmobrien. Just for the sake of completeness, here is the workaround I was using (basically saving the labels and manually adding them back afterwards, to avoid relying on another package):
suppressWarnings(suppressPackageStartupMessages(library(qualtRics)))
suppressWarnings(suppressPackageStartupMessages(library(sjlabelled)))
suppressWarnings(suppressPackageStartupMessages(library(dplyr)))
# Extract all surveys
surveys <- all_surveys()
# Identify right survey
survey1.id <- surveys$id[
  which("Projet priming-aggression (Part 1)_Study 3" == surveys$name)]
# Fetch right survey
data <- suppressMessages(fetch_survey(surveyID = survey1.id, verbose = FALSE))
# sjlabelled works
get_label(data$Status)
#> Status
#> "Response Type"
# Save question labels
labels.data <- data |>
get_label() |>
bind_rows()
# case_when
data <- data %>%
  mutate(Status = case_when(Status == 50 ~ 1),
         Progress = case_when(Progress == 100 ~ 1))
# Labels lost
get_label(data$Status)
#> NULL
# Repair labels
data <- data %>%
  mutate(Status = set_label(Status, labels.data$Status))
# Labels recovered
get_label(data$Status)
#> [1] "Response Type"
# Problem: needs to be done for each variable
get_label(data$Progress)
#> NULL
Created on 2023-08-14 with reprex v2.0.2
There would probably be a way to automate this process more efficiently through a function for all relevant variables though...
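One way the save-and-restore idea above could be automated for every column at once is sketched below. This is only an illustration under stated assumptions: `save_labels()` and `restore_labels()` are hypothetical helper names (they are not part of qualtRics, dplyr, or sjlabelled), the data frame is a toy stand-in for a fetched survey, and `as.numeric()` stands in for any transformation that drops attributes, such as `case_when()`.

```r
# Hypothetical helpers (not from qualtRics, dplyr, or sjlabelled); base R only.

# Capture the "label" attribute of every column as a named list.
save_labels <- function(df) {
  lapply(df, attr, which = "label", exact = TRUE)
}

# Re-attach saved labels to every column that still exists in `df`.
restore_labels <- function(df, labels) {
  for (nm in intersect(names(df), names(labels))) {
    if (!is.null(labels[[nm]])) {
      attr(df[[nm]], "label") <- labels[[nm]]
    }
  }
  df
}

# Toy data standing in for a fetched survey (assumption, not real survey data).
dat <- data.frame(Status = c(50, 0, 50), Progress = c(100, 40, 100))
attr(dat$Status, "label") <- "Response Type"
attr(dat$Progress, "label") <- "Progress"

# 1. Save labels before transforming.
saved <- save_labels(dat)

# 2. Transform; as.numeric() strips attributes, standing in for case_when().
dat$Status <- as.numeric(dat$Status == 50)
dat$Progress <- as.numeric(dat$Progress == 100)

# 3. Restore labels on all columns in one call.
dat <- restore_labels(dat, saved)
```

In a dplyr pipeline, the same pattern would presumably be to call `save_labels()` before `mutate()` and `restore_labels()` afterwards, which avoids repeating `set_label()` for each variable.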
Great, that works too. For automation across multiple variables, in one of my cases I ended up creating a |
Expanding on your response @juliasilge, yes, this runs up against where we're already using a dual-approach model wherein question text metadata can be embedded at the variable level via labels, at the dataframe level via the attached column map (attribute), or both. We could definitely move more specifically in either direction if we saw fit. Also, I suppose an alternative approach would be to write some helper functions that can add/restore labels from the column map as needed.
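A helper of the kind suggested here, restoring labels from a column map, might look roughly like the sketch below. Everything in it is an assumption made for illustration: `labels_from_colmap()` is a hypothetical function, and the column map is a hand-built data frame whose column names (`qname`, `description`) are guesses at what a real column map would contain rather than a confirmed qualtRics structure.

```r
# Hypothetical helper: copy question text from a column map onto the
# matching data columns as "label" attributes. Base R only.
labels_from_colmap <- function(df, colmap,
                               name_col = "qname",
                               text_col = "description") {
  # Named character vector: column name -> question text.
  lookup <- stats::setNames(as.character(colmap[[text_col]]),
                            colmap[[name_col]])
  # Only touch columns that appear in both the data and the map.
  for (nm in intersect(names(df), names(lookup))) {
    attr(df[[nm]], "label") <- lookup[[nm]]
  }
  df
}

# Hand-built stand-in for a column map (assumed layout).
colmap <- data.frame(
  qname = c("Status", "Progress"),
  description = c("Response Type", "Progress"),
  stringsAsFactors = FALSE
)

# Data whose labels were lost; `Extra` has no entry in the map.
dat <- data.frame(Status = c(1, NA, 1), Progress = c(1, NA, 1), Extra = 1:3)
dat <- labels_from_colmap(dat, colmap)
```

Columns missing from the map are left untouched, so the helper can be applied after any transformation without needing the labels to have been saved beforehand.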
Hi, continuing the discussion with @jmobrien over from tidyverse/dplyr#6857 :)
Summary of issue:
dplyr::case_when() within dplyr::mutate() loses qualtRics variable labels because they are created through sjlabelled. DavisVaughan from the dplyr team suggests using haven instead of sjlabelled. jmobrien specifies that there may already exist workflows compatible with dplyr::case_when() or an equivalent. Reprexes available in original issue.
Are there already open or closed issues about this or online documentation I could refer to regarding those alternative workflows?