case_when within mutate loses variable labels #6857

Closed
rempsyc opened this issue May 23, 2023 · 3 comments

Comments

@rempsyc

rempsyc commented May 23, 2023

Maybe this is expected/not a bug, but it seems that case_when() within mutate() loses variable labels, although mutate() alone or with replace() doesn't. Reprex:

suppressWarnings(suppressPackageStartupMessages(library(dplyr)))
suppressPackageStartupMessages(library(sjlabelled))

data <- readRDS(url("https://osf.io/d4mjk/?action=download"))

get_label(data$Status)
#>          Status 
#> "Response Type"

data <- data %>% 
  mutate(Status = Status + 1)

get_label(data$Status)
#>          Status 
#> "Response Type"

data <- data %>% 
  mutate(Status = replace(Status, Status == 1, 50))

get_label(data$Status)
#>          Status 
#> "Response Type"

data <- data %>% 
  mutate(Status = case_when(Status == 50 ~ 1))

get_label(data$Status)
#> NULL

Created on 2023-05-23 with reprex v2.0.2

Could it be because case_when() is vectorized? And given that I wish to preserve labels, what workaround would you recommend? Falling back to replace()? Thank you.
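For a plain `label` attribute like the one above, one possible stopgap is to save the attribute before `case_when()` and put it back afterwards. This is only a minimal sketch: the small `df` below is made-up stand-in data rather than the OSF file, and it assumes the label lives in a bare `"label"` attribute, which is what `get_label()` appears to read.

```r
library(dplyr)

# Stand-in data (hypothetical): a numeric column with a bare "label" attribute
status <- c(1, 50, 50)
attr(status, "label") <- "Response Type"
df <- tibble(Status = status)

# Save the label before the step that drops it
lbl <- attr(df$Status, "label", exact = TRUE)

# case_when() builds a fresh vector from the right-hand sides,
# so the attribute is not carried over
df <- df %>%
  mutate(Status = case_when(Status == 50 ~ 1))

# Reattach the saved label by hand
attr(df$Status, "label") <- lbl
```

This keeps the label only because it is restored manually; any later step that rebuilds the column would need the same treatment.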

Edit:

Linked to #5762, which suggested recode() as the workaround. However, recode() has been superseded by case_match().

But case_match() does not seem to solve the issue:

suppressWarnings(suppressPackageStartupMessages(library(dplyr)))
suppressPackageStartupMessages(library(sjlabelled))

data <- readRDS(url("https://osf.io/d4mjk/?action=download"))

get_label(data$Status)
#>          Status 
#> "Response Type"

data <- data %>% 
  mutate(Status = case_match(Status, 1 ~ 50))

get_label(data$Status)
#> NULL

Created on 2023-05-23 with reprex v2.0.2

@DavisVaughan
Member

DavisVaughan commented Jul 17, 2023

sjlabelled just adds a label attribute onto a double or integer vector. That isn't enough for our underlying infrastructure to be able to figure out that that information is important. It needs to use a real class like haven::labelled(). Then the information carries over as expected:

library(haven)
library(dplyr)

x <- labelled(c(1, 2, 1, 3, 2), label = "response type")
x
#> <labelled<double>[5]>: response type
#> [1] 1 2 1 3 2

# Doesn't ever "see" `x`. It gets a logical vector and `50`
case_when(x == 1 ~ 50)
#> [1] 50 NA 50 NA NA

# Explicitly tell `case_when()` to use `x` as the type
case_when(x == 1 ~ 50, .ptype = x)
#> <labelled<double>[5]>: response type
#> [1] 50 NA 50 NA NA

# Or supply `x` as the default if you just want to replace part of it
case_when(x == 1 ~ 50, .default = x)
#> <labelled<double>[5]>: response type
#> [1] 50  2 50  3  2

# Similar thing here
case_match(x, 1 ~ 50, .default = x)
#> <labelled<double>[5]>: response type
#> [1] 50  2 50  3  2

So I'd suggest switching to haven if you can. Or possibly manually converting the sjlabelled results over to haven.
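A manual conversion along those lines might look like the sketch below. It assumes the existing label is stored in a bare `"label"` attribute on a numeric vector (as appears to be the case in the reprex above); the `status` vector here is made-up example data, not the original dataset.

```r
library(haven)
library(dplyr)

# Made-up example: a double vector that only carries a bare "label" attribute
status <- c(1, 2, 1, 3, 2)
attr(status, "label") <- "Response Type"

# Pull the label out and strip the bare attribute before rebuilding
lbl <- unname(attr(status, "label", exact = TRUE))
attr(status, "label") <- NULL

# Rebuild as a real haven labelled vector carrying the same variable label
status_lab <- labelled(status, label = lbl)

# With a classed vector, `.default` lets case_when() carry the label through
out <- case_when(status_lab == 1 ~ 50, .default = status_lab)
```

The same conversion could be applied column by column inside `mutate()` before any `case_when()` or `case_match()` step.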

@rempsyc
Author

rempsyc commented Aug 7, 2023

I see, thank you. You will notice that I did not use sjlabelled to define labels, only to read them. The labels were produced with the qualtRics package. Reprex:

suppressWarnings(suppressPackageStartupMessages(library(qualtRics)))
suppressWarnings(suppressPackageStartupMessages(library(sjlabelled)))
suppressWarnings(suppressPackageStartupMessages(library(haven)))

# Extract all surveys
surveys <- all_surveys()

# Identify the right survey
survey1.id <- surveys$id[
  which("Projet priming-aggression (Part 1)_Study 3" == surveys$name)]

# Fetch the right survey
data <- suppressMessages(fetch_survey(surveyID = survey1.id, verbose = FALSE))

# sjlabelled works
get_label(data$Status)
#>          Status 
#> "Response Type"

# haven doesn't work
print_labels(data$Status)
#> Error in `print_labels()`:
#> ! `x` must be a labelled vector.
#> Backtrace:
#>     ▆
#>  1. └─haven::print_labels(data$Status)
#>  2.   └─cli::cli_abort("{.arg x} must be a labelled vector.")
#>  3.     └─rlang::abort(...)

Created on 2023-08-07 with reprex v2.0.2

Maybe @jmobrien from the qualtRics package can chip in on this?

@jmobrien

jmobrien commented Aug 7, 2023

@rempsyc happy to discuss more over on the qualtRics page. We're aware of this, but there's some history and complexity around labels and the labelled class, so up to this point we've not added extended classes like labelled. There are a couple of workflows to get around it, though, that might serve your use case(s).
