redcapFactorFlip equivalent #390

Wunsei · 2022-05-01T15:57:47Z

redcapAPI has the helpful redcapFactorFlip function. Does REDCapR have an equivalent? If no, how might I replicate the same thing?

wibeasley · 2022-08-14T21:34:24Z

yeah, @nutterb has done some cool things in redcapAPI.

Here's the code for his function: https://github.com/nutterb/redcapAPI/blob/master/R/redcapFactorFlip.R. I haven't studied it too closely, but it looks like he gets the raw values & labels and makes them factors. Here are the parts that strike me as most relevant: https://github.com/nutterb/redcapAPI/search?q=redcapLabels

Regarding how to replicate it in REDCapR, I'm guessing it would start with REDCapR::redcap_metadata_read(). Run that example from your own machine to get a feel for the returned dataset. The select_choices_or_calculations column exposes values like "0, Female | 1, Male" for gender and "0, Unknown / Not Reported | 1, NOT Hispanic or Latino | 2, Hispanic or Latino" for ethnicity. That could be split at each pipe (i.e., |), and a regex could pull out the number and the label.

Here's a proof of concept that I haven't tested. Suppose the relevant parts of the metadata dataset are:

ds <- 
  tibble::tribble(
    ~field_name, ~select_choices_or_calculations,
    "record_id", NA_character_,
    "age"      , NA_character_,
    "gender"   , "0, Female | 1, Male",
    "race"     , "1, American Indian/Alaska Native | 2, Asian | 3, Native Hawaiian or Other Pacific Islander | 4, Black or African American | 5, White | 6, Unknown / Not Reported",
    "ethnicity", "0, Unknown / Not Reported | 1, NOT Hispanic or Latino | 2, Hispanic or Latino" 
)

Something like this would extract each level. Notice rematch2, which is one of my favorite & underappreciated packages, does the hard work.

pattern <- "^(?<level>\\d+),\\s*(?<label>.+)$" # \\s* keeps the space after the comma out of the label
ds |> 
  dplyr::select(
    field     = field_name,
    choice    = select_choices_or_calculations
  ) |> 
  tidyr::drop_na(choice) |> 
  tidyr::separate_rows(
    choice, 
    sep     = " \\| "
  ) |> 
  rematch2::bind_re_match(
    choice, 
    pattern
  )

Here's the resulting dataset, which (I think) can be piped into a purrr function to apply the levels & labels to each factor variable.

       field                                       choice level                                     label
1     gender                                    0, Female     0                                    Female
2     gender                                      1, Male     1                                      Male
3       race             1, American Indian/Alaska Native     1             American Indian/Alaska Native
4       race                                     2, Asian     2                                     Asian
5       race 3, Native Hawaiian or Other Pacific Islander     3 Native Hawaiian or Other Pacific Islander
6       race                 4, Black or African American     4                 Black or African American
7       race                                     5, White     5                                     White
8       race                    6, Unknown / Not Reported     6                    Unknown / Not Reported
9  ethnicity                    0, Unknown / Not Reported     0                    Unknown / Not Reported
10 ethnicity                    1, NOT Hispanic or Latino     1                    NOT Hispanic or Latino
11 ethnicity                        2, Hispanic or Latino     2                        Hispanic or Latino
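
A minimal sketch of that last step, under stated assumptions: the lookup table has the field/level/label shape shown above, and the records were exported with raw (unlabelled) values. The names ds_choice, ds_raw, to_factor, and to_raw are hypothetical helpers made up for illustration; they are not part of REDCapR or redcapAPI. purrr::imap() passes each column together with its name, so the helper can look up the matching rows.

```r
# Hypothetical lookup table, shaped like the extracted levels (two fields only).
ds_choice <- tibble::tribble(
  ~field     , ~level, ~label,
  "gender"   , "0"   , "Female",
  "gender"   , "1"   , "Male",
  "ethnicity", "0"   , "Unknown / Not Reported",
  "ethnicity", "1"   , "NOT Hispanic or Latino",
  "ethnicity", "2"   , "Hispanic or Latino"
)

# Hypothetical raw records, as an export with raw values might look.
ds_raw <- tibble::tibble(
  record_id = 1:4,
  gender    = c(0L, 1L, 1L, NA),
  ethnicity = c(2L, 0L, 1L, 1L)
)

# Convert one column of raw codes into a labelled factor.
# trimws() guards against stray whitespace around a label.
to_factor <- function(x, field_name) {
  lookup <- ds_choice[ds_choice$field == field_name, ]
  factor(as.character(x), levels = lookup$level, labels = trimws(lookup$label))
}

# Convert a labelled factor back to its raw codes (returned as character).
to_raw <- function(f, field_name) {
  lookup <- ds_choice[ds_choice$field == field_name, ]
  lookup$level[match(as.character(f), trimws(lookup$label))]
}

# Only touch columns that appear in both the lookup and the records.
fields <- intersect(unique(ds_choice$field), names(ds_raw))

ds_labelled <- ds_raw
ds_labelled[fields] <- purrr::imap(ds_raw[fields], to_factor)
```

Calling to_raw(ds_labelled$gender, "gender") flips the column back to its codes, which is roughly the round trip the original question asks about. Whether the codes should come back as character or integer is a design choice left open here.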

Does that help? If there's enough interest, I'll put it into the package. I'd love feedback from anyone who is interested in this potential feature. @pbchase, you usually have an opinion?

skadauke · 2022-08-15T11:13:53Z

I also think being able to move between raw data and labels would be useful. We built the parse_labels function that takes a string from select_choices_or_calculations and returns a tibble:

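library(magrittr) # provides the %>% pipe used below

parse_labels <- function(string){
  out <- string %>%
    strsplit(" \\| ") %>% # split into individual choices at ' | '
    unlist() %>%
    regmatches(., regexpr(", ", .), invert = TRUE) %>% # split each choice at its first ', ' only, so labels may contain commas
    unlist() %>%
    matrix(
      ncol = 2,
      byrow = TRUE,
      dimnames = list(
        c(),               # row names
        c("raw", "label")) # column names
    ) %>%
    dplyr::as_tibble()

  out
}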

string <- "1, American Indian/Alaska Native | 2, Asian | 3, Native Hawaiian or Other Pacific Islander | 4, Black or African American | 5, White | 6, Unknown / Not Reported"

parse_labels(string)
#> # A tibble: 6 × 2
#>   raw   label                                    
#>   <chr> <chr>                                    
#> 1 1     American Indian/Alaska Native            
#> 2 2     Asian                                    
#> 3 3     Native Hawaiian or Other Pacific Islander
#> 4 4     Black or African American                
#> 5 5     White                                    
#> 6 6     Unknown / Not Reported

This function should work well in a purrr context.

One improvement I would suggest to your code above is to filter not on missing choice values but on categorical field types (dropdown, radio, checkbox, +/- yesno, +/- truefalse), because that column also holds the formulas of calculated fields, and I'm assuming you don't want those to go into your result set.
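
A minimal sketch of that filter, assuming the metadata carries a field_type column alongside select_choices_or_calculations (as the REDCap data dictionary does). The rows in ds_metadata, including the bmi formula, are made up for illustration. tidyr::separate() with extra = "merge" splits each choice at its first comma only.

```r
# Hypothetical slice of a data dictionary. `bmi` is a calculated field, so its
# select_choices_or_calculations value is a formula rather than a choice list.
ds_metadata <- tibble::tribble(
  ~field_name, ~field_type, ~select_choices_or_calculations,
  "record_id", "text"     , NA_character_,
  "gender"   , "radio"    , "0, Female | 1, Male",
  "bmi"      , "calc"     , "round(([weight]*10000)/(([height])^(2)), 1)",
  "consent"  , "yesno"    , NA_character_,
  "ethnicity", "dropdown" , "0, Unknown / Not Reported | 1, NOT Hispanic or Latino | 2, Hispanic or Latino"
)

# Field types whose choices are listed in the metadata.
categorical_types <- c("dropdown", "radio", "checkbox")

ds_choice <-
  ds_metadata |>
  dplyr::filter(field_type %in% categorical_types) |>   # drops calc, text, etc.
  dplyr::select(field = field_name, choice = select_choices_or_calculations) |>
  tidyr::separate_rows(choice, sep = " \\| ") |>        # one row per choice
  tidyr::separate(
    choice,
    into  = c("raw", "label"),
    sep   = ",\\s*",
    extra = "merge"                                     # split at the first comma only
  )
```

In this sketch the yesno field drops out because it has no choice list in the metadata; yesno and truefalse fields would need their labels supplied separately if they should be converted too.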

skadauke mentioned this issue Aug 15, 2022

[FEATURE] Implement a function to return project-wide metadata CHOP-CGTInformatics/REDCapTidieR#14
