From 2daa97d17cccb909d7c938309811ea158ecc7578 Mon Sep 17 00:00:00 2001 From: Will Beasley Date: Tue, 11 Jun 2024 16:13:55 -0500 Subject: [PATCH] RStudio's formatting when I used the "visual" mode instead of the "source" mode of viewing --- vignettes/workflow-write.Rmd | 355 +++++++++++++++++++---------------- 1 file changed, 188 insertions(+), 167 deletions(-) diff --git a/vignettes/workflow-write.Rmd b/vignettes/workflow-write.Rmd index b02804f2..810f7cc7 100644 --- a/vignettes/workflow-write.Rmd +++ b/vignettes/workflow-write.Rmd @@ -5,8 +5,11 @@ output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Writing to a REDCap Project} - %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} + %\VignetteEngine{knitr::rmarkdown} +editor_options: + markdown: + wrap: 72 --- ```{r} @@ -18,62 +21,68 @@ knitr::opts_chunk$set( ) ``` -Writing data _to_ REDCap is more difficult than reading data _from_ REDCap. -When you read, you receive data in the structure that the REDCap provides you. -You have some control about the columns, rows, and data types, -but there is not a lot you have to be concerned. +Writing data *to* REDCap is more difficult than reading data *from* +REDCap. When you read, you receive data in the structure that REDCap +provides. You have some control over the columns, rows, and data +types, but there is not much you have to be concerned about. -In contrast, the structure of the dataset you send to the REDCap server must be precise. -You need to pass special variables so that the REDCap server understands the -hierarchical structure of the data points. -This vignette walks you through that process. +In contrast, the structure of the dataset you send to the REDCap server +must be precise. You need to pass special variables so that the REDCap +server understands the hierarchical structure of the data points. This +vignette walks you through that process. 
-If you are new to REDCap and its API, -please first understand the concepts described in these two [vignettes](https://ouhscbbmc.github.io/REDCapR/articles/): +If you are new to REDCap and its API, please first understand the +concepts described in these two +[vignettes](https://ouhscbbmc.github.io/REDCapR/articles/): -* [Typical REDCap Workflow for a Data Analyst](https://ouhscbbmc.github.io/REDCapR/articles/workflow-read.html) -* [Retrieving Longitudinal and Repeating Structures](https://ouhscbbmc.github.io/REDCapR/articles/longitudinal-and-repeating.html) +- [Typical REDCap Workflow for a Data + Analyst](https://ouhscbbmc.github.io/REDCapR/articles/workflow-read.html) +- [Retrieving Longitudinal and Repeating + Structures](https://ouhscbbmc.github.io/REDCapR/articles/longitudinal-and-repeating.html) -Part 1 - Intro -=================================== +# Part 1 - Intro -Strategy ----------------------------------- +## Strategy -As described in the [Retrieving Longitudinal and Repeating Structures](https://ouhscbbmc.github.io/REDCapR/articles/longitudinal-and-repeating.html) vignette, -the best way to read and write data from projects with longitudinal/repeating elements -is to break up the "block matrix" dataset into individual datasets. -Each rectangle should have a coherent grain. +As described in the [Retrieving Longitudinal and Repeating +Structures](https://ouhscbbmc.github.io/REDCapR/articles/longitudinal-and-repeating.html) +vignette, the best way to read and write data from projects with +longitudinal/repeating elements is to break up the "block matrix" +dataset into individual datasets. Each rectangle should have a coherent +grain. -Following this strategy, we'll write to the REDCap server in two distinct steps: +Following this strategy, we'll write to the REDCap server in two +distinct steps: -1. Upload the patient-level instrument(s) -1. Upload the each repeating instrument separately. +1. Upload the patient-level instrument(s) +2. 
Upload each repeating instrument separately. -The actual upload phase is pretty straight-forward ---it's just a call to `REDCapR::redcap_write()`. -Most of the vignette's code prepares the dataset so that the upload will run smoothly. +The actual upload phase is pretty straight-forward --it's just a call to +`REDCapR::redcap_write()`. Most of the vignette's code prepares the +dataset so that the upload will run smoothly. -Pre-requisites ----------------------------------- +## Pre-requisites -See the [Typical REDCap Workflow for a Data Analyst](https://ouhscbbmc.github.io/REDCapR/articles/workflow-read.html) +See the [Typical REDCap Workflow for a Data +Analyst](https://ouhscbbmc.github.io/REDCapR/articles/workflow-read.html) vignette and -1. [Verify REDCapR is installed](https://ouhscbbmc.github.io/REDCapR/articles/workflow-read.html#verify-redcapr-is-installed) -1. [Verify REDCap Access](https://ouhscbbmc.github.io/REDCapR/articles/workflow-read.html#verify-redcap-access) -1. [Review Codebook](https://ouhscbbmc.github.io/REDCapR/articles/workflow-read.html#review-codebook) +1. [Verify REDCapR is + installed](https://ouhscbbmc.github.io/REDCapR/articles/workflow-read.html#verify-redcapr-is-installed) +2. [Verify REDCap + Access](https://ouhscbbmc.github.io/REDCapR/articles/workflow-read.html#verify-redcap-access) +3. [Review + Codebook](https://ouhscbbmc.github.io/REDCapR/articles/workflow-read.html#review-codebook) -Retrieve Token ------------------------- +## Retrieve Token -Please closely read the -[Retrieve Protected Token](https://ouhscbbmc.github.io/REDCapR/articles/workflow-read.html#part-2---retrieve-protected-token) section, -which has important security implications. -The current vignette imports a fake dataset into REDCap, -and we'll use a token stored in a local file. 
+Please closely read the [Retrieve Protected +Token](https://ouhscbbmc.github.io/REDCapR/articles/workflow-read.html#part-2---retrieve-protected-token) +section, which has important security implications. The current vignette +imports a fake dataset into REDCap, and we'll use a token stored in a +local file. -```r +``` r # retrieve-credential path_credential <- system.file("misc/example.credentials", package = "REDCapR") credential <- REDCapR::retrieve_credential_local( @@ -84,21 +93,22 @@ credential <- REDCapR::retrieve_credential_local( c(credential$redcap_uri, credential$token) ``` -Datasets to Write to Server -------------------------- +## Datasets to Write to Server -To keep this vignette focused on writing/importing/uploading to the server, -we'll start with the data that needs to be written. -These example tables were prepared by [Raymond Balise](https://github.com/RaymondBalise) -for our 2023 [R/Medicine](https://events.linuxfoundation.org/r-medicine/) workshop, +To keep this vignette focused on writing/importing/uploading to the +server, we'll start with the data that needs to be written. These +example tables were prepared by [Raymond +Balise](https://github.com/RaymondBalise) for our 2023 +[R/Medicine](https://events.linuxfoundation.org/r-medicine/) workshop, "Using REDCap and R to Rapidly Produce Biomedical Publications". -There are two tables, each with a different [granularity](https://www.1keydata.com/datawarehousing/fact-table-granularity.html): +There are two tables, each with a different +[granularity](https://www.1keydata.com/datawarehousing/fact-table-granularity.html): -* `ds_patient`: each row represents one patient, -* `ds_daily`: each row represents one daily measurement per patient. +- `ds_patient`: each row represents one patient, +- `ds_daily`: each row represents one daily measurement per patient. 
-```r +``` r # load-patient ds_patient <- "test-data/vignette-repeating-write/data-patient.rds" |> @@ -108,7 +118,7 @@ ds_patient <- ds_patient ``` -```r +``` r # load-repeating ds_daily <- "test-data/vignette-repeating-write/data-daily.rds" |> @@ -118,38 +128,38 @@ ds_daily <- ds_daily ``` -Part 2 - Write Data: One row per patient -=================================== +# Part 2 - Write Data: One row per patient -Besides the [`data.frame`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/data.frame.html) -to write to REDCap, -the only required arguments of the +Besides the +[`data.frame`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/data.frame.html) +to write to REDCap, the only required arguments of the [`REDCapR::redcap_write()`](https://ouhscbbmc.github.io/REDCapR/reference/redcap_write.html) -function are `redcap_uri` and `token`; -both are contained in the credential object created in the previous section. +function are `redcap_uri` and `token`; both are contained in the +credential object created in the previous section. -As discussed in the [Troubleshooting vignette](https://ouhscbbmc.github.io/REDCapR/articles/TroubleshootingApiCalls.html#writing), -we recommend running these two preliminary checks before trying to write the -dataset to the server for the very first time. +As discussed in the [Troubleshooting +vignette](https://ouhscbbmc.github.io/REDCapR/articles/TroubleshootingApiCalls.html#writing), +we recommend running these two preliminary checks before trying to write +the dataset to the server for the very first time. -Prep: Stoplight Fields -------------------------- +## Prep: Stoplight Fields If the REDCap project isn't longitudinal and doesn't have arms, -uploading a patient-level data.frame to REDCap doesn't require adding variables. -However we typically populate the `*_complete` variables to communicate the record's status. +uploading a patient-level data.frame to REDCap doesn't require adding +variables. 
However, we typically populate the `*_complete` variables to +communicate the record's status. -If the row is needs a human to add more values or inspect the existing values -consider [marking the instrument](https://ouhscbbmc.github.io/REDCapR/reference/constant.html) -"incomplete" or "unverified"; -the patient's instrument record will appear red or yellow in REDCap's Record Dashboard. -Otherwise consider marking the instrument "complete" so -it will appear green. +If the row needs a human to add more values or inspect the existing +values, consider [marking the +instrument](https://ouhscbbmc.github.io/REDCapR/reference/constant.html) +"incomplete" or "unverified"; the patient's instrument record will +appear red or yellow in REDCap's Record Dashboard. Otherwise, consider +marking the instrument "complete" so it will appear green. -With this example project, the only patient-level instrument is "enrollment", -so the corresponding variable is `enrollment_complete`. +With this example project, the only patient-level instrument is +"enrollment", so the corresponding variable is `enrollment_complete`. -```r +``` r # patient-complete ds_patient <- ds_patient |> @@ -158,31 +168,32 @@ ds_patient <- ) ``` -Prep: `REDCapR::validate_for_write()` ------------------------- +## Prep: `REDCapR::validate_for_write()` -`REDCapR::validate_for_write()` inspects a data frame to anticipate potential problems before writing with REDCap's API. -A tibble is returned, with one row per potential problem (and a suggestion how to avoid it). -Ideally an 0-row tibble is returned. +`REDCapR::validate_for_write()` inspects a data frame to anticipate +potential problems before writing with REDCap's API. A tibble is +returned, with one row per potential problem (and a suggestion for how to +avoid it). Ideally, a 0-row tibble is returned. 
-```r +``` r REDCapR::validate_for_write(ds_patient, convert_logical_to_integer = TRUE) ``` -If you encounter problems that can be checked with automation, -please tell us in [an issue](https://github.com/OuhscBbmc/REDCapR/issues). -We'll work with you to incorporate the new check into `REDCapR::validate_for_write()`. +If you encounter problems that can be checked with automation, please +tell us in [an issue](https://github.com/OuhscBbmc/REDCapR/issues). +We'll work with you to incorporate the new check into +`REDCapR::validate_for_write()`. -When a dataset's problems are caught before reaching the server, -the solutions are easier to identify and implement. +When a dataset's problems are caught before reaching the server, the +solutions are easier to identify and implement. -Prep: Write Small Subset First -------------------------- +## Prep: Write Small Subset First -If this is your first time with a complicated project, consider loading a small subset of rows and columns. -In this case, we start with only three columns and two rows. +If this is your first time with a complicated project, consider loading +a small subset of rows and columns. In this case, we start with only +three columns and two rows. -```r +``` r # patient-subset ds_patient |> dplyr::select( # First three columns @@ -199,17 +210,18 @@ ds_patient |> ) ``` -Prep: Recode Variables where Necessary -------------------------- +## Prep: Recode Variables where Necessary -Some variables in the data.frame might be represented differently than in REDCap. +Some variables in the data.frame might be represented differently than +in REDCap. -A common transformation is changing strings into the integers that underlie radio buttons. -Common approaches are [`dplyr::case_match()`](https://dplyr.tidyverse.org/reference/case_match.html) and -using joining to lookup tables (if the mappings are expressed in a csv). -Here's an in-line example of `dplyr::case_match()`. 
+A common transformation is changing strings into the integers that +underlie radio buttons. Common approaches are +[`dplyr::case_match()`](https://dplyr.tidyverse.org/reference/case_match.html) +and using joining to lookup tables (if the mappings are expressed in a +csv). Here's an in-line example of `dplyr::case_match()`. -```r +``` r ds_patient <- ds_patient |> dplyr::mutate( @@ -233,20 +245,20 @@ ds_patient <- knitr::include_graphics("images/codebook-race.png") ``` -Write Entire Patient-level Table -------------------------- +## Write Entire Patient-level Table -If the small subset works, we usually jump ahead and try all columns and rows. +If the small subset works, we usually jump ahead and try all columns and +rows. -If this larger table fails, split the difference between -(a) the smaller working example and -(b) the larger failing example. -See if this middle point (that has fewer rows and/or columns than the failing point) -succeeds or fails. -Then repeat. -This "bisection" or "binary search" [debugging technique](https://medium.com/codecastpublication/debugging-tools-and-techniques-binary-search-2da5bb4282c7) is helpful in many areas of programming and statistical modeling. +If this larger table fails, split the difference between (a) the smaller +working example and (b) the larger failing example. See if this middle +point (that has fewer rows and/or columns than the failing point) +succeeds or fails. Then repeat. This "bisection" or "binary search" +[debugging +technique](https://medium.com/codecastpublication/debugging-tools-and-techniques-binary-search-2da5bb4282c7) +is helpful in many areas of programming and statistical modeling. 
-```r +``` r # patient-entire ds_patient |> REDCapR::redcap_write( @@ -257,29 +269,33 @@ ds_patient |> ) ``` -Part 3 - Write Data: Repeating Instrument -=================================== +# Part 3 - Write Data: Repeating Instrument -Add Plumbing Variables -------------------------- +## Add Plumbing Variables -As stated in the vignette's intro, -the structure of the dataset uploaded to the server must be precise. -When uploading repeating instruments, there are several important columns: +As stated in the vignette's intro, the structure of the dataset uploaded +to the server must be precise. When uploading repeating instruments, +there are several important columns: -1. `record_id`: typically indicates the patient's id. (This field can be renamed for the project.) -1. `redcap_event_name`: If the project is longitudinal or has arms, this indicates the event. - Otherwise, you don't need to add this variable. -1. `redcap_repeat_instrument`: Indicates the instrument/form that is repeating for these columns. -1. `redcap_repeat_instance`: Typically a sequential positive integer (*e.g.*, 1, 2, 3, ...) indicating the order. +1. `record_id`: typically indicates the patient's id. (This field can + be renamed for the project.) +2. `redcap_event_name`: If the project is longitudinal or has arms, + this indicates the event. Otherwise, you don't need to add this + variable. +3. `redcap_repeat_instrument`: Indicates the instrument/form that is + repeating for these columns. +4. `redcap_repeat_instance`: Typically a sequential positive integer + (*e.g.*, 1, 2, 3, ...) indicating the order. -The combination of these variables needs to be unique. -Please read the [Retrieving Longitudinal and Repeating Structures](https://ouhscbbmc.github.io/REDCapR/articles/longitudinal-and-repeating.html) +The combination of these variables needs to be unique. 
Please read the +[Retrieving Longitudinal and Repeating +Structures](https://ouhscbbmc.github.io/REDCapR/articles/longitudinal-and-repeating.html) vignette for details of these variables and their meanings. -You need to pass specific variables so that the REDCap server understands the hierarchical structure of the data points. +You need to pass specific variables so that the REDCap server +understands the hierarchical structure of the data points. -```r +``` r # repeat-plumbing ds_daily <- ds_daily |> @@ -305,10 +321,9 @@ REDCapR::validate_for_write(ds_daily, convert_logical_to_integer = TRUE) ds_daily ``` -Writing Repeating Instrument Variables ------------------------- +## Writing Repeating Instrument Variables -```r +``` r # daily-entire ds_daily |> REDCapR::redcap_write( @@ -319,59 +334,64 @@ ds_daily |> ) ``` -Part 4 - Next Steps -=================================== +# Part 4 - Next Steps -More Complexity ------------------------- +## More Complexity -This vignette required only two data.frames, but more complex projects sometimes need more. -For example, each repeating instrument should be its own data.frame and -writing step. Arms and longitudinal events need to be considered too. +This vignette required only two data.frames, but more complex projects +sometimes need more. For example, each repeating instrument should be +its own data.frame and writing step. Arms and longitudinal events need +to be considered too. -Batching ------------------------- +## Batching -By default, `REDCapR::redcap_write()` requests datasets of 100 patients as a time, -and stacks the resulting subsets together before returning a data.frame. -This can be adjusted to improve performance; -the 'Details' section of `REDCapR::redcap_write()` discusses the trade offs. +By default, `REDCapR::redcap_write()` requests datasets of 100 patients +at a time, and stacks the resulting subsets together before returning a +data.frame. 
This can be adjusted to improve performance; the 'Details' +section of `REDCapR::redcap_write()` discusses the trade-offs. -I usually shoot for ~10 seconds per batch. +I usually shoot for \~10 seconds per batch. -Manual vs API ------------------------- +## Manual vs API -Manual downloading/uploading might make sense if you're do the operation only once. -But when does it ever stop after the first time? +Manual downloading/uploading might make sense if you're doing the operation +only once. But when does it ever stop after the first time? -If you have trouble uploading, consider adding a few fake patients & measurements -and then download the csv. -It might reveal something you didn't anticipate. -But be aware that it will be in the block matrix format -(*i.e.*, everything jammed into one rectangle.) +If you have trouble uploading, consider adding a few fake patients & +measurements and then downloading the csv. It might reveal something you +didn't anticipate. But be aware that it will be in the block matrix +format (*i.e.*, everything jammed into one rectangle). -Notes -=================================== +# Notes -This vignette was originally designed for the -[2023 R/Medicine](https://events.linuxfoundation.org/r-medicine/) workshop, -_Using REDCap and R to Rapidly Produce Biomedical Publications Cleaning Medical Data_ -with [Raymond R. Balise](https://github.com/RaymondBalise), Belén Hervera, Daniel Maya, Anna Calderon, Tyler Bartholomew, Stephan Kadauke, and João Pedro Carmezim Correia and the [2024 R/Medicine](https://rconsortium.github.io/RMedicine_website/Program.html) workshop, -_REDCap + R: Teaming Up in the Tidyverse_, with Stephan Kadauke. -The workshop slides are for [2023](https://github.com/RaymondBalise/r_med_redcap_2023_public) -and [2024](https://github.com/skadauke/rmedicine_2024_redcap_r_workshop). 
+This vignette was originally designed for the [2023 +R/Medicine](https://events.linuxfoundation.org/r-medicine/) workshop, +*Using REDCap and R to Rapidly Produce Biomedical Publications Cleaning +Medical Data* with [Raymond R. +Balise](https://github.com/RaymondBalise), Belén Hervera, Daniel Maya, +Anna Calderon, Tyler Bartholomew, Stephan Kadauke, and João Pedro +Carmezim Correia, and the [2024 +R/Medicine](https://rconsortium.github.io/RMedicine_website/Program.html) +workshop, *REDCap + R: Teaming Up in the Tidyverse*, with Stephan +Kadauke. The workshop slides are available for +[2023](https://github.com/RaymondBalise/r_med_redcap_2023_public) and +[2024](https://github.com/skadauke/rmedicine_2024_redcap_r_workshop). -This work was made possible in part by the NIH grant [U54GM104938](https://taggs.hhs.gov/Detail/AwardDetail?arg_AwardNum=U54GM104938&arg_ProgOfficeCode=127) -to the [Oklahoma Shared Clinical and Translational Resource)](http://osctr.ouhsc.edu). +This work was made possible in part by the NIH grant +[U54GM104938](https://taggs.hhs.gov/Detail/AwardDetail?arg_AwardNum=U54GM104938&arg_ProgOfficeCode=127) +to the [Oklahoma Shared Clinical and Translational +Resource](http://osctr.ouhsc.edu). -Session Information -================================================================== +# Session Information -For the sake of documentation and reproducibility, the current report was rendered in the following environment. Click the line below to expand. +For the sake of documentation and reproducibility, the current report +was rendered in the following environment. Click the line below to +expand.
- Environment + +Environment + ```{r session-info, echo=FALSE} if (requireNamespace("sessioninfo", quietly = TRUE)) { sessioninfo::session_info() @@ -379,4 +399,5 @@ if (requireNamespace("sessioninfo", quietly = TRUE)) { sessionInfo() } ``` +
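
The "Add Plumbing Variables" step in the patched vignette can be illustrated with a minimal base-R sketch. This is not the vignette's own `repeat-plumbing` chunk (the hunks above do not show its body); the toy `ds_daily`, the `daily_date` column, and the instrument name `"daily"` are assumptions made only for illustration. It shows one way to add `redcap_repeat_instrument` and `redcap_repeat_instance` and then confirm that the combination of plumbing variables is unique.

```r
# Hypothetical stand-in for `ds_daily`; the `daily_date` column and the
# instrument name "daily" are assumptions, not taken from the vignette.
ds_daily <- data.frame(
  record_id  = c(1L, 1L, 2L, 2L, 2L),
  daily_date = as.Date(c(
    "2023-01-02", "2023-01-01",
    "2023-02-03", "2023-02-01", "2023-02-02"
  ))
)

# Sort so the instance number follows chronological order within a patient.
ds_daily <- ds_daily[order(ds_daily$record_id, ds_daily$daily_date), ]

# Every row belongs to the same repeating instrument.
ds_daily$redcap_repeat_instrument <- "daily"

# Number the rows 1, 2, 3, ... within each patient.
ds_daily$redcap_repeat_instance <- ave(
  seq_along(ds_daily$record_id),
  ds_daily$record_id,
  FUN = seq_along
)

# The combination of plumbing variables must be unique before uploading.
plumbing <- c("record_id", "redcap_repeat_instrument", "redcap_repeat_instance")
stopifnot(anyDuplicated(ds_daily[, plumbing]) == 0L)
```

In a longitudinal project, `redcap_event_name` would presumably join the uniqueness check as well; a data frame prepared this way could then be passed to `REDCapR::validate_for_write()` before the actual write.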