Create raw clinsightful_data #162

Merged
merged 26 commits into from
Feb 18, 2025
26 commits
220aff7
Start creating script to reverse clinsightful_data to raw format
LDSamson Jan 31, 2025
573b374
fix limits
LDSamson Jan 31, 2025
74fdffa
add test app script
LDSamson Jan 31, 2025
eac625f
Fix vital signs
LDSamson Jan 31, 2025
f1d4f65
Generalize region extraction from sitecode
LDSamson Jan 31, 2025
494d56c
fix for vital signs and calculated values (weight change)
LDSamson Jan 31, 2025
fac95c9
fix units and remove lab vars where applicable
LDSamson Jan 31, 2025
1ca2ab3
more realistic site codes
LDSamson Jan 31, 2025
2abfac3
Verify results, add some comments
LDSamson Jan 31, 2025
3978bb6
Merge branch 'dev' into ls_105_create_raw_clinsight_data
LDSamson Jan 31, 2025
0f515f9
rearrange data wrangling to make datasets comparable
LDSamson Jan 31, 2025
e85ddd3
Update clinsightful_data with the version that can be derived from ra…
LDSamson Jan 31, 2025
c3698da
Update tests
LDSamson Jan 31, 2025
a04abea
fix mod_study_forms tests, include js resources to test app
LDSamson Jan 31, 2025
20009ad
Update snap
LDSamson Jan 31, 2025
f843b1f
Fix test diff due to inconsistency in old dataset
LDSamson Jan 31, 2025
a8364cb
fix adding synch time
LDSamson Jan 31, 2025
fc2a6af
Update so that raw csv files can be created
LDSamson Jan 31, 2025
87200e6
Add raw data and a script to recreate clinsightful_data
LDSamson Jan 31, 2025
02aa5d0
Add final updated clinsightful_data
LDSamson Jan 31, 2025
44629f1
Fix some tests
LDSamson Jan 31, 2025
2523ca5
Merge branch 'ls_create_clinsightful_data_from_scratch' into ls_105_c…
LDSamson Jan 31, 2025
814e1b5
Properly wrap the new test in an if statement
LDSamson Jan 31, 2025
f46b10a
Update the data again after resolving merge conflict
LDSamson Jan 31, 2025
18faec4
Update snap
LDSamson Jan 31, 2025
dd4a74a
Add news, increment version number
LDSamson Jan 31, 2025
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,6 +1,6 @@
Package: clinsight
Title: ClinSight
-Version: 0.1.1.9019
+Version: 0.1.1.9020
Authors@R: c(
person("Leonard Daniël", "Samson", , "[email protected]", role = c("cre", "aut"),
comment = c(ORCID = "0000-0002-6252-7639")),
1 change: 1 addition & 0 deletions NEWS.md
@@ -15,6 +15,7 @@
- Added `Excel` download button to Queries table & patient listings that need review.
- (For developers) From now on, the new Chrome headless browser mode will be used for `shinytest2` tests so that unit tests can be run with Chrome v132.
- The interactive timeline now has more consistent labels, will center an item on click, and has customizable treatment labels (by setting `settings$treatment_label` in the metadata).
+- (For developers) Added raw data that can be used to completely recreate the internal dataset (`clinsightful_data`) with the merge functions in the package.

## Bug fixes

24 changes: 10 additions & 14 deletions R/fct_appdata.R
@@ -40,7 +40,7 @@ get_raw_csv_data <- function(
if(identical(synch_time, "")){warning("No synch time provided")}
cat("Adding synch time '", synch_time, "' as the attribute 'synch_time'",
"to the data set.\n")
-attr(raw_data, "synch_time") <- "synch_time"
+attr(raw_data, "synch_time") <- synch_time
raw_data
}
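The fix in this hunk matters because the old line `attr(raw_data, "synch_time") <- "synch_time"` stored the literal string `"synch_time"` rather than the value of the `synch_time` argument. A minimal base-R sketch of the difference, using a small hypothetical data frame in place of the CSV data that `get_raw_csv_data()` reads:

```r
# Hypothetical stand-in for the raw data read by get_raw_csv_data()
raw_data <- data.frame(SubjectId = c("ABC_01_001", "ABC_01_002"))
synch_time <- "2023-09-15 10:10:00 UTC"

# Old behaviour: the attribute stores the literal string "synch_time"
old_data <- raw_data
attr(old_data, "synch_time") <- "synch_time"

# Fixed behaviour: the attribute stores the value passed as synch_time
new_data <- raw_data
attr(new_data, "synch_time") <- synch_time

print(attr(old_data, "synch_time"))  # [1] "synch_time"
print(attr(new_data, "synch_time"))  # [1] "2023-09-15 10:10:00 UTC"
```

With the old line, every dataset would report the same placeholder regardless of the `synch_time` passed in.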

@@ -99,8 +99,15 @@ merge_meta_with_data <- function(
"item_unit" = unit,
"item_value" = VAL
) |>
-dplyr::mutate(region = region %|_|% "Missing") |>
-apply_custom_functions(meta$settings$post_merge_fns)
+apply_custom_functions(meta$settings$post_merge_fns) |>
+dplyr::mutate(
+region = region %|_|% ifelse(
+is.na(site_code),
+"Missing",
+gsub("_*[[:digit:]]+$", "", site_code)
+)
+)

attr(merged_data, "synch_time") <- synch_time
merged_data
}
@@ -212,17 +219,6 @@ apply_study_specific_fixes <- function(
),
.by = c(subject_id, form_repeat)
)
-
-# Add regions:
-data |>
-dplyr::mutate(
-region = dplyr::case_when(
-grepl("^AU", site_code) ~ "AUS",
-grepl("^DE", site_code) ~ "GER",
-grepl("^FR", site_code) ~ "FRA",
-TRUE ~ NA_character_
-)
-)
}

#' Apply custom modification functions
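Taken together, the two `R/fct_appdata.R` hunks replace a hard-coded prefix lookup (`AU`, `DE`, `FR`) in `apply_study_specific_fixes()` with a generic rule in `merge_meta_with_data()` that strips the trailing site number from `site_code`. A base-R sketch of both rules, applied to site codes built the way `dev/create_raw_clinsight_data.R` builds them from subject IDs. The subject IDs are hypothetical, and `fill_missing()` is a hypothetical stand-in that assumes `%|_|%` keeps existing non-missing values and falls back to the right-hand side otherwise:

```r
# Hypothetical subject IDs of the form <prefix>_<site>_<patient>
subject_id <- c("AUS_01_001", "DEU_02_004", "BEL_04_772", NA)

# Site codes as built in the dev script: drop the patient number,
# then the first underscore ("AUS_01_001" -> "AUS_01" -> "AUS01")
site_code <- sub("_", "", gsub("_[[:digit:]]+$", "", subject_id))

# Old rule (removed): only three hard-coded prefixes were recognised
old_region <- ifelse(grepl("^AU", site_code), "AUS",
              ifelse(grepl("^DE", site_code), "GER",
              ifelse(grepl("^FR", site_code), "FRA", NA_character_)))

# New rule: strip trailing digits (and any underscores before them);
# a missing site code becomes "Missing"
derived_region <- ifelse(
  is.na(site_code),
  "Missing",
  gsub("_*[[:digit:]]+$", "", site_code)
)

# Hypothetical stand-in for `%|_|%`: keep x where it is not NA, else use y
fill_missing <- function(x, y) ifelse(is.na(x), y, x)
existing_region <- rep(NA_character_, length(site_code))
new_region <- fill_missing(existing_region, derived_region)

print(data.frame(site_code, old_region, new_region))
```

Under the new rule the region is the site-code prefix itself (`DEU`), not a mapped label such as `GER`, and prefixes outside the old list (`BEL`) no longer end up as `NA`.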
9 changes: 9 additions & 0 deletions data-raw/clinsightful_data.R
@@ -0,0 +1,9 @@
devtools::load_all(".")

clinsightful_data <- clinsight::get_raw_csv_data(
app_sys("raw_data"),
synch_time = "2023-09-15 10:10:00 UTC"
) |>
merge_meta_with_data(metadata)

usethis::use_data(clinsightful_data, overwrite = TRUE)
Binary file modified data/clinsightful_data.rda
Binary file not shown.
155 changes: 155 additions & 0 deletions dev/create_raw_clinsight_data.R
@@ -0,0 +1,155 @@
### Script to reverse-engineer raw data from the current clinsightful_data object
devtools::load_all()
library(dplyr)
library(tidyr)

clinsight_names <- metadata$column_names

varnames <- metadata$items_expanded |>
distinct(var, item_name) |>
filter(
n() == 1 | grepl("LBORRES$|VSORRES$", var),
.by = item_name
)

# Variables that are mentioned in metadata.xlsx but do not occur in clinsightful_data:
# dplyr::anti_join(varnames, clinsightful_data, by = "item_name")
# var item_name
# 1 AE_AESER_AEOUT SAE outcome
# 2 WHO_WHOCAT WHO.subclassification
# 3 DMOD_DAT DoseModificationDate
# 4 DMOD_REAS DoseModificationReason
# 5 DMOD_DOSE DoseModificationNewDose

# All items in clinsightful_data are also in metadata, as expected:
# dplyr::anti_join(clinsightful_data, varnames, by = "item_name")
# # A tibble: 0 × 24

cd_new <- dplyr::left_join(clinsightful_data, varnames, by = "item_name")

labvars <- c(
"LBORRES" = "item_value",
"LBORNR_Lower" = "lower_lim",
"LBORNR_Upper" = "upper_lim",
"LBORRESU" = "item_unit",
"LBCLSIG" = "significance",
"LBREASND" = "reason_notdone"
)

lab_data <- cd_new |>
filter(grepl("_LBORRES$|VSORRES", var)) |>
rename(all_of(labvars)) |>
mutate(
var = gsub("_VSORRES$|_LBORRES$", "", var)
) |>
mutate(across(all_of(names(labvars)), as.character)) |>
pivot_longer(
all_of(names(labvars)),
names_to = "suffix", values_to = "item_value"
) |>
filter(
# since other vars in vital signs do not exist in the data:
!(item_group == "Vital signs" & !suffix %in% c("LBORRES", "LBREASND")),
# remove derived vars:
!var == "VS_WEIGHTCHANGE"
) |>
mutate(
suffix = ifelse(item_group == "Vital signs" & suffix == "LBORRES", "VSORRES", suffix),
suffix = ifelse(item_group == "Vital signs" & suffix == "LBREASND", "VSREASND", suffix),
var = ifelse(is.na(suffix), var, paste0(var, "_", suffix))
)

other_data <- cd_new |>
filter(!grepl("_LBORRES$|VSORRES", var)) |>
# keep item_value of course, but remove other lab values:
select(-all_of(labvars[-1]))

all_data <- dplyr::bind_rows(other_data, lab_data) |>
# remove columns that are created during metadata merging:
select(-region, -suffix, -db_update_time, -day, -vis_day, -vis_num, -item_type, -item_group) |>
# revert the "(unit missing)" placeholder to NA:
mutate(
item_value = ifelse(item_value == "(unit missing)", NA_character_, item_value)
) |>
# revert names
rename(
c(
all_of(setNames(clinsight_names$name_new, clinsight_names$name_raw)),
"ItemName" = "item_name",
"EventName" = "event_name",
"EventLabel" = "event_label"
)
) |>
# create more realistic site codes:
mutate(
SiteCode = gsub("_[[:digit:]]+$", "", SubjectId),
SiteCode = sub("_", "", SiteCode)
)

#### Verify whether the outcome matches clinsightful_data after merging:
merged_data <- all_data |>
merge_meta_with_data(meta = metadata)
attr(merged_data, "synch_time") <- "2023-09-15 10:10:00 UTC"

###### Check what is needed to get exactly the same dataset:
######

old_clinsight_data <- clinsightful_data |>
##### db_update_time is not used anymore and should be removed from
##### clinsightful_data:
select(-db_update_time) |>
##### more realistic site codes needed:
mutate(
site_code = simplify_string(gsub("_[[:digit:]]+$", "", simplify_string(subject_id))),
site_code = toupper(sub("_", "", site_code))
) |>
# because day and vis_day are incorrect in the current clinsightful_data:
select(-day, -vis_day) |>
# because the old "Weight change since screening" values were incorrect;
# in the new dataset this item is calculated:
filter(item_name != "Weight change since screening") |>
arrange(site_code, subject_id, item_group, item_name, event_date, event_repeat)

new_clinsight_data <- merged_data |>
select(c(any_of(names(clinsightful_data)), "form_type")) |>
select(-form_type, -day, -vis_day) |>
filter(item_name != "Weight change since screening") |>
# same row order makes comparisons easier
arrange(site_code, subject_id, item_group, item_name, event_date, event_repeat)

waldo::compare(old_clinsight_data, new_clinsight_data)

raw_data <- split(all_data, ~FormId)

######### Create raw data files
#########

lapply(
names(raw_data),
\(x) {
# To mimic data with two name rows (usually a long and a short name)
df <- as.data.frame(lapply(raw_data[[x]], as.character))
df <- df[c(1, 1:nrow(df)), ]
df[1, ] <- as.list(names(raw_data[[x]]))
names(df) <- simplify_string(names(df))
readr::write_csv(
df,
file = file.path(app_sys("raw_data"), paste0("clinsight_raw_", x, ".csv"))
)
}
)

merged_clinsight_data <- clinsight::get_raw_csv_data(
app_sys("raw_data"),
synch_time = "2023-09-15 10:10:00 UTC"
) |>
merge_meta_with_data(metadata)

# waldo::compare(
# merged_clinsight_data |>
# arrange(site_code, subject_id, item_group, item_name, event_date, event_repeat),
# merged_data |>
# arrange(site_code, subject_id, item_group, item_name, event_date, event_repeat)
# )
# No difference apart from the order

## Use data-raw/clinsightful_data.R for recreating clinsightful_data from raw data
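The writer loop in `dev/create_raw_clinsight_data.R` gives each raw CSV file two name rows: a simplified header followed by the original column names, mimicking exports that carry both a short and a long name. A minimal base-R sketch of that layout, assuming a hypothetical one-form data frame, `tolower()` as a hypothetical stand-in for the package's `simplify_string()`, and base `write.csv()` instead of `readr::write_csv()` (so every value is quoted):

```r
# Hypothetical form data; all columns are written as character
form <- data.frame(SubjectId = c("ABC_01_001", "ABC_01_002"), ItemValue = c(1.5, 2))
df <- as.data.frame(lapply(form, as.character))

# Duplicate the first row, then overwrite it with the original column names
df <- df[c(1, seq_len(nrow(df))), ]
df[1, ] <- as.list(names(form))

# The file header becomes the simplified names (tolower() is a stand-in)
names(df) <- tolower(names(df))

path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)
csv_lines <- readLines(path)
print(csv_lines)

# Reading it back: first line as header, then drop the second name row
back <- read.csv(path, colClasses = "character")[-1, ]
```

Whether `get_raw_csv_data()` skips the second name row in exactly this way is not shown in this diff; the sketch only illustrates the file layout.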
2 changes: 1 addition & 1 deletion inst/golem-config.yml
@@ -1,6 +1,6 @@
default:
golem_name: clinsight
-golem_version: 0.1.1.9019
+golem_version: 0.1.1.9020
app_prod: no
user_identification: test_user
study_data: !expr clinsight::clinsightful_data