diff --git a/articles/a01_building_base_cohorts.html b/articles/a01_building_base_cohorts.html
index 8c059e7a..3b231329 100644
--- a/articles/a01_building_base_cohorts.html
+++ b/articles/a01_building_base_cohorts.html
@@ -117,7 +117,7 @@
--This package is currently experimental. Please use with care and report any issues you might come across.
-
The goal of CohortConstructor is to support the creation and manipulation of cohorts in the OMOP Common Data Model.
+The goal of CohortConstructor is to support the creation and manipulation of study cohorts in data mapped to the OMOP CDM.
You can install the development version of CohortConstructor from GitHub with:
+The package can be installed from CRAN:
+install.packages("CohortConstructor")
Or you can install the development version of CohortConstructor from GitHub:
+
# install.packages("devtools")
devtools::install_github("ohdsi/CohortConstructor")
To illustrate how the functionality let’s create a CDM reference for the Eunomia dataset Using the CDMConnector package.
-
-library(CDMConnector)
+To illustrate the functionality, let’s create a set of fracture cohorts using the Eunomia dataset. We’ll first load the required packages and create a cdm reference for the data.
+
+library(omopgenerics)
+library(CDMConnector)
library(PatientProfiles)
library(dplyr)
library(CohortConstructor)
-
-con <- DBI::dbConnect(duckdb::duckdb(), dbdir = eunomia_dir())
+library(CohortCharacteristics)
+
+con <- DBI::dbConnect(duckdb::duckdb(), dbdir = eunomia_dir())
cdm <- cdm_from_con(con, cdm_schema = "main",
write_schema = c(prefix = "my_study_", schema = "main"))
-print(cdm)
+cdm
+#>
+#> ── # OMOP CDM reference (duckdb) of Synthea synthetic health database ──────────
+#> • omop tables: person, observation_period, visit_occurrence, visit_detail,
+#> condition_occurrence, drug_exposure, procedure_occurrence, device_exposure,
+#> measurement, observation, death, note, note_nlp, specimen, fact_relationship,
+#> location, care_site, provider, payer_plan_period, cost, drug_era, dose_era,
+#> condition_era, metadata, cdm_source, concept, vocabulary, domain,
+#> concept_class, concept_relationship, relationship, concept_synonym,
+#> concept_ancestor, source_to_concept_map, drug_strength
+#> • cohort tables: -
+#> • achilles tables: -
+#> • other tables: -
We start by making a concept based cohort. For this we only need to provide concept sets and we will get a cohort back, with cohort end date the event date associated with the records, overlapping records collapsed, and only records in observation kept.
-+We will start by making a simple concept-based cohort for each of our fractures of interest. First we create a codelist for each of ankle, forearm and hip fracture (note, we just use one code for each because we are using synthetic data).
+fracture_codes <- newCodelist(list("ankle_fracture" = 4059173L,
+                                   "forearm_fracture" = 4278672L,
+                                   "hip_fracture" = 4230399L))
+fracture_codes
+#>
+#> ── 3 codelists ─────────────────────────────────────────────────────────────────
+#>
+#> - ankle_fracture (1 codes)
+#> - forearm_fracture (1 codes)
+#> - hip_fracture (1 codes)
Now we can quickly create a set of cohorts for each fracture type. For this we only need to provide the codes we have defined and we will get a cohort back, with cohort end date set as the event date associated with the records, overlapping records collapsed, and only records in observation kept.
+cdm$fractures <- cdm |>
-  conceptCohort(conceptSet = list(
-    "ankle_fracture" = 4059173,
-    "forearm_fracture" = 4278672,
-    "hip_fracture" = 4230399),
+  conceptCohort(conceptSet = fracture_codes,
                name = "fractures")
We can see that our starting cohorts, before we add any additional restrictions, have the following associated settings, counts, and attrition.
-+settings(cdm$fractures) %>% glimpse()
#> Rows: 3
#> Columns: 2
@@ -125,20 +148,64 @@ Generating concept based cohorts
#> $ excluded_subjects    <int> 0, 0, 0
++Create an overall fracture cohort +
+So far we have created three separate fracture cohorts. Let’s say we also want a cohort of people with any of the fractures. We could union our three cohorts to create this overall cohort like so:
+cdm$any_fracture <- cdm$fractures |>
+  CohortConstructor::unionCohorts(cohortName = "any_fracture", name = "any_fracture")
+#> Warning: ! 1 casted column in any_fracture (cohort_set) as do not match expected column
+#> type:
+#> • `cohort_definition_id` from numeric to integer
+#> Warning: ! 6 casted column in any_fracture (cohort_attrition) as do not match expected
+#> column type:
+#> • `cohort_definition_id` from numeric to integer
+#> • `number_records` from numeric to integer
+#> • `number_subjects` from numeric to integer
+#> • `reason_id` from numeric to integer
+#> • `excluded_records` from numeric to integer
+#> • `excluded_subjects` from numeric to integer
+#> Warning: ! 1 casted column in any_fracture (cohort_codelist) as do not match expected
+#> column type:
+#> • `cohort_definition_id` from numeric to integer
+#> Warning: ! 1 column in any_fracture do not match expected column type:
+#> • `cohort_definition_id` is numeric but expected integer
+cdm <- bind(cdm$fractures,
+            cdm$any_fracture,
+            name = "fractures")
+settings(cdm$fractures)
+#> # A tibble: 4 × 3
+#>   cohort_definition_id cohort_name        gap
+#>                  <int> <chr>            <dbl>
+#> 1                    1 ankle_fracture      NA
+#> 2                    2 forearm_fracture    NA
+#> 3                    3 hip_fracture        NA
+#> 4                    4 any_fracture         0
+cohortCount(cdm$fractures)
+#> # A tibble: 4 × 3
+#>   cohort_definition_id number_records number_subjects
+#>                  <int>          <int>           <int>
+#> 1                    1            464             427
+#> 2                    2            569             510
+#> 3                    3            138             132
+#> 4                    4           1171             924
Require in date range
-Once we have created our base cohort, we can then start applying additional cohort requirements. For example, first we can require that individuals’ cohort start date fall within a certain date range.
-
diff --git a/reference/entryAtLastDate.html b/reference/entryAtLastDate.html
index eb934109..505b9165 100644
--- a/reference/entryAtLastDate.html
+++ b/reference/entryAtLastDate.html
@@ -169,10 +169,10 @@
+Once we have created our base fracture cohort, we can then start applying additional cohort requirements. For example, first we can require that individuals’ cohort start date fall within a certain date range.
+cdm$fractures <- cdm$fractures %>%
  requireInDateRange(dateRange = as.Date(c("2000-01-01", "2020-01-01")))
Now that we’ve applied this date restriction, we can see that our cohort attributes have been updated
-
diff --git a/pkgdown.yml b/pkgdown.yml
index 6fccd7e5..0bb425fd 100644
--- a/pkgdown.yml
+++ b/pkgdown.yml
@@ -7,7 +7,7 @@ articles:
a02_applying_cohort_restrictions: a02_applying_cohort_restrictions.html
a03_age_sex_matching: a03_age_sex_matching.html
a04_cohort_manipulations: a04_cohort_manipulations.html
-last_built: 2024-08-27T09:07Z
+last_built: 2024-08-27T12:15Z
urls:
reference: https://ohdsi.github.io/CohortConstructor/reference
article: https://ohdsi.github.io/CohortConstructor/articles
diff --git a/reference/conceptCohort.html b/reference/conceptCohort.html
index 48ccd041..7ffae571 100644
--- a/reference/conceptCohort.html
+++ b/reference/conceptCohort.html
@@ -146,6 +146,7 @@
+cohort_count(cdm$fractures) %>% glimpse()
-#> Rows: 3
+#> Rows: 4
#> Columns: 3
-#> $ cohort_definition_id <int> 1, 2, 3
-#> $ number_records       <int> 108, 152, 62
-#> $ number_subjects      <int> 104, 143, 60
+#> $ cohort_definition_id <int> 1, 2, 3, 4
+#> $ number_records       <int> 108, 152, 62, 322
+#> $ number_subjects      <int> 104, 143, 60, 287
attrition(cdm$fractures) %>%
  filter(reason == "cohort_start_date between 2000-01-01 & 2020-01-01") %>%
  glimpse()
@@ -156,85 +223,71 @@ Require in date range
Applying demographic requirements
We can also add restrictions on patient characteristics such as age (on cohort start date by default) and sex.
-+cdm$fractures <- cdm$fractures %>%
  requireDemographics(ageRange = list(c(40, 65)),
                      sex = "Female")
Again we can see how many individuals we’ve lost after applying these criteria.
-+attrition(cdm$fractures) %>%
  filter(reason == "Age requirement: 40 to 65") %>%
  glimpse()
-#> Rows: 3
+#> Rows: 4
#> Columns: 7
-#> $ cohort_definition_id <int> 1, 2, 3
-#> $ number_records       <int> 43, 64, 22
-#> $ number_subjects      <int> 43, 62, 22
-#> $ reason_id            <int> 4, 4, 4
+#> $ cohort_definition_id <int> 1, 2, 3, 4
+#> $ number_records       <int> 43, 64, 22, 129
+#> $ number_subjects      <int> 43, 62, 22, 122
+#> $ reason_id            <int> 4, 4, 4, 4
#> $ reason               <chr> "Age requirement: 40 to 65", "Age requirement: 40…
-#> $ excluded_records     <int> 65, 88, 40
-#> $ excluded_subjects    <int> 61, 81, 38
+#> $ excluded_records     <int> 65, 88, 40, 193
+#> $ excluded_subjects    <int> 61, 81, 38, 165
attrition(cdm$fractures) %>%
  filter(reason == "Sex requirement: Female") %>%
  glimpse()
-#> Rows: 3
+#> Rows: 4
#> Columns: 7
-#> $ cohort_definition_id <int> 1, 2, 3
-#> $ number_records       <int> 19, 37, 12
-#> $ number_subjects      <int> 19, 36, 12
-#> $ reason_id            <int> 5, 5, 5
+#> $ cohort_definition_id <int> 1, 2, 3, 4
+#> $ number_records       <int> 19, 37, 12, 68
+#> $ number_subjects      <int> 19, 36, 12, 65
+#> $ reason_id            <int> 5, 5, 5, 5
#> $ reason               <chr> "Sex requirement: Female", "Sex requirement: Fema…
-#> $ excluded_records     <int> 24, 27, 10
-#> $ excluded_subjects    <int> 24, 26, 10
+#> $ excluded_records     <int> 24, 27, 10, 61
+#> $ excluded_subjects    <int> 24, 26, 10, 57
Require presence in another cohort
-We can also require that individuals are in another cohort over some window. Here for example we require that study participants are in a GI bleed cohort any time prior up to their entry in the fractures cohort.
+We can also require that individuals are (or are not) in another cohort over some window. Here for example we require that study participants are not in a GI bleed cohort at any time up to their entry in the fractures cohort, which is what setting intersections = 0 specifies.
+-cdm$gibleed <- cdm |>
-  conceptCohort(conceptSet = list("gibleed" = 192671),
+  conceptCohort(conceptSet = list("gibleed" = 192671L),
                name = "gibleed")
cdm$fractures <- cdm$fractures %>%
  requireCohortIntersect(targetCohortTable = "gibleed",
-                         window = c(-Inf, 0))
+                         intersections = 0,
+                         window = c(-Inf, 0))
attrition(cdm$fractures) %>%
-  filter(reason == "In cohort gibleed between -Inf & 0 days relative to cohort_start_date") %>%
+  filter(reason == "Not in cohort gibleed between -Inf & 0 days relative to cohort_start_date") %>%
  glimpse()
-#> Rows: 3
+#> Rows: 4
#> Columns: 7
-#> $ cohort_definition_id <int> 1, 2, 3
-#> $ number_records       <int> 5, 7, 2
-#> $ number_subjects      <int> 5, 6, 2
-#> $ reason_id            <int> 8, 8, 8
-#> $ reason               <chr> "In cohort gibleed between -Inf & 0 days relative…
-#> $ excluded_records     <int> 14, 30, 10
-#> $ excluded_subjects    <int> 14, 30, 10
+#> $ cohort_definition_id <int> 1, 2, 3, 4
+#> $ number_records       <int> 14, 30, 10, 54
+#> $ number_subjects      <int> 14, 30, 10, 52
+#> $ reason_id            <int> 8, 8, 8, 8
+#> $ reason               <chr> "Not in cohort gibleed between -Inf & 0 days rela…
+#> $ excluded_records     <int> 5, 7, 2, 14
+#> $ excluded_subjects    <int> 5, 6, 2, 13
+cdmDisconnect(cdm)
-Combining cohorts
+More information
-Currently we have separate fracture cohorts.
-Let’s say we want to create a cohort of people with any of the fractures. We could create this cohort like so:
-cdm$fractures <- cdm$fractures |>
-  CohortConstructor::unionCohorts()
-
-settings(cdm$fractures)
-#> # A tibble: 1 × 3
-#>   cohort_definition_id cohort_name                                    gap
-#>                  <dbl> <chr>                                        <dbl>
-#> 1                    1 ankle_fracture_forearm_fracture_hip_fracture     0
-cohortCount(cdm$fractures)
-#> # A tibble: 1 × 3
-#>   cohort_definition_id number_records number_subjects
-#>                  <int>          <int>           <int>
-#> 1                    1             14              13
+-cdmDisconnect(cdm)
CohortConstructor provides much more functionality for creating and manipulating cohorts. See the package website for more details.
Examples
#> Note: method with signature ‘DBIConnection#Id’ chosen for function ‘dbExistsTable’,
#>  target signature ‘duckdb_connection#Id’.
#>  "duckdb_connection#ANY" would also be valid
+#> ■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 90% | ETA: 0s
cohort <- conceptCohort(cdm = cdm, conceptSet = list(a = 1), name = "cohort")
#> Warning: ! `codelist` contains numeric values, they are casted to integers.
diff --git a/reference/entryAtFirstDate.html b/reference/entryAtFirstDate.html
index d7c4ab54..2fab3b05 100644
--- a/reference/entryAtFirstDate.html
+++ b/reference/entryAtFirstDate.html
@@ -172,9 +172,9 @@ Examples
#> # Database: DuckDB v1.0.0 [unknown@Linux 6.5.0-1025-azure:R 4.4.1/:memory:]
#>   cohort_definition_id subject_id cohort_start_date cohort_end_date entry_reason
#>                  <dbl>      <dbl> <date>            <date>          <chr>
-#> 1                    1          1 2001-08-01        2001-09-01      date_1; dat…
+#> 1                    1          2 2001-01-01        2001-01-12      date_1
#> 2                    1          3 2015-02-14        2015-02-15      date_2
-#> 3                    1          2 2001-01-01        2001-01-12      date_1
+#> 3                    1          1 2001-08-01        2001-09-01      date_1; dat…
#> 4                    1          4 2002-12-09        2002-12-09      date_1; dat…
# }
Examples
#> # Database: DuckDB v1.0.0 [unknown@Linux 6.5.0-1025-azure:R 4.4.1/:memory:]
#>   cohort_definition_id subject_id cohort_start_date cohort_end_date entry_reason
#>                  <dbl>      <dbl> <date>            <date>          <chr>
-#> 1                    1          1 2001-08-01        2001-09-01      date_2; dat…
-#> 2                    1          3 2015-02-14        2015-02-15      date_2
-#> 3                    1          4 2002-12-09        2002-12-09      date_2; dat…
-#> 4                    1          2 2001-01-01        2001-01-12      date_1
+#> 1                    1          2 2001-01-01        2001-01-12      date_1
+#> 2                    1          4 2002-12-09        2002-12-09      date_2; dat…
+#> 3                    1          3 2015-02-14        2015-02-15      date_2
+#> 4                    1          1 2001-08-01        2001-09-01      date_2; dat…
# }
# \donttest{
library(CohortConstructor)
cdm <- mockCohortConstructor(drugExposure = TRUE)
-#> ■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 90% | ETA: 0s
cdm$cohort1 |>
requireTableIntersect(tableName = "drug_exposure",
indexDate = "cohort_start_date",
diff --git a/reference/sampleCohorts.html b/reference/sampleCohorts.html
index 490ff854..43f6cbf7 100644
--- a/reference/sampleCohorts.html
+++ b/reference/sampleCohorts.html
@@ -90,16 +90,16 @@ Examples
#> # Database: DuckDB v1.0.0 [unknown@Linux 6.5.0-1025-azure:R 4.4.1/:memory:]
#> cohort_definition_id subject_id cohort_start_date cohort_end_date
#> <int> <int> <date> <date>
-#> 1 1 1 2001-02-15 2001-10-28
-#> 2 1 1 2001-10-29 2003-03-15
-#> 3 1 1 2003-03-16 2005-02-08
-#> 4 1 2 1999-11-11 2002-04-09
-#> 5 1 28 2005-04-01 2005-04-03
-#> 6 1 28 2005-04-04 2005-08-23
-#> 7 1 48 1964-01-15 1966-03-08
-#> 8 1 48 1966-03-09 1966-08-01
-#> 9 1 48 1966-08-02 1967-04-16
-#> 10 1 48 1967-04-17 1974-01-28
+#> 1 1 6 2011-03-07 2014-01-24
+#> 2 1 7 2014-03-08 2014-03-19
+#> 3 1 14 1986-11-29 1987-12-14
+#> 4 1 14 1987-12-15 1988-03-19
+#> 5 1 15 1978-07-09 2000-06-12
+#> 6 1 22 1993-09-22 1994-04-10
+#> 7 1 22 1994-04-11 1996-04-26
+#> 8 1 22 1996-04-27 1999-08-31
+#> 9 1 59 2014-08-29 2015-04-10
+#> 10 1 59 2015-04-11 2016-01-05
#> # ℹ more rows
# }
diff --git a/reference/trimDemographics.html b/reference/trimDemographics.html
index fd323bf0..03453dda 100644
--- a/reference/trimDemographics.html
+++ b/reference/trimDemographics.html
@@ -120,16 +120,16 @@