a02_require_specific_cohort_entry
+ + + +a02_require_specific_cohort_entry.Rmd
From 8453a1c0a40b7eb691899d8ba4ea41529bf91549 Mon Sep 17 00:00:00 2001 From: edward-burn <9583964+edward-burn@users.noreply.github.com> Date: Wed, 28 Aug 2024 11:59:01 +0000 Subject: [PATCH] =?UTF-8?q?Deploying=20to=20gh-pages=20from=20@=20OHDSI/Co?= =?UTF-8?q?hortConstructor@b315e7924593c51170d32db9baa97ed895070ebd=20?= =?UTF-8?q?=F0=9F=9A=80?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- 404.html | 13 +- LICENSE.html | 13 +- articles/a00_introduction.html | 13 +- articles/a01_building_base_cohorts.html | 17 +- .../a02_require_specific_cohort_entry.html | 99 +++++++ articles/a03_require_in_date_range.html | 99 +++++++ articles/a04_require_demographics.html | 99 +++++++ articles/a04_require_intersections.html | 243 ++++++++++++++++++ articles/a05_update_cohort_start_end.html | 99 +++++++ articles/a06_concatanate_cohorts.html | 99 +++++++ articles/a07_filter_cohorts.html | 99 +++++++ articles/a08_split_cohorts.html | 99 +++++++ articles/a09_combine_cohorts.html | 99 +++++++ articles/a10_match_cohorts.html | 228 ++++++++++++++++ articles/index.html | 33 ++- authors.html | 13 +- index.html | 13 +- pkgdown.yml | 15 +- reference/CohortConstructor-package.html | 13 +- reference/collapseCohorts.html | 13 +- reference/conceptCohort.html | 13 +- reference/demographicsCohort.html | 13 +- reference/entryAtFirstDate.html | 21 +- reference/entryAtLastDate.html | 19 +- reference/exitAtDeath.html | 33 ++- reference/exitAtFirstDate.html | 21 +- reference/exitAtLastDate.html | 21 +- reference/exitAtObservationEnd.html | 23 +- reference/index.html | 13 +- reference/intersectCohorts.html | 13 +- reference/matchCohorts.html | 31 ++- reference/measurementCohort.html | 13 +- reference/mockCohortConstructor.html | 13 +- reference/reexports.html | 13 +- reference/requireAge.html | 13 +- reference/requireCohortIntersect.html | 13 +- reference/requireConceptIntersect.html | 13 +- reference/requireDeathFlag.html | 13 +- 
reference/requireDemographics.html | 13 +- reference/requireFutureObservation.html | 13 +- reference/requireInDateRange.html | 13 +- reference/requireIsEntry.html | 13 +- reference/requireIsFirstEntry.html | 13 +- reference/requireIsLastEntry.html | 13 +- reference/requirePriorObservation.html | 13 +- reference/requireSex.html | 13 +- reference/requireTableIntersect.html | 13 +- reference/sampleCohorts.html | 33 ++- reference/stratifyCohorts.html | 13 +- reference/subsetCohorts.html | 13 +- reference/trimDemographics.html | 33 ++- reference/trimToDateRange.html | 13 +- reference/unionCohorts.html | 13 +- reference/yearCohorts.html | 13 +- search.json | 2 +- sitemap.xml | 13 +- 56 files changed, 1793 insertions(+), 201 deletions(-) create mode 100644 articles/a02_require_specific_cohort_entry.html create mode 100644 articles/a03_require_in_date_range.html create mode 100644 articles/a04_require_demographics.html create mode 100644 articles/a04_require_intersections.html create mode 100644 articles/a05_update_cohort_start_end.html create mode 100644 articles/a06_concatanate_cohorts.html create mode 100644 articles/a07_filter_cohorts.html create mode 100644 articles/a08_split_cohorts.html create mode 100644 articles/a09_combine_cohorts.html create mode 100644 articles/a10_match_cohorts.html diff --git a/404.html b/404.html index 837d8289..95d660bd 100644 --- a/404.html +++ b/404.html @@ -36,9 +36,16 @@
a02_require_specific_cohort_entry.Rmd
a03_require_in_date_range.Rmd
a04_require_demographics.Rmd
a04_require_intersections.Rmd
For this example we’ll use the Eunomia synthetic data from the
+CDMConnector package.
+
+library(CDMConnector)
+library(CohortConstructor)
+con <- DBI::dbConnect(duckdb::duckdb(), dbdir = eunomia_dir())
+cdm <- cdm_from_con(con, cdm_schema = "main",
+ write_schema = c(prefix = "my_study_", schema = "main"))
Let’s start by creating two drug cohorts, one for users of diclofenac
+and another for users of acetaminophen.
+
+cdm$medications <- conceptCohort(cdm = cdm,
+ conceptSet = list("diclofenac" = 1124300,
+ "acetaminophen" = 1127433),
+ name = "medications")
+cohortCount(cdm$medications)
+#> # A tibble: 2 × 3
+#> cohort_definition_id number_records number_subjects
+#> <int> <int> <int>
+#> 1 1 9365 2580
+#> 2 2 830 830
As well as our medication cohorts, let’s also make another cohort
+containing individuals with a record of a GI bleed. Later we’ll use this
+cohort when specifying inclusion/exclusion criteria.
+
+cdm$gi_bleed <- conceptCohort(cdm = cdm,
+ conceptSet = list("gi_bleed" = 192671),
+ name = "gi_bleed")
Individuals can contribute multiple records per cohort. However, we’ll
+now keep only each individual’s earliest cohort entry among the remaining records
+using requireIsFirstEntry()
 from CohortConstructor. We can
+see that after this we have one record per person for each cohort.
+cdm$medications <- cdm$medications %>%
+ requireIsFirstEntry()
+
+cohortCount(cdm$medications)
+#> # A tibble: 2 × 3
+#> cohort_definition_id number_records number_subjects
+#> <int> <int> <int>
+#> 1 1 2580 2580
+#> 2 2 830 830
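The effect of keeping only the first entry can be sketched on a toy table with plain dplyr. This is only an illustration of the idea, not the implementation of `requireIsFirstEntry()`; the table `toy_cohort` and its values are hypothetical.

```r
library(dplyr)

# Hypothetical toy cohort table (not taken from Eunomia)
toy_cohort <- tibble(
  cohort_definition_id = c(1L, 1L, 1L, 2L),
  subject_id           = c(10L, 10L, 11L, 10L),
  cohort_start_date    = as.Date(c("2010-05-01", "2008-03-15",
                                   "2012-01-01", "2011-07-20"))
)

# Keep the earliest entry per person within each cohort
first_entry <- toy_cohort %>%
  group_by(cohort_definition_id, subject_id) %>%
  filter(cohort_start_date == min(cohort_start_date)) %>%
  ungroup()

first_entry
```

Subject 10 has two records in cohort 1, and only the earlier one survives, so each person ends up with one record per cohort.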
Note that applying this criterion after other criteria
+would give a different result. Here we’re requiring that
+individuals meet the inclusion criteria at the time of their first use of
+diclofenac or acetaminophen.
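Why the order matters can be seen with a small dplyr sketch. The helpers `keep_first()` and `in_range()` are hypothetical stand-ins for the two kinds of requirement, written only for this illustration.

```r
library(dplyr)

# Hypothetical toy records: subject 1 has an early and a late entry
toy <- tibble(
  subject_id        = c(1L, 1L, 2L),
  cohort_start_date = as.Date(c("1995-06-01", "2005-06-01", "2003-01-01"))
)

# Stand-in for a "first entry" requirement
keep_first <- function(x) {
  x %>%
    group_by(subject_id) %>%
    filter(cohort_start_date == min(cohort_start_date)) %>%
    ungroup()
}

# Stand-in for a date range requirement
in_range <- function(x) {
  x %>%
    filter(cohort_start_date >= as.Date("2000-01-01"),
           cohort_start_date <= as.Date("2015-01-01"))
}

# First entry, then date range: subject 1 is dropped (first entry in 1995)
first_then_range <- toy %>% keep_first() %>% in_range()

# Date range, then first entry: subject 1 is kept via the 2005 entry
range_then_first <- toy %>% in_range() %>% keep_first()
```

The first pipeline answers "was the person's first ever entry in the range?", while the second answers "what was the person's first entry within the range?".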
+Using requireDemographics()
 we’ll require that
+individuals in our medications cohort are female and, relative to their
+cohort start date, are aged between 18 and 85 with at least 30 days of prior
+observation time in the database.
+cdm$medications <- cdm$medications %>%
+ requireDemographics(indexDate = "cohort_start_date",
+ ageRange = list(c(18, 85)),
+ sex = "Female",
+ minPriorObservation = 30)
We can then see how many people have been excluded based
+on these demographic requirements.
+
+cohort_attrition(cdm$medications) %>%
+ dplyr::filter(reason == "Demographic requirements") %>%
+ dplyr::glimpse()
+#> Rows: 0
+#> Columns: 7
+#> $ cohort_definition_id <int>
+#> $ number_records <int>
+#> $ number_subjects <int>
+#> $ reason_id <int>
+#> $ reason <chr>
+#> $ excluded_records <int>
+#> $ excluded_subjects <int>
Next we can use requireInDateRange()
 to keep only those
+records where cohort entry fell within a particular date range.
+cdm$medications <- cdm$medications %>%
+ requireInDateRange(indexDate = "cohort_start_date",
+ dateRange = as.Date(c("2000-01-01", "2015-01-01")))
Again, we can track cohort attrition:
+
+cohort_attrition(cdm$medications) %>%
+ dplyr::filter(reason == "cohort_start_date between 2000-01-01 and 2015-01-01") %>%
+ dplyr::glimpse()
+#> Rows: 0
+#> Columns: 7
+#> $ cohort_definition_id <int>
+#> $ number_records <int>
+#> $ number_subjects <int>
+#> $ reason_id <int>
+#> $ reason <chr>
+#> $ excluded_records <int>
+#> $ excluded_subjects <int>
We could require that individuals in our medication cohorts have a
+history of GI bleed. To do this we can use the
+requireCohortIntersect()
function, requiring that
+individuals have one or more intersections with the GI bleed cohort.
+cdm$medications_gi_bleed <- cdm$medications %>%
+ requireCohortIntersect(intersections = c(1,Inf),
+ targetCohortTable = "gi_bleed",
+ targetCohortId = 1,
+ indexDate = "cohort_start_date",
+ window = c(-Inf, 0),
+ name = "medications_gi_bleed")
+cohort_count(cdm$medications_gi_bleed)
+#> # A tibble: 2 × 3
+#> cohort_definition_id number_records number_subjects
+#> <int> <int> <int>
+#> 1 1 0 0
+#> 2 2 0 0
Instead of requiring that individuals have a history of GI bleed, we
+could instead require that they don’t have any history of it. In
+this case we can again use the requireCohortIntersect()
+function, but this time set the intersections argument to 0 to require
+individuals’ absence from this other cohort rather than their presence in
+it.
+cdm$medications_no_gi_bleed <- cdm$medications %>%
+ requireCohortIntersect(intersections = 0,
+ targetCohortTable = "gi_bleed",
+ targetCohortId = 1,
+ indexDate = "cohort_start_date",
+ window = c(-Inf, 0),
+ name = "medications_no_gi_bleed")
+cohort_count(cdm$medications_no_gi_bleed)
+#> # A tibble: 2 × 3
+#> cohort_definition_id number_records number_subjects
+#> <int> <int> <int>
+#> 1 1 101 101
+#> 2 2 179 179
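The two intersection requirements above can be sketched with plain dplyr on toy data. This is a simplified illustration that only looks at target start dates on or before the index date (a window of `c(-Inf, 0)`). It is not the logic of `requireCohortIntersect()` itself, and the tables `index` and `target` are hypothetical.

```r
library(dplyr)

# Hypothetical index cohort: one entry per person
index <- tibble(
  subject_id        = c(1L, 2L, 3L),
  cohort_start_date = as.Date(c("2010-01-01", "2010-01-01", "2010-01-01"))
)

# Hypothetical target cohort (e.g. GI bleed records)
target <- tibble(
  subject_id   = c(1L, 2L),
  target_start = as.Date(c("2005-01-01", "2012-01-01"))
)

# Count target records on or before the index date
counts <- index %>%
  left_join(target, by = "subject_id") %>%
  group_by(subject_id, cohort_start_date) %>%
  summarise(
    n_prior = sum(!is.na(target_start) & target_start <= cohort_start_date),
    .groups = "drop"
  )

# One or more intersections, analogous to intersections = c(1, Inf)
with_history <- counts %>% filter(n_prior >= 1)

# No intersections, analogous to intersections = 0
without_history <- counts %>% filter(n_prior == 0)
```

Subject 2 has a target record, but it falls after the index date, so that person counts as having no history in this window.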
a05_update_cohort_start_end.Rmd
a06_concatanate_cohorts.Rmd
a07_filter_cohorts.Rmd
a08_split_cohorts.Rmd
a09_combine_cohorts.Rmd
a10_match_cohorts.Rmd
The CohortConstructor package includes a function to obtain an age and
+sex matched cohort, the matchCohorts()
+function. In this vignette, we will explore the usage of this
+function.
We will first use the mockCohortConstructor()
 function from the
+CohortConstructor package to create mock data.
+library(CohortConstructor)
+library(dplyr)
+
+cdm <- mockCohortConstructor(nPerson = 1000)
As we will use cohort1
 to explore
+matchCohorts()
, let us first use
+cohort_set()
 from the CDMConnector package to explore
+this cohort:
+CDMConnector::cohort_set(cdm$cohort1)
Let us first see an example of how this function works. To use
+it, we need to provide the cohort
 argument, which is
+the cohort table
 in the cdm object containing
+the cohort of interest, and the name
 of the newly generated
+table containing the cohort and the matched cohort. We will also use
+the argument cohortId
 to specify that we only want a
+matched cohort for cohort_definition_id = 1
.
+cdm$matched_cohort1 <- matchCohorts(
+ cohort = cdm$cohort1,
+ cohortId = 1,
+ name = "matched_cohort1")
+
+CDMConnector::cohort_set(cdm$matched_cohort1)
Notice that in the generated tibble, there are two cohorts:
+cohort_definition_id = 1
 (original cohort), and
+cohort_definition_id = 4
 (matched cohort).
+The target_cohort_name column indicates which is the original
+cohort. match_sex and match_year_of_birth take
+boolean values (TRUE
/FALSE
) indicating whether we
+have matched on sex and year of birth. match_status indicates whether
+it is the original cohort (target
) or the matched
+cohort (matched
). target_cohort_id indicates
+the cohort ID of the original cohort.
Check the exclusion criteria applied to generate the new cohorts by
+using cohort_attrition()
from CDMConnector package:
+# Original cohort
+CDMConnector::cohort_attrition(cdm$matched_cohort1) %>% filter(cohort_definition_id == 1)
+
+# Matched cohort
+CDMConnector::cohort_attrition(cdm$matched_cohort1) %>% filter(cohort_definition_id == 4)
Briefly, from the original cohort, we first exclude those individuals
+that do not have a match, and then individuals whose matching pair
+is not in observation on the assigned cohort_start_date.
+For the matched cohort, we start from the whole database and first
+exclude individuals that are in the original cohort. Afterwards, we
+exclude individuals that do not have a match, then individuals that are
+not in observation on the assigned cohort_start_date, and
+finally we remove as many individuals as required to fulfil the
+ratio.
+Notice that matching pairs are randomly assigned, so the generated
+cohorts will likely change every time you execute this function.
+Use set.seed()
 to avoid this.
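As a reminder of what `set.seed()` does, the base R sketch below shows that resetting the seed makes a random draw repeatable. It illustrates the general mechanism only; the note above is the source for how the seed relates to the matching itself.

```r
# Same seed, same random draw
set.seed(42)
first_draw <- sample(1:100, 5)

set.seed(42)
second_draw <- sample(1:100, 5)

identical(first_draw, second_draw)
```

In practice this means calling `set.seed()` with a fixed value immediately before the matching step whenever the results need to be reproducible.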
matchSex
is a boolean parameter
+(TRUE
/FALSE
) indicating if we want to match by
+sex (TRUE
) or we do not want to (FALSE
).
matchYearOfBirth
 is another boolean parameter
+(TRUE
/FALSE
) indicating whether we want to match by
+year of birth (TRUE
) or not (FALSE
).
Notice that if matchSex = FALSE
 and
+matchYearOfBirth = FALSE
, we will obtain an unmatched comparator
+cohort.
The default matching ratio is 1:1 (ratio = 1
). Use
+cohort_count()
 from CDMConnector to check whether the matching
+has been done as desired.
+CDMConnector::cohort_count(cdm$matched_cohort1)
You can modify the ratio
parameter to tailor your
+matched cohort. ratio
can adopt values from 1 to Inf.
+cdm$matched_cohort2 <- matchCohorts(
+ cohort = cdm$cohort1,
+ cohortId = 1,
+ name = "matched_cohort2",
+ ratio = Inf)
+
+CDMConnector::cohort_count(cdm$matched_cohort2)
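The role of the ratio can be sketched with a toy exact match on sex and year of birth in dplyr. This is a simplified illustration, not the algorithm used by `matchCohorts()`; the tables `target` and `pool` are hypothetical.

```r
library(dplyr)

# Hypothetical target individuals and candidate pool
target <- tibble(
  target_id = c(1L, 2L),
  sex       = c("F", "M"),
  yob       = c(1980L, 1975L)
)
pool <- tibble(
  candidate_id = 101:106,
  sex          = c("F", "F", "F", "M", "M", "F"),
  yob          = c(1980L, 1980L, 1980L, 1975L, 1975L, 1990L)
)

ratio <- 2
set.seed(1)  # make the random selection repeatable

# Pair each target with all exact matches, then keep at most `ratio` of them
matched <- target %>%
  inner_join(pool, by = c("sex", "yob")) %>%
  group_by(target_id) %>%
  slice_sample(n = ratio) %>%
  ungroup()

count(matched, target_id)
```

Target 1 has three eligible candidates and keeps two of them at random, while target 2 has exactly two and keeps both. Candidate 106 matches nobody and is never selected.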
All these functionalities can be applied across multiple cohorts
+simultaneously. Specify in the cohortId
 parameter which
+cohorts are of interest. If set to NULL, all the cohorts present in
+cohort
 will be matched.
+cdm$matched_cohort3 <- matchCohorts(
+ cohort = cdm$cohort1,
+ cohortId = c(1,3),
+ name = "matched_cohort3",
+ ratio = 2)
+
+CDMConnector::cohort_set(cdm$matched_cohort3) %>% arrange(cohort_definition_id)
+
+CDMConnector::cohort_count(cdm$matched_cohort3) %>% arrange(cohort_definition_id)
Notice that each cohort has its own matched cohort, independent of the
+other cohorts.
+