Added implementation of 'combination cohorts'. #193

chrisknoll · 2024-09-23T04:17:32Z

This PR introduces basic functionality for combining cohorts into new ones.

Only a single test is implemented so far, but it exercises a fairly involved scenario and can serve as a foundation for other cases.

I will leave comments explaining the reasons for certain changes in the 'Files changed' section.

Added simple test.

chrisknoll · 2024-09-23T04:19:35Z

R/CohortConstruction.R

+    cohortDefinitionSet$checksum <- ""
+    for (i in 1:nrow(cohortDefinitionSet)) {
+      if (isTRUE(attr(cohortDefinitionSet, "hasSubsetDefinitions"))) {


This was rearranged to work better with more than two cohort types. When there were only two, it was simpler to express the choice as 'either/or' inside this loop, but arranged this way we can apply different styles of generated cohorts to CohortGenerator.

chrisknoll · 2024-09-23T04:20:37Z

R/CohortConstruction.R

+      } else if (isTRUE(attr(cohortDefinitionSet, "hasCombinedCohorts"))) {
+        dependantCohortIds <- as.integer(strsplit(cohortDefinitionSet$dependentCohorts[i]))
+        dependentCohortIdx <- which(cohortDefinitionSet$cohortId %in% dependantCohortIds)
+        cohortDefinitionSet$checksum[i] <- 


The loop logic is now: if it is a subset cohort, calculate the checksum one way; if it is a combined cohort, calculate it another way; otherwise, use the simple 'by SQL' approach.
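
A minimal sketch of that three-way dispatch, using base R only. It is not the code in this PR: the per-row columns isSubset and isCombined, the comma-separated format of dependentCohorts, the toyChecksum helper, and the choice to hash the dependent cohorts' checksums are all assumptions made for illustration.

```r
# Hypothetical stand-in for the package's real hashing function.
toyChecksum <- function(text) {
  # Toy hash: sum of the UTF-8 code points, returned as a string.
  as.character(sum(utf8ToInt(text)))
}

computeChecksums <- function(cohortDefinitionSet) {
  cohortDefinitionSet$checksum <- ""
  for (i in seq_len(nrow(cohortDefinitionSet))) {
    row <- cohortDefinitionSet[i, ]
    if (isTRUE(row$isSubset)) {
      # Subset cohort: placeholder rule, hash the SQL with a marker prefix.
      input <- paste("subset", row$sql, sep = "|")
    } else if (isTRUE(row$isCombined)) {
      # Combined cohort: hash the checksums of the cohorts it depends on.
      # Assumes a comma-separated id list and that dependencies sit in earlier rows.
      ids <- as.integer(strsplit(row$dependentCohorts, ",", fixed = TRUE)[[1]])
      idx <- which(cohortDefinitionSet$cohortId %in% ids)
      input <- paste(cohortDefinitionSet$checksum[idx], collapse = "|")
    } else {
      # Standard cohort: hash the SQL directly.
      input <- row$sql
    }
    cohortDefinitionSet$checksum[i] <- toyChecksum(input)
  }
  cohortDefinitionSet
}

cohorts <- data.frame(
  cohortId = c(1L, 2L, 3L),
  sql = c("SELECT 1", "SELECT 2", ""),
  isSubset = c(FALSE, FALSE, FALSE),
  isCombined = c(FALSE, FALSE, TRUE),
  dependentCohorts = c(NA, NA, "1,2"),
  stringsAsFactors = FALSE
)
result <- computeChecksums(cohorts)
```

Deriving the combined checksum from its dependencies means a change to either parent definition also changes the combined cohort's checksum, which is one plausible way to trigger regeneration.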

chrisknoll · 2024-09-23T04:21:58Z

R/CohortConstruction.R

+    if (isSubset) {
+      sql <- SqlRender::render(
+        sql = sql,
+        cdm_database_schema = cdmDatabaseSchema,
+        cohort_table = cohortTableNames$cohortTable,
+        cohort_database_schema = cohortDatabaseSchema,
+        warnOnMissingParameters = FALSE
+      )
+    } else {  # combined cohorts apply same parameters as standard cohort generation


Same sort of reorganization here: before, the assumption was that if it's not a subset, it must be a standard cohort generation (with SQL). Now that there is a third choice, it is better to branch on a positive identification, if (isSubset), rather than on the negation, if (!isSubset).
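
As a rough illustration of branching on the positive case, the sketch below picks a parameter list by cohort kind. The helper name buildRenderParameters, the split into shared and standard parameter lists, and the target_cohort_id parameter are hypothetical; the actual parameters of the non-subset branch are not shown in this hunk.

```r
# Hypothetical helper: chooses the parameter list handed to the SQL renderer.
buildRenderParameters <- function(isSubset, sharedParameters, standardParameters) {
  if (isSubset) {
    # Positively identified subset cohort: only the shared parameters apply.
    sharedParameters
  } else {
    # Standard and combined cohorts take the same, fuller parameter set.
    c(sharedParameters, standardParameters)
  }
}

shared <- list(cohort_database_schema = "results", cohort_table = "cohort")
standard <- list(target_cohort_id = 3L) # hypothetical extra parameter

subsetParams <- buildRenderParameters(TRUE, shared, standard)
combinedParams <- buildRenderParameters(FALSE, shared, standard)
```

With the subset case named explicitly, any new cohort type falls into the default branch without the condition having to be rewritten.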

chrisknoll · 2024-09-23T04:22:49Z

R/SerializeUtils.R

@@ -0,0 +1,69 @@
+.loadJson <- function(definition, simplifyVector = FALSE, simplifyDataFrame = FALSE, ...) {


These functions are carried over from CohortIncidence. I did a lot of testing in CI for handling serialization, and I wanted to carry those learnings forward here.

chrisknoll · 2024-09-23T04:23:11Z

R/Subsets.R

@@ -14,26 +14,6 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-.loadJson <- function(definition, simplifyVector = FALSE, simplifyDataFrame = FALSE, ...) {


This was moved to SerializeUtils.R.

chrisknoll · 2024-09-23T04:24:40Z

tests/testthat/test-CombinationCohorts.R

@@ -0,0 +1,66 @@
+test_that("combination cohort generation", {


This is a single, simple test, but it does perform some complicated operations: it loads the cohort table via generateCohort, but the SQL passed in just inserts verbatim cohort records. This lets us pre-set specific overlapping date ranges and then confirm that the multiple cohorts are collapsed down into the expected result.
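
To make the expected 'collapsed' result concrete, here is a small base R sketch of merging overlapping date ranges per subject. It assumes a union-style combination and hypothetical column names (subjectId, startDate, endDate); the PR does this in SQL, which is not shown here, so this is only an illustration of the outcome the test checks for.

```r
collapseEras <- function(cohort) {
  # Sort so that each subject's records are in start-date order.
  cohort <- cohort[order(cohort$subjectId, cohort$startDate), ]
  pieces <- lapply(split(cohort, cohort$subjectId), function(rows) {
    # Latest end date seen so far, row by row.
    runningEnd <- as.Date(cummax(as.numeric(rows$endDate)), origin = "1970-01-01")
    # A new era begins when a record starts after every earlier record has ended.
    # Records that start on the previous end date are merged into the same era.
    newEra <- c(TRUE, rows$startDate[-1] > head(runningEnd, -1))
    eraId <- cumsum(newEra)
    starts <- as.vector(tapply(as.numeric(rows$startDate), eraId, min))
    ends <- as.vector(tapply(as.numeric(rows$endDate), eraId, max))
    data.frame(
      subjectId = rows$subjectId[1],
      startDate = as.Date(starts, origin = "1970-01-01"),
      endDate = as.Date(ends, origin = "1970-01-01")
    )
  })
  result <- do.call(rbind, pieces)
  rownames(result) <- NULL
  result
}

# Records from two source cohorts, stacked; subject 1 has an overlapping pair.
records <- data.frame(
  subjectId = c(1L, 1L, 1L, 2L),
  startDate = as.Date(c("2020-01-01", "2020-01-15", "2020-03-01", "2020-05-01")),
  endDate = as.Date(c("2020-01-31", "2020-02-15", "2020-03-10", "2020-05-05"))
)
collapsed <- collapseEras(records)
```

For subject 1 the first two records overlap and merge into a single range, while the March record stays separate.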

Added implementation of 'combination cohorts'.

0c73341

Added simple test.

chrisknoll marked this pull request as draft September 23, 2024 04:17

chrisknoll requested review from azimov and anthonysena September 23, 2024 04:17

Address git action warnings/errors.

d0734ee

anthonysena linked an issue Sep 23, 2024 that may be closed by this pull request

Cohort Algebra operators - union, minus, intersect #95

Open
