Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added implementation of 'combination cohorts'. #193

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

chrisknoll
Copy link

This PR introduces basic functionality for combining cohorts into new ones.

Only a simple test was implemented, but it's fairly complicated and can serve as foundation for some other cases.

Will leave comments/reasons for certain changes in the 'files changed' section.

@chrisknoll chrisknoll marked this pull request as draft September 23, 2024 04:17
Comment on lines +109 to +111
cohortDefinitionSet$checksum <- ""
for (i in 1:nrow(cohortDefinitionSet)) {
if (isTRUE(attr(cohortDefinitionSet, "hasSubsetDefinitions"))) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was moved around to better work with > 2 cohort types. When it was just the two, it was simpler to say 'either or' in this loop, but arranged this way we can apply different styles of generated cohorts to CohortGenerator.

Comment on lines +120 to +123
} else if (isTRUE(attr(cohortDefinitionSet, "hasCombinedCohorts"))) {
dependantCohortIds <- as.integer(strsplit(cohortDefinitionSet$dependentCohorts[i]))
dependentCohortIdx <- which(cohortDefinitionSet$cohortId %in% dependantCohortIds)
cohortDefinitionSet$checksum[i] <-
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The loop logic is now: if it is a subset cohort, calc checksum one way, if it is combined cohort, do it another way, else do it the simple 'by sql' way.

Comment on lines +260 to +268
if (isSubset) {
sql <- SqlRender::render(
sql = sql,
cdm_database_schema = cdmDatabaseSchema,
cohort_table = cohortTableNames$cohortTable,
cohort_database_schema = cohortDatabaseSchema,
warnOnMissingParameters = FALSE
)
} else { # combined cohorts apply same paramaters as standard cohort generation
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same sort of re-organization here: before, the assumption was if it's not a subset, then it must be a standard cohort generation (with sql). But now that there's another choice, it is better to condition on positive identification if (isSubset) vs. if(!isSubset).

@@ -0,0 +1,69 @@
.loadJson <- function(definition, simplifyVector = FALSE, simplifyDataFrame = FALSE, ...) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These functions are carry-overs from CohortIncidence. I did a lot of testing in CI for handling serialization and I wanted to carry those learnings forward to here.

@@ -14,26 +14,6 @@
# See the License for the specific language governing permissions and
# limitations under the License.

.loadJson <- function(definition, simplifyVector = FALSE, simplifyDataFrame = FALSE, ...) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was moved to SerializeUtils.R.

@@ -0,0 +1,66 @@
test_that("combination cohort generation", {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a single, simple test, but it does perform some complicated operations such as loading the cohort table via generateCohort but the SQL passed in just inserts verbatim cohort records. This is so that we can pre-set specific overlapping date ranges so that we can confirm that the multiple cohorts got collapsed down into the expected result.

@anthonysena anthonysena linked an issue Sep 23, 2024 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cohort Algebra operators - union, minus, intersect
1 participant