-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added implementation of 'combination cohorts'. #193
base: main
Are you sure you want to change the base?
Conversation
Added simple test.
cohortDefinitionSet$checksum <- "" | ||
for (i in 1:nrow(cohortDefinitionSet)) { | ||
if (isTRUE(attr(cohortDefinitionSet, "hasSubsetDefinitions"))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was moved around to better work with > 2 cohort types. When it was just the two, it was simpler to say 'either or' in this loop, but arranged this way we can apply different styles of generated cohorts to CohortGenerator.
} else if (isTRUE(attr(cohortDefinitionSet, "hasCombinedCohorts"))) { | ||
dependantCohortIds <- as.integer(strsplit(cohortDefinitionSet$dependentCohorts[i])) | ||
dependentCohortIdx <- which(cohortDefinitionSet$cohortId %in% dependantCohortIds) | ||
cohortDefinitionSet$checksum[i] <- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The loop logic is now: if it is a subset cohort, calc checksum one way, if it is combined cohort, do it another way, else do it the simple 'by sql' way.
if (isSubset) { | ||
sql <- SqlRender::render( | ||
sql = sql, | ||
cdm_database_schema = cdmDatabaseSchema, | ||
cohort_table = cohortTableNames$cohortTable, | ||
cohort_database_schema = cohortDatabaseSchema, | ||
warnOnMissingParameters = FALSE | ||
) | ||
} else { # combined cohorts apply same paramaters as standard cohort generation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same sort of re-organization here: before, the assumption was if it's not a subset, then it must be a standard cohort generation (with sql). But now that there's another choice, it is better to condition on positive identification if (isSubset)
vs. if(!isSubset)
.
@@ -0,0 +1,69 @@ | |||
.loadJson <- function(definition, simplifyVector = FALSE, simplifyDataFrame = FALSE, ...) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These functions are carry-overs from CohortIncidence. I did a lot of testing in CI for handling serialization and I wanted to carry those learnings forward to here.
@@ -14,26 +14,6 @@ | |||
# See the License for the specific language governing permissions and | |||
# limitations under the License. | |||
|
|||
.loadJson <- function(definition, simplifyVector = FALSE, simplifyDataFrame = FALSE, ...) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was moved to SerializeUtils.R.
@@ -0,0 +1,66 @@ | |||
test_that("combination cohort generation", { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a single, simple test, but it does perform some complicated operations such as loading the cohort table via generateCohort
but the SQL passed in just inserts verbatim cohort records. This is so that we can pre-set specific overlapping date ranges so that we can confirm that the multiple cohorts got collapsed down into the expected result.
This PR introduces basic functionality for combining cohorts into new ones.
Only a simple test was implemented, but it's fairly complicated and can serve as foundation for some other cases.
Will leave comments/reasons for certain changes in the 'files changed' section.