Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to resume execution of DQD checks #411

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

ganisimov
Copy link

@ganisimov ganisimov commented Dec 19, 2022

It happens that due to connection issues or other conditions executeDqChecks fails and progress is lost.

When new resume argument is set to TRUE, file in outputFile path is used as source of check results instead of actual processing. Missing check results or results reporting error are re-processed.

Note that outputFile must be set explicitely to make resuming to take effect. New results overwrite file in outputFile path.

Addresses #109

It happens that due to connection issues or other conditions executeDqChecks fails and progress is lost.

When new resume argument is set to TRUE, file in outputFile path is used as source of check results instead of actual processing. Missing check results or results reporting error are re-processed.

Note that outputFile must be set explicitely to make resuming to take effect. New results overwrite file in outputFile path.

Addresses OHDSI#109
Copy link
Collaborator

@MaximMoinat MaximMoinat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great addition, thanks. See my comment in-line.

Two general remarks:

  • To be consistent with other OHDSI packages, I propose to either call this incremental mode like in CohortDiagnostics or add a new function runMissingChecks like in Achilles. @katy-sadowski Maybe something to discuss in an upcoming WG meet?
  • Does this also work if a fatal error occurs? (e.g. connection lost, or R session timing out). I think the json results object only gets written at the end, so we might need to write out intermediate results to make this work for all cases where a DQD execution might be interrupted.

conceptId = check["conceptId"],
unitConceptId = check["unitConceptId"]
)
checkResultCandidates <- checkResultsSaved %>% dplyr::filter(checkId == currentCheckId & is.na(ERROR))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumes the ERROR field is present in the results object, which is not the case when none of the checks resulted in an error.

@katy-sadowski
Copy link
Collaborator

I 100% want to add this feature, thanks @ganisimov for the contribution and @MaximMoinat for reviving the thread 😃

I've been thinking, however, that we might want to wait to add it in as part of the larger check-running workflow overhaul we've been discussing implementing in 2024. Part of the vision for that workflow includes moving away from the current "all-or-nothing" execution towards something more incremental that's aware of the relationships/dependencies between checks. I think that implementation will lend itself better to figuring out how/where to store incremental results, how to handle various failure modes, etc.

Would definitely love to discuss this in a WG meeting early next year!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants