We should provide time-out functionality #107

Open
msuchard opened this issue Jun 20, 2023 · 3 comments

@msuchard
Member

A time-out feature would be really helpful for some data sources: the package would kill any query that executes for longer than a specified time and then, importantly, continue on to the next task.

Might this be easily feasible, @azimov @anthonysena?

@azimov
Collaborator

azimov commented Jun 20, 2023

I think this should be possible to implement with something like R.utils::withTimeout, though we would need to test how well this plays with DatabaseConnector and its calls to Java.

There are also some questions about the default behaviour: I assume the default would be no timeout, rather than asserting that no cohort should take more than 10 hours to execute?
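
For illustration, a minimal sketch of the pattern discussed above, assuming R.utils is installed. The helper name runWithTimeout, the status strings, and the stand-in tasks are hypothetical and not part of the package. It treats a timeout of 0 as "no timeout" (the default assumed above) and moves on to the next task when a limit is hit. It uses Sys.sleep, which is interruptible; whether a blocking call into Java through DatabaseConnector can be interrupted the same way is exactly what would need testing.

```r
library(R.utils)

# Hypothetical helper: run one task under a time limit and report the outcome
# instead of stopping the whole run.
runWithTimeout <- function(task, timeoutSeconds = 0) {
  if (timeoutSeconds <= 0) {
    # No timeout requested: run the task without any limit.
    return(list(status = "COMPLETE", result = task()))
  }
  tryCatch(
    {
      result <- withTimeout(task(), timeout = timeoutSeconds, onTimeout = "error")
      list(status = "COMPLETE", result = result)
    },
    # withTimeout signals a TimeoutException when the limit is exceeded.
    TimeoutException = function(e) {
      list(status = "TIMEOUT", result = NULL)
    }
  )
}

# Stand-in tasks (assumptions for the demo, not real cohort generation).
tasks <- list(
  fast = function() "done",
  slow = function() {
    Sys.sleep(5)
    "done"
  }
)

# Each task is attempted in turn; a timeout does not abort the loop.
outcomes <- lapply(tasks, runWithTimeout, timeoutSeconds = 1)
print(sapply(outcomes, function(x) x$status))
```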

@anthonysena
Collaborator

I think we'll need to work out the behavior, but this seems possible and reasonable. I suppose we'd want to apply a uniform timeout to all cohort generations in this case. I agree with @azimov that we'd want to test this out to make sure that the timeout plays nicely with database connections.

@anthonysena anthonysena added this to the v0.10 milestone May 29, 2024
@anthonysena anthonysena modified the milestones: v0.10, v1.0 Jun 20, 2024
@anthonysena
Collaborator

I have an initial implementation of this feature on the time-out branch and relevant changes are here: aadf624.

My main concern with the current implementation is the database connection handling. The generateCohortSet function opens a single database connection, which is passed into an internal function, generateCohort. While the timeout does stop the cohort generation on the R side, it doesn't forcibly close the database connection, so connections would leak, defeating the intent of this functionality.

The work on this branch could be further extended such that, when a timeout > 0 is specified, we pass only the connectionDetails to the generateCohort function so that each cohort generation opens and uses its own connection. Then, when a timeout is encountered, we forcibly close that connection.

I was also having a bit of trouble with the unit tests for this functionality, so I stopped short of getting this into the v0.10 release. If there is a desire to move this functionality forward, please let me know and I can try to finish it up for another v0.x release.
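
A rough sketch of that idea, not the code on the time-out branch: the function name generateCohortWithTimeout, its arguments, and the status strings are hypothetical. The connect and disconnect functions are passed in (defaulting to DatabaseConnector::connect and DatabaseConnector::disconnect) so the flow can be exercised with stubs and no database. Whether disconnecting also cancels the query on the server is an open question this sketch does not answer.

```r
library(R.utils)

# Hypothetical sketch: one dedicated connection per cohort generation,
# always closed on exit, including when the timeout fires.
generateCohortWithTimeout <- function(connectionDetails,
                                      generateFn,
                                      timeoutSeconds,
                                      connectFn = DatabaseConnector::connect,
                                      disconnectFn = DatabaseConnector::disconnect) {
  connection <- connectFn(connectionDetails)
  # on.exit runs on normal return and after a timeout, so nothing leaks.
  on.exit(disconnectFn(connection), add = TRUE)
  tryCatch(
    {
      withTimeout(generateFn(connection), timeout = timeoutSeconds, onTimeout = "error")
      "COMPLETE"
    },
    TimeoutException = function(e) "TIMEOUT"
  )
}

# Demo with stubs (assumptions): a fake connection and a slow "generation".
closed <- FALSE
status <- generateCohortWithTimeout(
  connectionDetails = NULL,
  generateFn = function(connection) Sys.sleep(5),
  timeoutSeconds = 1,
  connectFn = function(details) "fake-connection",
  disconnectFn = function(connection) closed <<- TRUE
)
print(status)
print(closed)
```

Opening a connection per cohort costs more than sharing one, which is presumably why this path would only be taken when a timeout > 0 is requested.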
