Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DRAFT] Execution Settings R6 Class #196

Open
wants to merge 5 commits into
base: develop
Choose a base branch
from

Conversation

mdlavallee92
Copy link

Proposal: add an r6 class for an execution settings that handles the routing of the connectionDetails and the databaseSchemas. Class would also have a method to connect, disconnect and identify the dbms.

},

# disconnect to database
disconnect = function() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps a "withConnection" withr style context can be implemented as a method see here:

OHDSI/DatabaseConnector#269

},

#TODO make this more rigorous
# add warning if no connection available
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

include a DatabaseConnector::dbIsValid call?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense. I left this open as I need to think of all checks on the database connection.

connection = NULL,
cdmDatabaseSchema = NULL,
workDatabaseSchema = NULL,
tempEmulationSchema = NULL,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tempEmulationSchema is normally an option

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we have it as an argument? snowflake requires it for everything selfishly hope to keep it here 😃

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, keep it as an argument but the current convention across other packages is to use

tempEmulationSchema = getOption('sqlRenderTempEmulationSchema')

So I'm proposing that this behvaiour is consistent.

#' An R6 class to define an ExecutionSettings object
#'
#' @export
ExecutionSettings <- R6::R6Class(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anthonysena I think this is a useful abstraction that can be used in almost all our packages and CohortGenerator seems an appropriate place to put it.

Some initial points that I think we need:

  • Changing the generate cohorts api so that it creates this object OR uses it if specified - this way we can maintain compatibility for existing scripts. Alternatively we should provide dual APIs (old and new) by creating a new function executeCohorts.

  • We will need to write a vignette

  • we should update all functions in this package to use this

  • We should look at the stratuegus execution context list to see if there is additional stuff we should replicate there

  • Perhaps we also move or replicate the cdm source meta data function from strategus here?

  • We should add a helper function to create an instance

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed @azimov and thanks for this contribution @mdlavallee92!

Changing the generate cohorts api so that it creates this object OR uses it if specified - this way we can maintain compatibility for existing scripts. Alternatively we should provide dual APIs (old and new) by creating a new function executeCohorts.

Let's aim to add this as an additional way of encapsulating the settings as you suggested and then fully adopt it when we have the opportunity to break compatibility with v1.x.

@mdlavallee92
Copy link
Author

@azimov, @anthonysena and @chrisknoll this is the file I use for the R6Class functions. I added to the PR for execution settings for now but it would be better if this lived in some "tools" package we maintain elsewhere.

tagging @alondhe and @katy-sadowski for awareness

@chrisknoll
Copy link

chrisknoll commented Oct 27, 2024

@azimov, @anthonysena and @chrisknoll this is the file I use for the R6Class functions. I added to the PR for execution settings for now but it would be better if this lived in some "tools" package we maintain elsewhere.

These would make good utility functions. I'm not sure I'd use them in my case, as there's cases where multiple checks are made (see here or here). But if it makes sense to use them, all for it.

ExecutionSettings <- R6::R6Class(
classname = "ExecutionSettings",
public = list(
initialize = function(connectionDetails = NULL,
Copy link

@chrisknoll chrisknoll Oct 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just FYI, the from of initialize here is different than what we discussed about R6 classes in the monthly meeting:

  1. Initialize takes list-of-list (ie: fromJSON()) or a string which is JSON format and the object is initialized from that.
  2. The objective of those R6 classes (in CohortIncidence) was to just enshroud object state into a formalized construct. So, it wasn't meant to have behavior about connecting, and the entire object model should be able to be serialized from/to json and I'm not sure connection details is that.

So based on the above I was imagining the execution settings to contain user-input data about schemas, runtime options, etc, and then to execute a study you have the analysis, execution settings and connection details provided separately (especially because sometimes you pass along an actual open connection (for whatever reason) or a connection details and the code decides which to use).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the context of this class is useful for a different reason that serialization or de-serialization. This would allow us to standardize across packages how we encapsulate inputs. At the moment these are fairly inconsistent.

I would say we explicitly never want to serialize connectionDetails as this adds the potential to leak security details. So the construction should be:

createExecutionSettings <- function(connection = NULL, connectionDetails = NULL, cdmDatabaseSchema, cohortDatabaseSchema = ... etc) {
   data <- list(
      <insertAllCommonSettings>
   )
   ExecutionSettings$new(connection = connection, connectionDetails = connectionDetails, data = data)
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the context of this class is useful for a different reason that serialization or de-serialization.

That's a fair point, but this PR seems to set the basis for one example of implementing the other R6 classes. I was looking at the R6 classes as a means to define the analytic input API (at least at a data structure level, not a functional level). So there will be some that are pure data oriented (to define structure and handle serialization) and maybe there will be some that have behavior (for example, CI has a query builder R6 class I believe).

Maybe we need to see an example of both in this PR so we see where lines are drawn between behavior classes and structure classes.

@azimov
Copy link
Collaborator

azimov commented Oct 29, 2024

@azimov, @anthonysena and @chrisknoll this is the file I use for the R6Class functions. I added to the PR for execution settings for now but it would be better if this lived in some "tools" package we maintain elsewhere.

These would make good utility functions. I'm not sure I'd use them in my case, as there's cases where multiple checks are made (see here or here). But if it makes sense to use them, all for it.

I think we should seek to expand this list, there will always be exceptions but there are basic things that require a double check. For example, if a value is a big int this is not readily supported by checkmate. The best solution I could think of was to check if the value is numeric and value modulo 1 == 0. Agreeing on this and standardizing would be incredibly useful.

It could also be possible to use ... notation when passed only to a single checkmate validator, with some reasonable defaults to override behaviour. E.g:

.setCharacter <- function(private, key, value, min.chars = 1, null.ok = FALSE, ...) {
  checkmate::assert_character(x = value, min.chars =min.chars, null.ok = null.ok, ...)
  private[[key]] <- value
  invisible(private)
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants