Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DRAFT] Execution Settings R6 Class #196

Open
wants to merge 5 commits into
base: develop
Choose a base branch
from
Open
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
198 changes: 198 additions & 0 deletions R/executionSettings.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,198 @@

# ExecutionSettings ----

#' @description
#' An R6 class to define an ExecutionSettings object
#'
#' @export
ExecutionSettings <- R6::R6Class(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anthonysena I think this is a useful abstraction that can be used in almost all our packages and CohortGenerator seems an appropriate place to put it.

Some initial points that I think we need:

  • Changing the generate cohorts api so that it creates this object OR uses it if specified - this way we can maintain compatibility for existing scripts. Alternatively we should provide dual APIs (old and new) by creating a new function executeCohorts.

  • We will need to write a vignette

  • we should update all functions in this package to use this

  • We should look at the stratuegus execution context list to see if there is additional stuff we should replicate there

  • Perhaps we also move or replicate the cdm source meta data function from strategus here?

  • We should add a helper function to create an instance

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed @azimov and thanks for this contribution @mdlavallee92!

Changing the generate cohorts api so that it creates this object OR uses it if specified - this way we can maintain compatibility for existing scripts. Alternatively we should provide dual APIs (old and new) by creating a new function executeCohorts.

Let's aim to add this as an additional way of encapsulating the settings as you suggested and then fully adopt it when we have the opportunity to break compatibility with v1.x.

classname = "ExecutionSettings",
public = list(
initialize = function(connectionDetails = NULL,
Copy link

@chrisknoll chrisknoll Oct 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just FYI, the from of initialize here is different than what we discussed about R6 classes in the monthly meeting:

  1. Initialize takes list-of-list (ie: fromJSON()) or a string which is JSON format and the object is initialized from that.
  2. The objective of those R6 classes (in CohortIncidence) was to just enshroud object state into a formalized construct. So, it wasn't meant to have behavior about connecting, and the entire object model should be able to be serialized from/to json and I'm not sure connection details is that.

So based on the above I was imagining the execution settings to contain user-input data about schemas, runtime options, etc, and then to execute a study you have the analysis, execution settings and connection details provided separately (especially because sometimes you pass along an actual open connection (for whatever reason) or a connection details and the code decides which to use).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the context of this class is useful for a different reason that serialization or de-serialization. This would allow us to standardize across packages how we encapsulate inputs. At the moment these are fairly inconsistent.

I would say we explicitly never want to serialize connectionDetails as this adds the potential to leak security details. So the construction should be:

createExecutionSettings <- function(connection = NULL, connectionDetails = NULL, cdmDatabaseSchema, cohortDatabaseSchema = ... etc) {
   data <- list(
      <insertAllCommonSettings>
   )
   ExecutionSettings$new(connection = connection, connectionDetails = connectionDetails, data = data)
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the context of this class is useful for a different reason that serialization or de-serialization.

That's a fair point, but this PR seems to set the basis for one example of implementing the other R6 classes. I was looking at the R6 classes as a means to define the analytic input API (at least at a data structure level, not a functional level). So there will be some that are pure data oriented (to define structure and handle serialization) and maybe there will be some that have behavior (for example, CI has a query builder R6 class I believe).

Maybe we need to see an example of both in this PR so we see where lines are drawn between behavior classes and structure classes.

connection = NULL,
cdmDatabaseSchema = NULL,
workDatabaseSchema = NULL,
tempEmulationSchema = NULL,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tempEmulationSchema is normally an option

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we have it as an argument? snowflake requires it for everything selfishly hope to keep it here 😃

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, keep it as an argument but the current convention across other packages is to use

tempEmulationSchema = getOption('sqlRenderTempEmulationSchema')

So I'm proposing that this behvaiour is consistent.

targetCohortTable = NULL,
cdmSourceName = NULL) {
stopifnot(is.null(connectionDetails) || is.null(connection))
.setClass(private = private, key = "connectionDetails", value = connectionDetails, class = "ConnectionDetails")
.setClass(private = private, key = ".connection", value = connection,
class = "DatabaseConnectorJdbcConnection", nullable = TRUE)
.setString(private = private, key = ".cdmDatabaseSchema", value = cdmDatabaseSchema)
.setString(private = private, key = ".workDatabaseSchema", value = workDatabaseSchema)
.setString(private = private, key = ".tempEmulationSchema", value = tempEmulationSchema)
.setString(private = private, key = ".targetCohortTable", value = targetCohortTable)
.setString(private = private, key = ".cdmSourceName", value = cdmSourceName)
},

getDbms = function() {
dbms <- private$connectionDetails$dbms
return(dbms)
},

# connect to database
connect = function() {

# check if private$connection is NULL
conObj <- private$.connection
if (is.null(conObj)) {
private$.connection <- DatabaseConnector::connect(private$connectionDetails)
} else{
cli::cat_bullet(
"Connection object already open",
bullet = "info",
bullet_col = "blue"
)
}
},

# disconnect to database
disconnect = function() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps a "withConnection" withr style context can be implemented as a method see here:

OHDSI/DatabaseConnector#269


# check if private$connection is NULL
conObj <- private$.connection
if (class(conObj) == "DatabaseConnectorJdbcConnection") {
# disconnect connection
DatabaseConnector::disconnect(private$.connection)
private$.connection <- NULL
}

cli::cat_bullet(
"Connection object has been disconected",
bullet = "info",
bullet_col = "blue"
)
invisible(conObj)
},

#TODO make this more rigorous
# add warning if no connection available
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

include a DatabaseConnector::dbIsValid call?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense. I left this open as I need to think of all checks on the database connection.

getConnection = function() {
conObj <- private$.connection
return(conObj)
}

),

private = list(
connectionDetails = NULL,
.connection = NULL,
.cdmDatabaseSchema = NULL,
.workDatabaseSchema = NULL,
.tempEmulationSchema = NULL,
.targetCohortTable = NULL,
.cdmSourceName = NULL
),

active = list(

cdmDatabaseSchema = function(value) {
# return the value if nothing added
if(missing(value)) {
cds <- private$.cdmDatabaseSchema
return(cds)
}
# replace the cdmDatabaseSchema
.setString(private = private, key = ".cdmDatabaseSchema", value = value)
cli::cat_bullet(
glue::glue("Replaced {crayon::cyan('cdmDatabaseSchema')} with {crayon::green(value)}"),
bullet = "info",
bullet_col = "blue"
)
},

workDatabaseSchema = function(value) {
# return the value if nothing added
if(missing(value)) {
cds <- private$.workDatabaseSchema
return(cds)
}
# replace the workDatabaseSchema
.setString(private = private, key = ".workDatabaseSchema", value = value)
cli::cat_bullet(
glue::glue("Replaced {crayon::cyan('workDatabaseSchema')} with {crayon::green(value)}"),
bullet = "info",
bullet_col = "blue"
)
},


tempEmulationSchema = function(value) {
# return the value if nothing added
if(missing(value)) {
tes <- private$.tempEmulationSchema
return(tes)
}
# replace the tempEmulationSchema
.setString(private = private, key = ".tempEmulationSchema", value = value)
cli::cat_bullet(
glue::glue("Replaced {crayon::cyan('tempEmulationSchema')} with {crayon::green(value)}"),
bullet = "info",
bullet_col = "blue"
)
},

targetCohortTable = function(value) {
# return the value if nothing added
if(missing(value)) {
tct <- private$.targetCohortTable
return(tct)
}
# replace the targetCohortTable
.setString(private = private, key = ".targetCohortTable", value = value)
cli::cat_bullet(
glue::glue("Replaced {crayon::cyan('targetCohortTable')} with {crayon::green(value)}"),
bullet = "info",
bullet_col = "blue"
)
},

cdmSourceName = function(value) {
# return the value if nothing added
if(missing(value)) {
csn <- private$.cdmSourceName
return(csn)
}
# replace the cdmSourceName
.setString(private = private, key = ".cdmSourceName", value = value)
cli::cat_bullet(
glue::glue("Replaced {crayon::cyan('cdmSourceName')} with {crayon::green(value)}"),
bullet = "info",
bullet_col = "blue"
)
}

)
)


#' @title
#' Create an ExecutionSettings object and set its attributes
#'
#' @param connectionDetails A DatabaseConnector connectionDetails object (optional if connection is specified)
#' @param connection A DatabaseConnector connection object (optional if connectionDetails is specified)
#' @param cdmDatabaseSchema The schema of the OMOP CDM database
#' @param workDatabaseSchema The schema to which results will be written
#' @param tempEmulationSchema Some database platforms like Oracle and Snowflake do not truly support temp tables. To emulate temp tables, provide a schema with write privileges where temp tables can be created.
#' @param targetCohortTable The name of the table where the target cohort(s) are stored
#' @param cdmSourceName A human-readable name for the OMOP CDM source
#'
#' @return An ExecutionSettings object
#' @export
createExecutionSettings <- function(connectionDetails,
connection = NULL,
cdmDatabaseSchema,
workDatabaseSchema,
tempEmulationSchema,
targetCohortTable,
cdmSourceName) {
executionSettings <- ExecutionSettings$new(connectionDetails = connectionDetails,
connection = connection,
cdmDatabaseSchema = cdmDatabaseSchema,
workDatabaseSchema = workDatabaseSchema,
tempEmulationSchema = tempEmulationSchema,
targetCohortTable = targetCohortTable,
cdmSourceName = cdmSourceName)
return(executionSettings)
}
Loading