Refactor Strategus module approach #145
Excellent work @anthonysena ! This seems to cover pretty much everything we discussed. I'm almost sorry to see all that complicated code go ;-) Some notes:
Thanks for the review of this PR @schuemie! I'm working out changes for the logging (I had missed creating the results folder). Let me address your point about EvidenceSynthesis: I think for ES, we can move the
Fix ES upload bug
Posting this here per request of @anthonysena 👍
Thanks @chrisknoll for the review! Let me respond to your feedback, in a slightly different order:
Agreed - I've removed the logging piece of the StrategusModule per 87b33b4.
There are a few ideas in this feedback so let me take them one at a time. First, I retained the notion of a Changing the analysis specification is an issue for #144 and we can revisit these ideas there, including how modules work with the analysis specification. I think making it an R6 class makes sense. To your point about I'd also prefer to keep the connection details parameter so that we can consider both the analysis specifications and the execution settings when performing an incremental run as requested in #95. I'd like to avoid putting anything that requires sensitive information (like the connection details) in the execution settings so we can serialize those settings and use them to determine when a module needs to run based on the inputs changing.
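To illustrate why keeping credentials out of the execution settings helps with incremental runs, here is a minimal sketch in base R. The helper `settingsFingerprint()` and the field names in the example lists are hypothetical and are not part of Strategus; the sketch also assumes the same R version between runs, since the serialized bytes depend on it.

```r
# Hypothetical helper: fingerprint an R object via an MD5 of its serialized form.
# Assumes the same R version between runs; this is an illustration only.
settingsFingerprint <- function(settings) {
  path <- tempfile(fileext = ".rds")
  on.exit(unlink(path))
  saveRDS(settings, path, compress = FALSE)
  unname(tools::md5sum(path))
}

# Execution settings carry no credentials, so they are safe to serialize
executionSettings <- list(workDatabaseSchema = "scratch", cohortTableName = "cohort")

# Connection details are passed separately and are never fingerprinted or stored
connectionDetails <- list(user = "me", password = "secret")

previousFingerprint <- settingsFingerprint(executionSettings)
unchanged <- identical(previousFingerprint, settingsFingerprint(executionSettings))

# Changing an input changes the fingerprint, signalling that the module should re-run
executionSettings$cohortTableName <- "cohort_v2"
needsRun <- !identical(previousFingerprint, settingsFingerprint(executionSettings))
```

The same comparison could include the analysis specifications, which is the combination the incremental-run request in #95 would need.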
I chose to use an inheritance structure here on purpose since I want to mandate certain behavior of a StrategusModule. Most of what is in the base class at this point is: defining how a module is named, how its settings class is defined, validating inputs for the function (which is where child classes can use super$ to access them) and building the jobContext. Putting aside the jobContext since I expect that will change, I think the inheritance structure works well since I'd like to make sure that each child module has some standard behavior defined at the parent level. I think the child classes are still "filling in the blanks" to your point, with the exception of the jobContext, which we can hopefully refactor out as we move forward. If we need to refactor out utility functions to expose them at the package level, I'm open to that as well, but for now I like the encapsulation of having private functions for the needs of the module.
* Implementation of CohortIncidence module. Fixed param typo in ResultDataModel.R
* Set runCheckAndFixCommands = TRUE to upload database_id properly.
* Added target_outcome_xref to results model.
* runCheckAndFixCommands = TRUE in DatabaseMetaData.R.

* Implementation of CohortIncidence module. (#147)
* Adjustments from testing

Co-authored-by: Chris Knoll
This draft pull request aims to address the following issues:

* […] `keyring`, `renv` and `targets` package requirements #135, by removing the `keyring`, `renv` and `targets` package requirements.

The current design is based on discussion in the Strategus HADES WG and I'll describe the changes in detail.
Overview
The major aim of v1.0 of Strategus is to in-source the modules that currently reside in their own code repositories (e.g. CohortGenerator module). To do this, we aim to provide a set of common interfaces that all modules will implement to perform their specific analytical use case. In addition, we'd like to remove the `keyring`, `renv` and `targets` dependencies as they created too much complexity and headaches. We will still use `renv` for dependency management; we will remove the use of `renv::run()` from Strategus and may still retain this dependency if we want to ensure that a project has properly initialized renv, etc.

This branch & PR assume that the structure of the analysis specification remains unchanged, mainly for my own sanity while working out these changes. More discussion on the structure of the analysis specification is happening in #144.
R/Module-StrategusModule.R
This file holds R6 classes that aim to provide a common interface (base class in the case of R6) for all `StrategusModules`.

* `execute(connectionDetails, analysisSpecifications, executionSettings)`: execute the module
* `createResultsDataModel(resultsConnectionDetails, resultsDatabaseSchema, tablePrefix = "")`: create the results data model for the module.
* `uploadResults(resultsConnectionDetails, analysisSpecifications, resultsUploadSettings)`: upload the module results
* `createModuleSpecifications(moduleSpecifications)`: create the module's specifications (inputs)
* `createSharedResourcesSpecifications(className, sharedResourcesSpecifications)`: create the shared resources specification
* `validateModuleSpecifications(moduleSpecifications)`: validate the module specifications
* `validateSharedResourcesSpecifications(className, sharedResourcesSpecifications)`: validate the shared resources specifications

I'll note that in the function signatures above, `connectionDetails` and `resultsConnectionDetails` are purposely separated from the execution settings so that we can serialize the execution settings without the worry that it may include sensitive information. This replaces the behavior from Strategus v0.x in which we had a connection details reference that we used to grab the credentials from `keyring`.

The `StrategusModules` base class does have some base implementation that is worth pointing out here:

* […] `StrategusModules` constructor.
* […] `analysisSpecifications` and `executionSettings` (cdm or results), this class builds a `JobContext` that is a private member of the base class. This is mainly for backwards compatibility since the v0.x module code uses the `JobContext` to access settings. I've aimed to make this an internal behavior of the `StrategusModules` so that users can focus on constructing analysis & execution settings. This should also help the developers who need to test their modules.
* `create*Specifications` and `validate*Specifications` are basic and simply confirm that the appropriate class name is applied to the settings. Modules can do further validation as necessary but these base methods provide an initial implementation that should be inherited.
* `StrategusModules` functions should provide type checking using `checkmate` to ensure that parameters are ready for the child class implementation.

I've included the following modules:
* R/Module-Characterization
* R/Module-CohortDiagnostics
* R/Module-CohortGenerator: this is working off of the `develop` branch of CohortGenerator.
* R/Module-CohortIncidence
* R/Module-CohortMethod
* R/Module-EvidenceSynthesis
* R/Module-PatientLevelPrediction
* R/Module-SelfControlledCaseSeries
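The base-class pattern described above, where the parent validates inputs and child modules call `super$` before doing their own work, can be sketched roughly as follows. This is a simplified illustration and not the actual Strategus code: `ModuleBaseSketch`, `ExampleModule`, `assertList` and `lastRun` are hypothetical names, and plain `stop()` checks stand in for the `checkmate` assertions mentioned in the description.

```r
library(R6)

# Hypothetical, simplified base class; not the actual Strategus implementation.
ModuleBaseSketch <- R6Class(
  classname = "ModuleBaseSketch",
  public = list(
    moduleName = NULL,
    initialize = function() {
      # Name the module after its concrete (child) class
      self$moduleName <- class(self)[[1]]
    },
    execute = function(connectionDetails, analysisSpecifications, executionSettings) {
      # Shared input validation lives in the parent
      private$assertList(analysisSpecifications, "analysisSpecifications")
      private$assertList(executionSettings, "executionSettings")
      invisible(self)
    }
  ),
  private = list(
    assertList = function(x, name) {
      if (!is.list(x)) stop(name, " must be a list")
      invisible(TRUE)
    }
  )
)

# A child module fills in the blanks after the parent has checked the inputs
ExampleModule <- R6Class(
  classname = "ExampleModule",
  inherit = ModuleBaseSketch,
  public = list(
    lastRun = NULL,
    execute = function(connectionDetails, analysisSpecifications, executionSettings) {
      super$execute(connectionDetails, analysisSpecifications, executionSettings)
      self$lastRun <- paste("executed", self$moduleName)
      invisible(self)
    }
  )
)

module <- ExampleModule$new()
module$execute(
  connectionDetails = list(dbms = "sqlite"), # kept separate from executionSettings
  analysisSpecifications = list(),
  executionSettings = list(resultsFolder = tempdir())
)
```

Because the checks sit in the parent's `execute`, a child that forgets its own validation still rejects malformed inputs as long as it calls `super$execute(...)` first.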
Test Harness
There is a sample script in `extras/R6ClassFun.R` that aims to provide a simple test harness that will: create an analysis specification (only for CG and CM for now), execute the analysis via Strategus, create an empty SQLite database, create the results schema tables & upload results. There is also some code to launch a Shiny results viewer to inspect the results. This has been my primary way of testing these changes and is a good reference for the main changes in this branch. To point out a few specific items:

* Creating module settings: This is now done by creating an instance of the module class. For example:
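A minimal sketch of that pattern, since the original snippet is not reproduced here. `ExampleModule`, its `generateStats` argument and the class names attached to the settings are hypothetical stand-ins, not the actual Strategus API:

```r
library(R6)

# Hypothetical module class used only for illustration
ExampleModule <- R6Class(
  classname = "ExampleModule",
  public = list(
    createModuleSpecifications = function(generateStats = TRUE) {
      specifications <- list(
        module = "ExampleModule",
        settings = list(generateStats = generateStats)
      )
      # Tag the settings with class names so they can be validated later
      class(specifications) <- c("ModuleSpecifications", "ExampleModuleSpecifications")
      specifications
    },
    validateModuleSpecifications = function(moduleSpecifications) {
      # Basic check: the settings carry the expected class name
      if (!inherits(moduleSpecifications, "ExampleModuleSpecifications")) {
        stop("Expected settings of class 'ExampleModuleSpecifications'")
      }
      invisible(TRUE)
    }
  )
)

# Settings come from a module instance rather than from a sourced script
exampleModule <- ExampleModule$new()
moduleSpecifications <- exampleModule$createModuleSpecifications(generateStats = FALSE)
exampleModule$validateModuleSpecifications(moduleSpecifications)
```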
This removes the need to source("/SettingsFunction.R") from the module's repo as was done in v0.x. Additionally, some basic validation may be performed by the module (as was desired in #9, though what is in this branch is not even close to a solution for that).
Development Notes
* The `execute` method of Strategus creates a single log file for the overall analysis run but I'd like to have a log file per module's results folder as we've done in v0.x. UPDATE: This is resolved in the most recent version of the branch.
* […] `moduleIndex` in the code since an analysis specification could have multiple references to the same module. This feels overly complex since our packages are capable of handling multiple analyses, so I'd propose we restrict the v1.x analysis specification to contain only 1 reference to a module.
* […] `resultsDataModelSpecification.csv` in the package. In Strategus v0.x, a module could hold the `resultsDataModelSpecification.csv` in the module's R project or in the package. By in-sourcing the modules, each package is required to put the `resultsDataModelSpecification.csv` in a well defined location so it may be included in the results output for the module. UPDATE: These modules will store their `resultsDataModelSpecification.csv` in Strategus for now. CohortIncidence will look to move this functionality into the package for the v5.x release line.
* […] `resultsDataModelSpecification.csv`) and the `resultsDataModelSpecification.csv`. There is no need for a .zip file that contains the results; this can be done after the execution is complete. Noting this since CohortMethod's uploadResults function takes a .zip file for input, which was different from the pattern I had expected. Furthermore, I think each module package should adopt RMM for results table creation and upload where possible. This, in part, is to make sure that we can migrate the results data model when necessary. Maybe more important than that, though, is so we can document the results data model per #143 (Document results data model for modules). UPDATE: This is also a note for CohortDiagnostics and SelfControlledCaseSeries.

I welcome any feedback/suggestions - tagging some of the HADES Strategus WG members including @schuemie @azimov @chrisknoll @jreps.