Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Executing settings functions inside renvs #51

Closed
azimov opened this issue Mar 16, 2023 · 4 comments
Closed

Executing settings functions inside renvs #51

azimov opened this issue Mar 16, 2023 · 4 comments

Comments

@azimov
Copy link
Collaborator

azimov commented Mar 16, 2023

There will be a problem on many systems where creating a study requires installed versions of specific R packages.

For example, if I want to use CohortGenerator with subsets I can define this in my base R/system R environment with CohortGenerator version 0.8.0 and then feed this to a package using version0.7.0. In this situation I could add Cohorts with subsets.
However, doing this one of two things will happen:

  • The Strategus CohortGenerator module will crash because it can't handle the subset payloads
  • The Strategus CohortGenerator module will run and not error, not generating my subsetted cohorts (which is arguably worse)

Both these situations are bad and (currently) the only solution would be to make sure your system env and Strategus module env are the same.

Note that this will happen even with any parameters. For example, cohort diagnostics creates a FeatureExtraction default feature set inside the base package.

However - I propose an alternative solution; to allow execution of settings creation inside module renv contexts and make this the standard procedure.

In the simplest case, the module functions exposed (e.g. getSharedResources) should not require many changes, but module developers should be provided with an API to call into the module environment.

In a more complex case I would like to be able to create arbitrary code execution inside the renv with something like this:

cohortDefinitionPayload <- withModuleRenv(
      module = "CohortGenerator",
      version ... 
      code = {
          library(CohortGenerator) # Load renv version

          cohortDefinitionSet <- loadCohortDefinitionSet(...)

          subsetDefinition <- createCohortSubsetDefinition(
           .... DO stuff ...
          )
         cohortDefinitionSet %>% addCohortSubsetDefinition(subsetDefinition)
         
      })

Note, I include similar code (though not designed to be exported in its current state) in this PR.

There is a lot of complexity here though - we would only really want this code to create serialized payloads for the strategus design definitions.

Passing data between the calling script and the renv is also an example. The best solution for these situations is probably to store the input and output as an RDS file somewhere (rstudio optionally does this when sourcing jobs, allowing you to copy the calling environment to the child process and allowing the child process to copy results back to the parent R session).

@azimov azimov changed the title Executing settings functions inside renvs/loading libs Executing settings functions inside renvs Mar 16, 2023
@schuemie
Copy link
Member

I think this is yet another argument for why, for the 'main' Strategus modules, we want them to be integrated into the Strategus package, using a single central lock file, as argued here.

@anthonysena
Copy link
Collaborator

Just making a note since we're moving towards moving the modules into Strategus v1.x but I think this is still a potential issue. Similar to the discussion in #148, what's our process for initializing an environment for designing a Strategus study?

Here is a proposed workflow, assuming that: an OHDSI network study is proposed, a protocol exists, etc and we've setup our R environment following the HADES R Setup instructions:

  1. Create a new R Project
  2. renv::init() to set up the renv code infrastructure
  3. Download the latest HADES-wide lock file for use in the project (assumption: Strategus is in the HADES-wide lock file)
  4. renv::restore() to ensure the HADES-wide dependencies are installed.
  5. Follow the Strategus vignette for creating an analysis specification and save the analysis specification in the R Project somewhere.
  6. Run some smoke tests to make sure it works (need to expand on how to do this but likely run study against Eunomia?).
  7. Upload to a study repo on https://github.com/ohdsi-studies

The workflow above is still not sufficient to answer the question: how do I know that the environment used to design the study is the same as the one specified (in the renv.lock) to execute it? Here are some thoughts:

  • Have a place in the Strategus analysis specification to hold a checksum of the renv.lock file used to produce the study. If this value is not set in the analysis specification, all bets are off and we just use the base R environment.
  • Record the HADES package version in the analysis specification and check this when running a Strategus study. I think this may be hard due to nested dependencies, etc.
  • We can trust developers to do the right thing so we shouldn't worry about this😏

Looking forward to hearing others ideas on this topic & approach.

@schuemie
Copy link
Member

Even though I suggested that for now we keep the renv lock file and specifications separately, I propose further down the line we integrate the two into a single object. We really need to ensure the set of R packages used to create the specifications is the same as the one used to execute the analysis, and therefore we need a hard link between lock file and specifications.

Having a hash code is nice, but how do you find the renv lock file that matches it? And what if you can't find a match?

Recording the HADES package versions is the same as integrating a renv lock file, right?

@anthonysena
Copy link
Collaborator

We really need to ensure the set of R packages used to create the specifications is the same as the one used to execute the analysis, and therefore we need a hard link between lock file and specifications.

We're agreed on this point and we can use #144 to discuss how we want to include this into the analysis specification. @chrisknoll mentioned in #147 (comment):

...A release of Strategus comes with a published renv.lock file that contains which versions of package dependencies have been tested with the given version of Strategus, and that within a single release of Strategus you may have multiple updates to underlying packages.

I'd support this idea as well - we want to have a release of Strategus that has been verified to work with the dependencies detailed in the renv.lock file.

I'm going to close this issue since I'd prefer to track these types of design discussions and decisions in #148 since for v1.x we're decided on in-sourcing the modules and having a single renv.lock (somewhere) that controls the dependencies for the packages. What's not clear at the moment is how we 1) recommend developers use Strategus to create a network study and 2) how we distribute a network study which should be the focus in #148.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants