Executing settings functions inside renvs #51

azimov · 2023-03-16T18:54:37Z

There will be a problem on many systems where creating a study requires installed versions of specific R packages.

For example, if I want to use CohortGenerator with subsets I can define this in my base R/system R environment with CohortGenerator version 0.8.0 and then feed this to a package using version0.7.0. In this situation I could add Cohorts with subsets.
However, doing this one of two things will happen:

The Strategus CohortGenerator module will crash because it can't handle the subset payloads
The Strategus CohortGenerator module will run and not error, not generating my subsetted cohorts (which is arguably worse)

Both these situations are bad and (currently) the only solution would be to make sure your system env and Strategus module env are the same.

Note that this will happen even with any parameters. For example, cohort diagnostics creates a FeatureExtraction default feature set inside the base package.

However - I propose an alternative solution; to allow execution of settings creation inside module renv contexts and make this the standard procedure.

In the simplest case, the module functions exposed (e.g. getSharedResources) should not require many changes, but module developers should be provided with an API to call into the module environment.

In a more complex case I would like to be able to create arbitrary code execution inside the renv with something like this:

cohortDefinitionPayload <- withModuleRenv(
      module = "CohortGenerator",
      version ... 
      code = {
          library(CohortGenerator) # Load renv version

          cohortDefinitionSet <- loadCohortDefinitionSet(...)

          subsetDefinition <- createCohortSubsetDefinition(
           .... DO stuff ...
          )
         cohortDefinitionSet %>% addCohortSubsetDefinition(subsetDefinition)
         
      })

Note, I include similar code (though not designed to be exported in its current state) in this PR.

There is a lot of complexity here though - we would only really want this code to create serialized payloads for the strategus design definitions.

Passing data between the calling script and the renv is also an example. The best solution for these situations is probably to store the input and output as an RDS file somewhere (rstudio optionally does this when sourcing jobs, allowing you to copy the calling environment to the child process and allowing the child process to copy results back to the parent R session).

The text was updated successfully, but these errors were encountered:

schuemie · 2023-03-17T12:13:17Z

I think this is yet another argument for why, for the 'main' Strategus modules, we want them to be integrated into the Strategus package, using a single central lock file, as argued here.

anthonysena · 2024-07-16T18:59:38Z

Just making a note since we're moving towards moving the modules into Strategus v1.x but I think this is still a potential issue. Similar to the discussion in #148, what's our process for initializing an environment for designing a Strategus study?

Here is a proposed workflow, assuming that: an OHDSI network study is proposed, a protocol exists, etc and we've setup our R environment following the HADES R Setup instructions:

Create a new R Project
renv::init() to set up the renv code infrastructure
Download the latest HADES-wide lock file for use in the project (assumption: Strategus is in the HADES-wide lock file)
renv::restore() to ensure the HADES-wide dependencies are installed.
Follow the Strategus vignette for creating an analysis specification and save the analysis specification in the R Project somewhere.
Run some smoke tests to make sure it works (need to expand on how to do this but likely run study against Eunomia?).
Upload to a study repo on https://github.com/ohdsi-studies

The workflow above is still not sufficient to answer the question: how do I know that the environment used to design the study is the same as the one specified (in the renv.lock) to execute it? Here are some thoughts:

Have a place in the Strategus analysis specification to hold a checksum of the renv.lock file used to produce the study. If this value is not set in the analysis specification, all bets are off and we just use the base R environment.
Record the HADES package version in the analysis specification and check this when running a Strategus study. I think this may be hard due to nested dependencies, etc.
We can trust developers to do the right thing so we shouldn't worry about this😏

Looking forward to hearing others ideas on this topic & approach.

schuemie · 2024-07-17T07:35:08Z

Even though I suggested that for now we keep the renv lock file and specifications separately, I propose further down the line we integrate the two into a single object. We really need to ensure the set of R packages used to create the specifications is the same as the one used to execute the analysis, and therefore we need a hard link between lock file and specifications.

Having a hash code is nice, but how do you find the renv lock file that matches it? And what if you can't find a match?

Recording the HADES package versions is the same as integrating a renv lock file, right?

anthonysena · 2024-07-18T14:12:10Z

We really need to ensure the set of R packages used to create the specifications is the same as the one used to execute the analysis, and therefore we need a hard link between lock file and specifications.

We're agreed on this point and we can use #144 to discuss how we want to include this into the analysis specification. @chrisknoll mentioned in #147 (comment):

...A release of Strategus comes with a published renv.lock file that contains which versions of package dependencies have been tested with the given version of Strategus, and that within a single release of Strategus you may have multiple updates to underlying packages.

I'd support this idea as well - we want to have a release of Strategus that has been verified to work with the dependencies detailed in the renv.lock file.

I'm going to close this issue since I'd prefer to track these types of design discussions and decisions in #148 since for v1.x we're decided on in-sourcing the modules and having a single renv.lock (somewhere) that controls the dependencies for the packages. What's not clear at the moment is how we 1) recommend developers use Strategus to create a network study and 2) how we distribute a network study which should be the focus in #148.

azimov changed the title ~~Executing settings functions inside renvs/loading libs~~ Executing settings functions inside renvs Mar 16, 2023

anthonysena added this to the v0.2.0 milestone Dec 4, 2023

anthonysena added study design process analysis specification labels Jul 16, 2024

anthonysena closed this as completed Jul 18, 2024

This was referenced Jul 18, 2024

Restructure the analysis specification and remove sharedResources #144

Open

How to design and distribute a Strategus network study #148

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Executing settings functions inside renvs #51

Executing settings functions inside renvs #51

azimov commented Mar 16, 2023

schuemie commented Mar 17, 2023

anthonysena commented Jul 16, 2024

schuemie commented Jul 17, 2024

anthonysena commented Jul 18, 2024

Executing settings functions inside renvs #51

Executing settings functions inside renvs #51

Comments

azimov commented Mar 16, 2023

schuemie commented Mar 17, 2023

anthonysena commented Jul 16, 2024

schuemie commented Jul 17, 2024

anthonysena commented Jul 18, 2024