Add sharedFolder to execution settings #21

Open
anthonysena opened this issue Oct 27, 2022 · 5 comments · May be fixed by #161
@anthonysena
Collaborator

In speaking with @jreps we need a mechanism for modules to pass information to each other. For example, for prediction models, we'd want to have a way to transfer models (either locally or from an external location) so that a validation module could pick up the model and perform the validation.

@schuemie
Member

Would this be sharing within the same workflow? If it is between workflows, I'm not sure how we can ensure the files are there.

@anthonysena
Collaborator Author

anthonysena commented Oct 28, 2022

This would be for within a workflow, at least to start. Here are the two use cases:

1. Internally trained models

The high-level workflow steps would be:

  • Run the PLP module and train X number of models
  • Transfer the models to a shared location for use by the next module
  • Run the PLP validation module on each of the X models

2. Externally trained models

Outside of the workflow, the models would be posted to a known location (e.g., https://github.com/OHDSI/models). Then, the high-level workflow steps would be:

  • Transfer the external models to a shared location
  • Run the PLP validation module on each of the X models sourced externally

The aim would be to allow the model transfer module to work with either externally sourced models or internally trained models so that there is consistency in the analysis specification for the workflow. Let me know if this is clear and @jreps feel free to add any details I may have missed.
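
To make the two use cases concrete, here is a minimal sketch in R of how a single transfer step could be parameterized so the downstream validation step is identical in both cases. The function and argument names below are illustrative only and are not part of any existing Strategus or PLP module API.

```r
# Hypothetical sketch only: createModelTransferSpecifications() is an
# illustrative name, not an existing Strategus or PLP function.
createModelTransferSpecifications <- function(modelSource, sharedFolder) {
  list(
    modelSource = modelSource,   # local folder from a prior PLP step, or an external URL
    sharedFolder = sharedFolder  # common location the validation module reads from
  )
}

# Use-case 1: models trained earlier in the same workflow
internalSpecs <- createModelTransferSpecifications(
  modelSource  = file.path("strategusWork", "PatientLevelPredictionModule_1", "models"),
  sharedFolder = file.path("strategusWork", "shared", "models")
)

# Use-case 2: models posted to a known external location
externalSpecs <- createModelTransferSpecifications(
  modelSource  = "https://github.com/OHDSI/models",
  sharedFolder = file.path("strategusWork", "shared", "models")
)
```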

anthonysena added this to the v0.2.0 milestone on Dec 4, 2023
@egillax

egillax commented Jul 23, 2024

@anthonysena I wanted to note how we did this for our DeepLearningComparison study.

The workflow was done in two parts.

  1. Development workflow:
    1.1 Run PLP module and develop a bunch of models and internally validate them
    1.2 Share results with study coordinator (zipped strategusOutput folder)
    1.3 Study coordinator uploads results to s3 bucket.

  2. Validation workflow (can only run once all sites have run development):
    2.1 Run ModelTransferModule, which downloads the models from GitHub, s3, or a local path. In our case we used s3: the s3 location is part of the Strategus settings, and the credentials to access s3 are shared separately and added to the site environment (see the sketch after this list).
    2.2 Run PatientLevelPredictionValidationModule. Currently this looks for a ModelTransferModule working directory from the same workflow, gets the models from there, and externally validates them for that site.
    2.3 Share results with the coordinator.
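
As a rough illustration of step 2.1, assuming the aws.s3 package and a bucket/key supplied through the module settings, the download could look roughly like this; the helper name is hypothetical, and credentials come from the site environment as described above.

```r
# Rough sketch only; downloadModelFromS3() is an illustrative helper, not the
# actual ModelTransferModule code.
library(aws.s3)

downloadModelFromS3 <- function(bucket, key, workFolder) {
  # Credentials are expected in the site environment (e.g. AWS_ACCESS_KEY_ID /
  # AWS_SECRET_ACCESS_KEY), shared separately from the study specification.
  dir.create(workFolder, recursive = TRUE, showWarnings = FALSE)
  localPath <- file.path(workFolder, basename(key))
  aws.s3::save_object(object = key, bucket = bucket, file = localPath)
  localPath
}

# Example (bucket and key are placeholders):
# downloadModelFromS3("my-study-bucket", "models/conv-ipci-a5.zip",
#                     file.path("strategusWork", "ModelTransferModule_1", "models"))
```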

There are a few things to note here:

  • This needs to be run as two workflows: every site needs to finish developing and sharing its models before the validation workflow can run.
  • In step 2.2 there is an implicit dependency between PatientLevelPredictionValidationModule and ModelTransferModule: running the validation module successfully requires ModelTransferModule output. Is there some way of specifying such a dependency in Strategus v1.0, similar to how many packages depend on CohortGeneratorModule?
  • In step 2.2 we use the working directory of ModelTransferModule. However, in the output directory of that module there is currently a csv which looks like this:
| originalLocation | modelSavedLocally | localLocation |
| --- | --- | --- |
| models/conv-ipci-a5.zip | TRUE | /studyOutputPath/strategusWork/ModelTransferModule_1/models/model_s3_1 |
| models/conv-opehr-a5.zip | TRUE | /studyOutputPath/strategusWork/ModelTransferModule_1/models/model_s3_2 |

I think it's better to use only the output directory of the module for data that other modules need: copy the models to the output directory and have localLocation in the csv point there (a sketch of this is below, after these notes).

  • In some cases, such as air-gapped systems, the settings of the ModelTransferModule need to be overwritten with a local path instead of a path to s3 or GitHub.
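
Here is a small sketch of what the output-directory proposal above could look like from the validation module's side, in base R. The manifest file name ("models.csv") is an assumption, as is the premise that localLocation would point inside the output folder.

```r
# Sketch only: the manifest file name ("models.csv") is an assumption, and with
# the proposed change localLocation would point inside the ModelTransferModule
# output folder rather than its work folder.
locateTransferredModels <- function(modelTransferOutputFolder) {
  manifest <- read.csv(file.path(modelTransferOutputFolder, "models.csv"))
  # Keep only models that were actually saved locally
  manifest$localLocation[as.logical(manifest$modelSavedLocally)]
}
```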

I don't think a sharedFolder is necessary if the above approach with a ModelTransferModule is used, unless I'm missing something. The external location is in the module's settings, and the module takes care of fetching the models from there to a standard location where the validation module can access them.

@jreps feel free to add if I'm missing something.

@anthonysena
Collaborator Author

Thanks @egillax for the additional information from your DeepLearningComparison study.

I've made an initial draft of the PLP Validation & ModelTransfer modules per #161. I have not added tests just yet; as part of that work, I wanted to respond to your question here:

In step 2.2 there is an implicit dependency between PatientLevelPredictionValidationModule and ModelTransferModule: running the validation module successfully requires ModelTransferModule output. Is there some way of specifying such a dependency in Strategus v1.0, similar to how many packages depend on CohortGeneratorModule?

In v1.x, I decided that having an explicit requirement for module dependencies was too strict. Instead, the current execute code finds the CohortGeneratorModule and, if it is present, executes that module first and then executes any remaining modules. In the case of the ModelTransferModule, I think this implicit dependency falls on the person responsible for the coding portion of the study: they will need to make sure the transfer occurs before running the PLP validation. Do I have this correct? If so, this is similar to the EvidenceSynthesisModule, in that the ES module assumes your results are loaded into a results database that contains the full set of estimation results. If new estimation results are added, the ES module must be run again.
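
For reference, here is a simplified sketch of that ordering behavior. This is not the actual Strategus v1.x source, and the `module` field name is an assumption about the specification structure.

```r
# Simplified illustration of the described behavior, not the actual Strategus
# v1.x source; the `module` field name is an assumption. If CohortGeneratorModule
# is present it is run first, then the remaining modules in specification order.
orderModuleSpecifications <- function(moduleSpecifications) {
  moduleNames <- vapply(moduleSpecifications, function(x) x$module, character(1))
  isCohortGenerator <- moduleNames == "CohortGeneratorModule"
  c(moduleSpecifications[isCohortGenerator], moduleSpecifications[!isCohortGenerator])
}
```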

@egillax

egillax commented Sep 5, 2024

Hi @anthonysena, yes, I think it's fine that the person responsible for the coding makes sure to run the ModelTransferModule before the validation module.
