Add sharedFolder to execution settings #21

Open
anthonysena opened this issue Oct 27, 2022 · 5 comments · May be fixed by #161
@anthonysena
Collaborator

In speaking with @jreps we need a mechanism for modules to pass information to each other. For example, for prediction models, we'd want to have a way to transfer models (either locally or from an external location) so that a validation module could pick up the model and perform the validation.

@schuemie
Member

Would this be sharing within the same workflow? If it is between workflows, I'm not sure how we can ensure the files are there.

@anthonysena
Collaborator Author

anthonysena commented Oct 28, 2022

This would be for within a workflow, at least to start. Here are the two use cases:

1. Internally trained models

The high-level workflow steps would be:

  • Run the PLP module and train X number of models
  • Transfer the models to a shared location for use by the next module
  • Run the PLP validation module on each of the X models

2. Externally trained models

Outside of the workflow, the models would be posted to a known location (e.g., https://github.com/OHDSI/models). Then, the high-level workflow steps would be:

  • Transfer the external models to a shared location
  • Run the PLP validation module on each of the X models sourced externally

The aim would be to allow the model transfer module to work with either externally sourced models or internally trained models so that there is consistency in the analysis specification for the workflow. Let me know if this is clear and @jreps feel free to add any details I may have missed.
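
To make the two use cases concrete, here is a minimal sketch in R of how a single transfer step could be parameterized so the downstream validation step is identical in both cases. The function and argument names below are illustrative only and are not part of any existing Strategus or PLP module API.

```r
# Hypothetical sketch only: createModelTransferSpecifications() is an
# illustrative name, not an existing Strategus or PLP function.
createModelTransferSpecifications <- function(modelSource, sharedFolder) {
  list(
    modelSource = modelSource,   # local folder from a prior PLP step, or an external URL
    sharedFolder = sharedFolder  # common location the validation module reads from
  )
}

# Use-case 1: models trained earlier in the same workflow
internalSpecs <- createModelTransferSpecifications(
  modelSource  = file.path("strategusWork", "PatientLevelPredictionModule_1", "models"),
  sharedFolder = file.path("strategusWork", "shared", "models")
)

# Use-case 2: models posted to a known external location
externalSpecs <- createModelTransferSpecifications(
  modelSource  = "https://github.com/OHDSI/models",
  sharedFolder = file.path("strategusWork", "shared", "models")
)
```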

anthonysena added this to the v0.2.0 milestone on Dec 4, 2023
@egillax

egillax commented Jul 23, 2024

@anthonysena I wanted to note how we did this for our DeepLearningComparison study.

The workflow was done in two parts.

  1. Development workflow:
    1.1 Run PLP module and develop a bunch of models and internally validate them
    1.2 Share results with study coordinator (zipped strategusOutput folder)
    1.3 Study coordinator uploads results to s3 bucket.

  2. Validation workflow (can only run once all sites have run development):
    2.1 Run ModelTransferModule, which downloads the models from GitHub, s3, or a local path. In our case we used s3: the s3 location is part of the Strategus settings, and the credentials to access s3 are shared separately and added to the site environment (see the sketch after this list).
    2.2 Run PatientLevelPredictionValidationModule. Currently this looks for a ModelTransferModule working directory from the same workflow, gets the models from there, and externally validates them for that site.
    2.3 Share results with the coordinator.
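
As a rough illustration of step 2.1, assuming the aws.s3 package and a bucket/key supplied through the module settings, the download could look roughly like this; the helper name is hypothetical, and credentials come from the site environment as described above.

```r
# Rough sketch only; downloadModelFromS3() is an illustrative helper, not the
# actual ModelTransferModule code.
library(aws.s3)

downloadModelFromS3 <- function(bucket, key, workFolder) {
  # Credentials are expected in the site environment (e.g. AWS_ACCESS_KEY_ID /
  # AWS_SECRET_ACCESS_KEY), shared separately from the study specification.
  dir.create(workFolder, recursive = TRUE, showWarnings = FALSE)
  localPath <- file.path(workFolder, basename(key))
  aws.s3::save_object(object = key, bucket = bucket, file = localPath)
  localPath
}

# Example (bucket and key are placeholders):
# downloadModelFromS3("my-study-bucket", "models/conv-ipci-a5.zip",
#                     file.path("strategusWork", "ModelTransferModule_1", "models"))
```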

There are a few things to note here:

  • This needs to be run as two workflows: every site needs to finish developing and sharing its models before the validation workflow can run.
  • In step 2.2 there is an implicit dependency between PatientLevelPredictionValidationModule and ModelTransferModule: running the validation module successfully requires ModelTransferModule output. Is there some way of specifying such a dependency in Strategus v1.0, similar to how many packages depend on CohortGeneratorModule?
  • In step 2.2 we use the working directory of ModelTransferModule. However, in the output directory of that module there is currently a csv which looks like this:
| originalLocation | modelSavedLocally | localLocation |
| --- | --- | --- |
| models/conv-ipci-a5.zip | TRUE | /studyOutputPath/strategusWork/ModelTransferModule_1/models/model_s3_1 |
| models/conv-opehr-a5.zip | TRUE | /studyOutputPath/strategusWork/ModelTransferModule_1/models/model_s3_2 |

I think it's better to use only the output directory of the module for data that other modules need: copy the models to the output directory and have localLocation in the csv point there (a sketch of this is below, after these notes).

  • In some cases, such as air-gapped systems, the settings of the ModelTransferModule need to be overwritten with a local path instead of a path to s3 or GitHub.
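
Here is a small sketch of what the output-directory proposal above could look like from the validation module's side, in base R. The manifest file name ("models.csv") is an assumption, as is the premise that localLocation would point inside the output folder.

```r
# Sketch only: the manifest file name ("models.csv") is an assumption, and with
# the proposed change localLocation would point inside the ModelTransferModule
# output folder rather than its work folder.
locateTransferredModels <- function(modelTransferOutputFolder) {
  manifest <- read.csv(file.path(modelTransferOutputFolder, "models.csv"))
  # Keep only models that were actually saved locally
  manifest$localLocation[as.logical(manifest$modelSavedLocally)]
}
```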

I don't think a sharedFolder is necessary if the above approach with a ModelTransferModule is used, unless I'm missing something. The external location is in the module's settings, and the module takes care of fetching the models from there to a standard location where the validation module can access them.

@jreps feel free to add if I'm missing something.

@anthonysena
Collaborator Author

Thanks @egillax for the additional information from your DeepLearningComparison study.

I've made an initial draft of the PLP Validation & ModelTransfer modules per #161. I have not added tests just yet; as part of that work, I wanted to respond to your question here:

In step 2.2 there is an implicit dependency between PatientLevelPredictionValidationModule and ModelTransferModule: running the validation module successfully requires ModelTransferModule output. Is there some way of specifying such a dependency in Strategus v1.0, similar to how many packages depend on CohortGeneratorModule?

In v1.x, I decided that having an explicit requirement for module dependencies was too strict. Instead, the current execute code finds the CohortGeneratorModule and, if it is present, executes that module first and then executes any remaining modules. In the case of the ModelTransferModule, I think this implicit dependency falls on the person responsible for the coding portion of the study: they will need to make sure the transfer occurs before running the PLP validation. Do I have this correct? If so, this is similar to the EvidenceSynthesisModule, in that the ES module assumes your results are loaded into a results database that contains the full set of estimation results. If new estimation results are added, the ES module must be run again.
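
For reference, here is a simplified sketch of that ordering behavior. This is not the actual Strategus v1.x source, and the `module` field name is an assumption about the specification structure.

```r
# Simplified illustration of the described behavior, not the actual Strategus
# v1.x source; the `module` field name is an assumption. If CohortGeneratorModule
# is present it is run first, then the remaining modules in specification order.
orderModuleSpecifications <- function(moduleSpecifications) {
  moduleNames <- vapply(moduleSpecifications, function(x) x$module, character(1))
  isCohortGenerator <- moduleNames == "CohortGeneratorModule"
  c(moduleSpecifications[isCohortGenerator], moduleSpecifications[!isCohortGenerator])
}
```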

@egillax

egillax commented Sep 5, 2024

Hi @anthonysena, yes, I think it's fine that the person responsible for the coding makes sure to run the ModelTransferModule before the validation module.
