
Allow automated and customisable result schema creation using RMM package #43

Closed
azimov opened this issue Mar 4, 2023 · 6 comments · Fixed by #145

azimov (Collaborator) commented Mar 4, 2023

Currently, the management of results files in this package is somewhat limiting because custom post-analysis scripts handle the creation of schemas and the upload of any results files.

To resolve this, I propose using the RMM package to handle the results upload automatically in most cases. For modules that require schema-specific creation parameters or data manipulation prior to upload, callbacks inside the module execution context would allow this.

For schema creation

Create a targets workflow for creating schemas for each module. This should be run only once.

In the default case:

  • Find the resultsDataModelSpecification.csv file (or have the module expose it via an overridable function, getDataModelSpecifications) and export it to the work folder
  • Call ResultModelManager::generateSqlSchema to create the schema SQL and execute it (setting the table prefix and database schema from the module settings)
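The default path could look roughly like the following sketch. This is illustrative only: the `workFolder`, `connectionDetails`, `executionSettings`, and `moduleSettings` objects are assumptions for this example, and RMM argument names may differ across versions.

```r
# Illustrative sketch; object names here are assumptions, not a fixed API.
specPath <- file.path(workFolder, "resultsDataModelSpecification.csv")

# Generate DDL from the model spec; the resulting SQL is parameterised
# on the database schema and table prefix.
sql <- ResultModelManager::generateSqlSchema(csvFilepath = specPath)

connection <- DatabaseConnector::connect(connectionDetails)
DatabaseConnector::renderTranslateExecuteSql(
  connection = connection,
  sql = sql,
  database_schema = executionSettings$resultsDatabaseSchema,
  table_prefix = moduleSettings$tablePrefix
)
DatabaseConnector::disconnect(connection)
```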

In the case where customizability is required:

  • Pass the jobContext to a function, createDataModelSchema, that can optionally be added to the module's Main.R file; this function should execute schema creation (e.g. PLP and CohortDiagnostics already have functions to do this)
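For illustration, such a callback in a module's Main.R might take this shape (the function name follows the proposal above; the jobContext field names shown are hypothetical):

```r
# Hypothetical callback; the jobContext field names are assumptions.
createDataModelSchema <- function(jobContext) {
  connectionDetails <- jobContext$moduleExecutionSettings$resultsConnectionDetails
  connection <- DatabaseConnector::connect(connectionDetails)
  on.exit(DatabaseConnector::disconnect(connection))
  # Module-specific schema creation goes here, e.g. delegating to the
  # package's own table-creation function as PLP and CohortDiagnostics
  # already do.
  invisible(NULL)
}
```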

For results upload

A similar pattern is required. However, in this case the results upload should be treated as a targets::tar_target instance that depends on the base module.
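That dependency could be expressed in a _targets.R pipeline along these lines; executeModule() and uploadModuleResults() are placeholder names for this sketch, not existing functions:

```r
# _targets.R sketch; executeModule() and uploadModuleResults() are
# placeholders for the module execution and upload steps.
library(targets)

list(
  tar_target(moduleResults, executeModule(jobContext)),
  # The upload target only runs after the module target has completed.
  tar_target(uploadedResults, uploadModuleResults(moduleResults, jobContext))
)
```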

In the default case:

  • Find the resultsDataModelSpecification.csv file (or have the module expose it via an overridable function, getDataModelSpecifications) and export it to the work folder
  • (outside of the renv context) Upload the results files according to this model spec using ResultModelManager::uploadResults
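A sketch of that default upload step, assuming the uploadResults interface as documented at the time of writing (argument names may have changed, and the settings objects here are illustrative):

```r
# Illustrative sketch; executionSettings/moduleSettings are assumptions.
specifications <- read.csv(
  file.path(workFolder, "resultsDataModelSpecification.csv")
)

ResultModelManager::uploadResults(
  connectionDetails = resultsConnectionDetails,
  schema = executionSettings$resultsDatabaseSchema,
  resultsFolder = moduleResultsFolder,
  tablePrefix = moduleSettings$tablePrefix,
  specifications = specifications
)
```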

In the custom case:

  • Pass the jobContext to a function, uploadResultsCallback, that can optionally be added to the module's Main.R file; this function should execute the results upload (e.g. PLP and CohortDiagnostics already have functions to do this, and implementation for @jreps and me would be trivial)
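As with schema creation, such a module-level callback in Main.R could look like this (the name follows the proposal above; the body is a placeholder):

```r
# Hypothetical callback; only its shape is suggested here.
uploadResultsCallback <- function(jobContext) {
  # A module with bespoke upload logic (e.g. PLP or CohortDiagnostics)
  # would call its own uploader here instead of the generic
  # ResultModelManager::uploadResults() path.
  invisible(NULL)
}
```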

See incoming PR for an implementation.

@azimov azimov mentioned this issue Mar 4, 2023
@anthonysena anthonysena added this to the v0.1.0 milestone Mar 6, 2023
anthonysena (Collaborator) commented:

A couple of notes after trying out this functionality:

  • The schema creation does not use the "work" folder approach which may be useful to store the target script generated, logs, etc.
  • The "database_meta_table" is created by Strategus itself and not by the modules; this is currently a gap since the schema creation does not include this table.
  • I didn't see a mechanism to detect if tables exist? Perhaps this is exposed in the RMM package and we just need to expose it via Strategus?

azimov (Collaborator, Author) commented Sep 19, 2023

Thanks @anthonysena, to your points:

> The schema creation does not use the "work" folder approach which may be useful to store the target script generated, logs, etc.

This is probably a relatively easy and reasonable change, as it's likely we will need to debug, so I can have a look at doing it.

> The "database_meta_table" is created by Strategus itself and not by the modules; this is currently a gap since the schema creation does not include this table.

Strategus could use RMM to create the result schema, or it may be desirable to include this in RMM itself, as it's likely going to be used by all analytics results sets.

> I didn't see a mechanism to detect if tables exist? Perhaps this is exposed in the RMM package and we just need to expose it via Strategus?

Do you mean to detect whether they have been created, or just to check that the schema conforms to the spec? This hasn't been implemented in RMM, but I'm not sure what the intended purpose is here.

@anthonysena anthonysena modified the milestones: v0.1.0, v0.2.0 Oct 2, 2023
anthonysena (Collaborator) commented:

> Do you mean if they have been created or just to check if the schema conforms to the spec? This hasn't been implemented in RMM but I'm not sure what the intended purpose is here?

Sorry that my question was unclear - I think the upload mechanism always assumes it should create the target tables in the results schema, which might drop existing results if such a schema and results tables already exist. I believe both options are supported by RMM already; it's most likely that I just need to allow the user to specify this in the executionSettings object.

@anthonysena anthonysena modified the milestones: v0.1.1, v0.2.0 Dec 4, 2023
@anthonysena anthonysena linked a pull request Jul 15, 2024 that will close this issue
anthonysena (Collaborator) commented:

Linking this issue to the draft PR #145 since I think this need is addressed in that feature branch. If it is not, we can keep this issue open and address it specifically.

anthonysena (Collaborator) commented:

Closing this out since v1.0 of Strategus will allow each module to expose the mechanisms to create its results tables and upload results. In this case, the module will either use ResultModelManager or provide its own implementation of these functions.

chrisknoll (Contributor) commented:

We might want to keep this open: my concern is that things like OHDSIShinyModules depend on a certain database results schema that may be beyond the concern of the individual statistics packages. Instead, I would propose a layer between the modules and the UI, like:

ShinyModule  -------- >     Results Model    <------   Strategus Module

In this way, what the modules need to map their underlying package to is defined in the Results Model, and the data that the UI reports on is defined in the same place. This allows us to coordinate changes between both sides by first defining them in the middle and then pushing the implementation of reporting and generation out to the sides. It also means that the underlying packages are free to develop new features independent of module versions; once it's time to report the data in the standard UI app, the data becomes part of the results model, the Strategus module maps the new data into the results model, and the shiny module implements the report on it.

Another advantage of this type of separation: ShinyModule bug fixes can be developed independently of Results Model or Strategus module releases. Hopefully this level of separation makes sense.
