Add linkability to design documents to implement JSON Hypertext Application Language (HAL) #98

Open
azimov opened this issue Oct 21, 2023 · 3 comments · May be fixed by #114

Comments

@azimov
Collaborator

azimov commented Oct 21, 2023

Strategus design documents are currently isolated resources, which limits how their meta-data can be linked to anything else.

For example:

If I have a study design document, I want to link it to the protocol for the study and to any phenotype algorithms used in the study. If I'm searching for studies, I want to be able to find them based on drug/disease areas of interest. Other investigators may also wish to be aware of studies executed in related areas, which linkability would greatly aid.

My proposal is that we adopt JSON HAL.

This approach has two classes for consideration:

  • Resources - proper json documents
  • Links - references to other json documents

Any resource can also, optionally, include a link.

Link Resource Caching

The "_embedded" object is important in our context because the target of any link can also be embedded in the document. This means we could support both of the following definitions:

"cohortDefinitions": {
   "_links": {
       "cohort1": {"href": "https://phenotype_library.com/phenotype_id"},
       "cohort2": {"href": "https://phenotype_library.com/phenotype_id"}
   }
}

AND

"cohortDefinitions": {
   "_links": {
       "cohort1": {"href": "https://phenotype_library.com/phenotype_id"},
       "cohort2": {"href": "https://phenotype_library.com/phenotype_id"}
   },
   "_embedded": {
       ... < cohort definitions>
   }
}

This is referred to as the "hypertext cache pattern". It would allow us to share payloads that include all external resources (which is strongly desired for passing studies around), but it could start to create maintainability and auditability headaches.
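To make the pattern concrete, here is a minimal Python sketch of how a consumer might resolve a link: use the copy cached under "_embedded" when it is present, and only dereference the href otherwise. This is not Strategus code. The function name `resolve_link`, the `fetch` callback, and the placeholder cohort content are all assumptions made for illustration; the URLs are the made-up ones from the example above.

```python
from typing import Any, Callable


def resolve_link(resource: dict[str, Any], rel: str,
                 fetch: Callable[[str], Any]) -> Any:
    """Return the document behind link relation `rel`.

    Prefers the copy cached under "_embedded". If there is none, calls
    `fetch` (a caller-supplied, hypothetical function) with the link's href.
    """
    embedded = resource.get("_embedded", {})
    if rel in embedded:
        return embedded[rel]
    href = resource["_links"][rel]["href"]
    return fetch(href)


# Hypothetical resource: cohort1 is cached in the document, cohort2 is not.
cohort_definitions = {
    "_links": {
        "cohort1": {"href": "https://phenotype_library.com/phenotype_id"},
        "cohort2": {"href": "https://phenotype_library.com/phenotype_id"},
    },
    "_embedded": {
        "cohort1": {"name": "placeholder cohort definition"},
    },
}

fetched = []  # records every href that had to be dereferenced


def fake_fetch(href: str) -> dict[str, str]:
    """Stand-in for a real HTTP or file lookup; performs no network access."""
    fetched.append(href)
    return {"name": "fetched from " + href}


first = resolve_link(cohort_definitions, "cohort1", fake_fetch)
second = resolve_link(cohort_definitions, "cohort2", fake_fetch)
```

Only cohort2 triggers a lookup here, so a document that embeds everything could be resolved with no external access at all.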

Security note

The HAL design should never include executable content. This would include the embedded JSON that we currently use for cohort definitions. Though the risk is low, it would potentially be exploitable.

@anthonysena anthonysena added this to the v0.2.0 milestone Dec 4, 2023
@azimov azimov linked a pull request Jan 15, 2024 that will close this issue
@schuemie
Member

A crucial part of the Strategus specs is that they're self-contained, for at least two reasons:

  1. For reproducibility: I want to be able to run a study even when external libraries may have changed.
  2. For air-gapped environments, where I don't have access to external libraries.

I'm all for including meta-data in Strategus specifications that allow you to trace where cohort definitions etc. came from. But using HAL seems to turn this around: the external link is required, but the embedding is optional?

Also, where would the URLs come from? In your example you made up a "phenotype_library.com", but what would be a real example of a URL in OHDSI? How would it work if I for example design a study inside my organization, and would like to run it as an OHDSI network study?

@azimov
Collaborator Author

azimov commented Jan 16, 2024

> A crucial part of the Strategus specs is that they're self-contained, for at least two reasons:
>
>   1. For reproducibility: I want to be able to run a study even when external libraries may have changed.
>   2. For air-gapped environments, where I don't have access to external libraries.

The files can be either embedded or linked. A resource can (and should) have multiple links. We can propose a practice of giving each resource both a URL for its original source (if it is in the phenotype library) and a local path relative to the document.

> Also, where would the URLs come from? In your example you made up a "phenotype_library.com", but what would be a real example of a URL in OHDSI? How would it work if I for example design a study inside my organization, and would like to run it as an OHDSI network study?

The path can be any URI, and best practice for us would be a relative path. To me it seems preferable to have a tarball for a study, as opposed to a single document, because when you get to hundreds or thousands of cohorts you can actually audit them. The use of "_embedded" gives us the flexibility to cache the resources inside the document (and also adds an extra form of validation at run time, to check that the resources are present).
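As a rough sketch of what that run-time check could look like (an illustration only, not an existing Strategus function): for every link relation, accept it if the resource is cached under "_embedded" or if one of its links is a relative path to a file that exists next to the document. The name `missing_resources`, the `cohorts/` directory layout, and the cohort names are hypothetical.

```python
import tempfile
from pathlib import Path
from urllib.parse import urlparse


def missing_resources(resource, base_dir):
    """Return link relations that cannot be resolved without network access."""
    embedded = resource.get("_embedded", {})
    missing = []
    for rel, links in resource.get("_links", {}).items():
        if rel in embedded:
            continue  # cached inside the document
        # HAL allows a relation to hold one link object or a list of them.
        if isinstance(links, dict):
            links = [links]
        # A link counts as local if its href has no URI scheme (a relative
        # path) and points at an existing file under base_dir.
        has_local_copy = any(
            urlparse(link["href"]).scheme == ""
            and (Path(base_dir) / link["href"]).is_file()
            for link in links
        )
        if not has_local_copy:
            missing.append(rel)
    return missing


# Hypothetical study layout: one cohort shipped as a file, one only linked
# remotely, one embedded in the document itself.
with tempfile.TemporaryDirectory() as study_dir:
    cohort_file = Path(study_dir) / "cohorts" / "cohort1.json"
    cohort_file.parent.mkdir()
    cohort_file.write_text("{}")

    study = {
        "_links": {
            "cohort1": [
                {"href": "https://phenotype_library.com/phenotype_id"},
                {"href": "cohorts/cohort1.json"},
            ],
            "cohort2": {"href": "https://phenotype_library.com/phenotype_id"},
            "cohort3": {"href": "https://phenotype_library.com/phenotype_id"},
        },
        "_embedded": {"cohort3": {"name": "placeholder"}},
    }
    missing = missing_resources(study, study_dir)
```

In this example only cohort2 is reported as missing, since it has neither an embedded copy nor a local file.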

@schuemie
Member

I know all of this is very subjective, but I think there's a lot of benefit to the simplicity of one study - one JSON file.

Changing that to a tarball with internal relative path linkages adds a lot of complexity, for no obvious gain. It also doesn't serve the purpose of documenting where the artifacts (e.g. cohorts) came from, which I thought was the reason you proposed HAL.


3 participants