Create json schema for strategeus inputs #9

Open
ablack3 opened this issue Jun 17, 2022 · 9 comments

@ablack3

ablack3 commented Jun 17, 2022

My understanding is that Strategus will redefine the json representation for OHDSI studies.

Would it be possible to create a JSON schema (using https://json-schema.org/ or https://cran.r-project.org/web/packages/jsonvalidate/vignettes/jsonvalidate.html or something similar) that can be used to validate Strategus inputs?

Strategus takes two json files as input: Analysis specifications and Execution Settings

An example Analysis specification is at https://github.com/OHDSI/Strategus/blob/main/extras/cgCdAnalysisSpecifications.json

An example Execution settings json is at https://github.com/OHDSI/Strategus/blob/main/extras/cgCdExecutionSettings.json

@ablack3
Author

ablack3 commented Jun 20, 2022

Here is an attempt to define a schema for a "concept" (or is it called a "concept expression"?) This is probably not exactly correct. Should each module define its own schema?

I'm not sure if jsonvalidate supports references (defined using $ref) in schemas.

schema <- '
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://ohdsi.org/concept.schema.json",
  "title": "concept",
  "description": "An OMOP concept expression",
  "type": "object",
  "properties": {
    "concept": {
      "description": "An OMOP concept",
      "type": "object",
      "properties": {
        "CONCEPT_CLASS_ID": {
          "type" : "string"
        },
        "CONCEPT_CODE": {
          "type" : "string"
        },
        "CONCEPT_ID": {
          "type" : "integer",
          "minimum": 0
        },
        "CONCEPT_NAME": {
          "type" : "string"
        },
        "DOMAIN_ID": {
          "type" : "string"
        },
        "INVALID_REASON": {
          "type" : "string"
        },
        "INVALID_REASON_CAPTION": {
          "type": "string"
        },
        "STANDARD_CONCEPT": {
          "type": "string",
          "enum": ["S", "N", "C"]
        },
        "STANDARD_CONCEPT_CAPTION": {
          "type": "string"
        },
        "VOCABULARY_ID": {
          "type": "string"
        }
      },
      "required": ["CONCEPT_CLASS_ID", "CONCEPT_CODE", "CONCEPT_ID", "CONCEPT_NAME", "DOMAIN_ID", "INVALID_REASON", "INVALID_REASON_CAPTION", "STANDARD_CONCEPT", "STANDARD_CONCEPT_CAPTION", "VOCABULARY_ID"]
    },
    "isExcluded": {
      "description": "Should this concept be excluded from the concept set?",
      "type": "boolean"
    },
    "includeDescendants": {
      "description": "Should descendants be included/excluded? (true or false)",
      "type": "boolean"
    },
    "includeMapped": {
      "description": "Should mapped source concepts be included/excluded? (true or false)",
      "type": "boolean"
    }
  },
  "required": [ "concept" ]
}
'

validator <- jsonvalidate::json_validator(schema, engine = "ajv")

jsonToValidate <- '
 {
  "concept": {
    "CONCEPT_CLASS_ID": "Clinical Finding",
    "CONCEPT_CODE": "41309000",
    "CONCEPT_ID": 201612,
    "CONCEPT_NAME": "Alcoholic liver damage",
    "DOMAIN_ID": "Condition",
    "INVALID_REASON": "V",
    "INVALID_REASON_CAPTION": "Valid",
    "STANDARD_CONCEPT": "S",
    "STANDARD_CONCEPT_CAPTION": "Standard",
    "VOCABULARY_ID": "SNOMED"
  },
  "includeDescendants": true
}'

validator(jsonToValidate)
#> [1] TRUE

Created on 2022-06-19 by the reprex package (v2.0.1)
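On the `$ref` question above, here is a minimal sketch of a reference that stays inside a single schema. It assumes the ajv engine and uses a draft-07 `definitions` block (the draft 2020-12 spelling would be `$defs`). The trimmed-down concept schema and the sample documents are hypothetical, not the real Strategus or circe-be structure.

```r
library(jsonvalidate)

# Hypothetical, trimmed-down schema: the "concept" object is declared once
# under "definitions" and pulled in through an internal "$ref".
schema <- '{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "definitions": {
    "concept": {
      "type": "object",
      "properties": {
        "CONCEPT_ID": {"type": "integer", "minimum": 0},
        "CONCEPT_NAME": {"type": "string"}
      },
      "required": ["CONCEPT_ID", "CONCEPT_NAME"]
    }
  },
  "type": "object",
  "properties": {
    "concept": {"$ref": "#/definitions/concept"},
    "isExcluded": {"type": "boolean"}
  },
  "required": ["concept"]
}'

validator <- json_validator(schema, engine = "ajv")

# Illustrative inputs: the second one has a string CONCEPT_ID and no CONCEPT_NAME.
valid <- '{"concept": {"CONCEPT_ID": 201612, "CONCEPT_NAME": "Alcoholic liver damage"}}'
invalid <- '{"concept": {"CONCEPT_ID": "201612"}}'

ok <- validator(valid)
# verbose = TRUE attaches an "errors" attribute describing what failed.
bad <- validator(invalid, verbose = TRUE)
print(attr(bad, "errors"))
```

If this works as expected, each module could keep its own `definitions` block and the top-level Strategus schema would only need to reference them.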

@schuemie
Member

Hi @ablack3! Yes, we're thinking of how to specify the input to Strategus. I've been playing around with JSON Schema as well, as you can see here. You can view an example JSON Schema for the CohortDiagnosticsModule here.

In R we'd need functions to validate the JSON (jsonvalidate seems a good candidate!), and functions to unserialize the JSON into whatever R objects we need. Shouldn't be too difficult.

An open question is what would generate the JSON. It could become part of ATLAS, in which case it might be nice to have the ability to generate Java code stubs from the JSON Schema (which I think is possible).

@ablack3
Author

ablack3 commented Jun 21, 2022

We're thinking alike! I learned that the jsonschema package does support references to external json files (using $ref) but does not yet support URL references.

I'd vote for Atlas as json generator. Having solid and stable json schemas will help with collaboration between the java and R side. circe-be would be a good place to start for someone who wants to work on json schemas since I don't think we have circe-be schemas (as far as I know) and those are stable.

@schuemie
Member

Good point. I'm also pretty sure the Circe JSON schema will be the most complicated one.

@ablack3
Author

ablack3 commented Mar 10, 2023

To make this issue/task more concrete, let's start with creating a json schema for this Strategus study json example https://github.com/OHDSI/Strategus/blob/develop/inst/testdata/analysisSpecification.json

Maybe I or someone else can take this on. I'd like to put these schemas on the Open Source Community website (in development)

@gkovaig

gkovaig commented Oct 24, 2023

Hi @schuemie and @ablack3, Raj Manickam here (https://github.com/gkovaig). Not sure if this item was discussed and assigned at the Hack-a-thon.

Given that the schema for Strategus is an aggregation of schemas defined in other components (including CohortGenerator, circe-be, etc.), it makes sense to come up with a set of specifications that cross-reference each other.

From this note it appears that jsonvalidate can use an alternate engine, ajv (Another JSON Validator), that supports many more features, including nested schemas that span multiple files, meta schema versions later than draft-04, validating using a subschema, and validating a subset of an input data object.
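The last two features in that list could be sketched as follows, assuming jsonvalidate's `reference` argument (validate against a sub-schema) and `query` argument (validate one named element of the input), both of which require the ajv engine. The schema and data are hypothetical placeholders, and the multi-file case is not covered here.

```r
library(jsonvalidate)

# Hypothetical schema that only carries reusable definitions.
schema <- '{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "definitions": {
    "concept": {
      "type": "object",
      "properties": {
        "CONCEPT_ID": {"type": "integer", "minimum": 0}
      },
      "required": ["CONCEPT_ID"]
    }
  }
}'

# Validate against the "concept" sub-schema instead of the whole schema.
conceptValidator <- json_validator(
  schema,
  engine = "ajv",
  reference = "#/definitions/concept"
)

# A bare concept object is checked directly against the sub-schema.
bareOk <- conceptValidator('{"CONCEPT_ID": 201612}')
bareBad <- conceptValidator('{"CONCEPT_ID": -1}')

# query picks one top-level element of a larger document and validates only that.
item <- '{"concept": {"CONCEPT_ID": 201612}, "includeDescendants": true}'
itemOk <- conceptValidator(item, query = "concept")
```

A setup along these lines would let each module's schema be tested in isolation against its own slice of an analysis specification.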

@schuemie
Member

Thanks Raj. Creating the JSON schema for the Strategus modules and collecting them in one place is high on our priority list (I had hoped we could work on this during the hackathon, but we needed to have a discussion on overall Strategus architecture first)

@anthonysena anthonysena added this to the v0.2.0 milestone Dec 4, 2023
@ablack3
Author

ablack3 commented Jan 22, 2024

@anthonysena we should take up this topic in the strategus working group. It also relates to some of the work going on with Arachne and specifying execution environments like this one: https://storage.googleapis.com/arachne-datanode/descriptor_base.json

{
  "id" : "Default",
  "bundleName" : "r_base_focal_amd64.tar.gz",
  "label": "Default runtime",
  "osLibraries" : [ ],
  "executionRuntimes" : [ {
    "type" : "R",
    "version" : "4.1.2",
    "dependencies" : [ {
      "name" : "testthat",
      "version" : "3.1.4",
      "dependencySourceType" : "CRAN",
      "preInstallScripts" : null,
      "postInstallScripts" : null
    }, {
      "name" : "rematch2",
      "version" : "2.1.2",
      "dependencySourceType" : "CRAN",
      "preInstallScripts" : null,
      "postInstallScripts" : null
    }, {
      "name" : "assertthat",
      "version" : "0.2.1",
      "dependencySourceType" : "CRAN",
      "preInstallScripts" : null,
      "postInstallScripts" : null
    }, {
      "name" : "AUC",
      "version" : "0.3.2",
      "dependencySourceType" : "CRAN",
      "preInstallScripts" : null,
      "postInstallScripts" : null
    }, {
      "name" : "lubridate",
      "version" : "1.7.10",
      "dependencySourceType" : "CRAN",
      "preInstallScripts" : null,
      "postInstallScripts" : null
    }, {
      "name" : "htmlwidgets",
      "version" : "1.5.3",
      "dependencySourceType" : "CRAN",
      "preInstallScripts" : null,
      "postInstallScripts" : null
    } ]
  } ]
}

@anthonysena
Collaborator

Noting that something similar has been done for renv: rstudio/renv#1889. This may be applicable to validating the structure of the analysis specification if the renv.lock file is moved into the analysis specification per #144.

@anthonysena anthonysena modified the milestones: v1.0.0, Backlog Aug 28, 2024