Epic: Vocabulary Mapping Service #109

Open
josephjclark opened this issue Nov 13, 2024 · 1 comment

josephjclark commented Nov 13, 2024

Overview

A service which generates a list of mappings from source strings to values in a target vocabulary, such as SNOMED or LOINC codes.

This issue is a design for a general mapping service - but we'll be starting with a more specific use-case.

Issues

Inputs

  • Some list of string values to map. These must be in some kind of key: value format, where the key will be mapped to a value in the output.
    • We don't have a formal design for the input yet. Likely we need unskilled users to be able to give us some kind of dump of input data. We should be able to support different input formats.
  • Some list of target vocabularies or ontologies (eg, SNOMED).
    • Because these vocabularies can be huge, can we take and map some kind of path?
    • Can we support custom standards?
  • A template for output mappings - eg, a flat string, or a FHIR coding, or some other data structure.

Outputs

A JSON object which can be used as a lookup table in job code.

The mapping will work something like this:

// This is our generated structure
const mappings = {
  'zero': {
    "system": "http://snomed.info/sct",
    "code": "105542008",
    "display": "Non - drinker"
  }
};

// Cross reference the key against our input data to get the mapped value
const mappedValue = mappings[input.answer];
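
A bare property lookup returns `undefined` for unmapped answers and can accidentally match inherited keys such as `toString`. A minimal sketch of a safer lookup, assuming job code wants `null` for anything unmapped (the `lookupMapping` helper is hypothetical, not part of any existing API):

```javascript
// Same shape as the generated structure above
const exampleMappings = {
  zero: {
    system: 'http://snomed.info/sct',
    code: '105542008',
    display: 'Non - drinker',
  },
};

// Hypothetical helper: return the mapped value, or null when the key is unmapped
function lookupMapping(mappings, key) {
  // Only own properties count, so inherited keys like "toString" never match
  const hasKey = Object.prototype.hasOwnProperty.call(mappings, key);
  return hasKey ? mappings[key] : null;
}

const hit = lookupMapping(exampleMappings, 'zero');
const miss = lookupMapping(exampleMappings, 'toString');
```

Here `hit` is the SNOMED coding and `miss` is `null`, which job code can use to flag answers that still need a mapping.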

Design

All supported ontologies will be embedded in a vector database.

For each value in the input list:

  • Use similarity analysis to find possible matches in the embeddings (ie, load N possible matches from the database)
  • Call a model to choose the best mapping from the shortlist of possibilities
  • Fit the mapped value into the correct data structure

Repeat this process until all inputs are mapped.

Return a JSON structure.
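
The loop above could be sketched roughly as follows. This is a toy illustration under heavy assumptions, not the planned implementation: a bag-of-words cosine similarity stands in for real embeddings and the vector database, `chooseBest` stands in for the model call, and all function names are hypothetical. It also assumes each input value is a free-text description of its key. Apart from the SNOMED entry taken from the example above, the ontology rows are made-up placeholders.

```javascript
// Toy ontology standing in for rows in a vector database (assumption).
// Only the first entry comes from this issue; the others are placeholders.
const ontology = [
  { system: 'http://snomed.info/sct', code: '105542008', display: 'Non - drinker' },
  { system: 'http://example.org/demo', code: 'demo-1', display: 'Heavy drinker' },
  { system: 'http://example.org/demo', code: 'demo-2', display: 'Smoker' },
];

const tokenize = (text) => text.toLowerCase().match(/[a-z0-9]+/g) || [];

// Bag-of-words "embedding": token -> count (a stand-in for a real embedding model)
function embed(text) {
  const vec = new Map();
  for (const t of tokenize(text)) vec.set(t, (vec.get(t) || 0) + 1);
  return vec;
}

function cosine(a, b) {
  let dot = 0;
  for (const [t, n] of a) dot += n * (b.get(t) || 0);
  const norm = (v) => Math.sqrt([...v.values()].reduce((s, n) => s + n * n, 0));
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}

// Step 1: load the N most similar ontology entries
function shortlist(text, n) {
  const query = embed(text);
  return ontology
    .map((entry) => ({ entry, score: cosine(query, embed(entry.display)) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, n);
}

// Step 2: stand-in for the model call; here it just takes the top-scoring candidate
const chooseBest = (candidates) => candidates[0];

// Step 3: fit the chosen entry into the output template (a FHIR-style coding here)
const asCoding = ({ system, code, display }) => ({ system, code, display });

// Repeat for every input and return the lookup table
function mapInputs(inputs, { n = 2, template = asCoding } = {}) {
  const out = {};
  for (const [key, description] of Object.entries(inputs)) {
    out[key] = template(chooseBest(shortlist(description, n)).entry);
  }
  return out;
}

const generated = mapInputs({ zero: 'non drinker' });
```

In a real service the model step would see the whole shortlist plus the user's hint, so it could pick a lower-ranked candidate when the nearest neighbour is misleading.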

API

The input payload should be something like:

  • Inputs: key-value input values to map
  • Hint: A custom prompt written by the user to guide generation
  • Targets: list of target ontologies as mapping targets
  • Collection: { name, key } to send the created data structure straight to a Lightning Collection
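
As an illustration only, a payload with these fields might look like the object below, alongside a small validator. The lowercase field names, the optionality of `hint` and `collection`, and the `validatePayload` helper are all assumptions; the collection key is a made-up example value.

```javascript
// Hypothetical payload; exact field names and casing are not settled in this design
const examplePayload = {
  inputs: { zero: 'I do not drink' },
  hint: 'Answers come from an alcohol-use survey',
  targets: ['snomed'],
  collection: { name: 'ayos-collection', key: 'alcohol-use' },
};

// Returns a list of problems; an empty list means the payload looks usable.
// Assumes hint and collection are optional, which the design does not state.
function validatePayload(payload) {
  const errors = [];
  const isPlainObject = (v) => v !== null && typeof v === 'object' && !Array.isArray(v);

  if (!isPlainObject(payload.inputs) || Object.keys(payload.inputs).length === 0) {
    errors.push('inputs must be a non-empty key-value object');
  }
  if (payload.hint !== undefined && typeof payload.hint !== 'string') {
    errors.push('hint must be a string');
  }
  if (!Array.isArray(payload.targets) || payload.targets.length === 0) {
    errors.push('targets must be a non-empty list');
  }
  if (payload.collection !== undefined) {
    const c = payload.collection;
    if (!isPlainObject(c) || typeof c.name !== 'string' || typeof c.key !== 'string') {
      errors.push('collection must be { name, key } with string values');
    }
  }
  return errors;
}

const payloadErrors = validatePayload(examplePayload);
```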

Future Work

Handling uncertainty: If mappings are not clear, how can the service refer to a human for correction and validation?
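
One possible shape for this, sketched under the assumption that each mapping comes back with a confidence score between 0 and 1 (nothing in the current design guarantees that, and the `triage` helper and threshold are hypothetical): split the results into an accepted lookup table and a review queue for a human.

```javascript
// Hypothetical: split scored mappings into auto-accepted and needs-review sets
function triage(scored, threshold = 0.8) {
  const accepted = {};
  const needsReview = [];
  for (const [key, { value, score }] of Object.entries(scored)) {
    if (score >= threshold) {
      accepted[key] = value;
    } else {
      // Keep the suggestion so a reviewer can confirm or correct it
      needsReview.push({ key, suggestion: value, score });
    }
  }
  return { accepted, needsReview };
}

const triaged = triage({
  zero: { value: { code: '105542008' }, score: 0.95 },
  sometimes: { value: null, score: 0.4 },
});
```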

Validation: Can we provide any built-in validation? Unit tests? Sample structures? An extra pipeline step which runs some extra analysis?

josephjclark commented Nov 26, 2024

How will we release this thing (eventually)?

  • @openfn/cli (openfn apollo map-vocab inputs.json --collection ayos-collection)
  • general open source tool?
  • Lightning integration with a nice UI?
