Pipeline for non-drug concepts #40

Open
3 tasks
kuraisle opened this issue Aug 30, 2024 · 1 comment

Comments

@kuraisle
Collaborator

Now that we have shown that LLettuce can work for drug concepts, we need to expand to non-drug concepts.

This could be as simple as removing references to "domain = 'Drug'" from queries. However, the OMOP vocabularies are large, and querying the whole database will be slow, so we could get NLP to help us narrow the search. Here's a rough scheme:

graph TD
    input --> domain[Guess the domain of input]
    domain --> class_m[Guess the class of input]
    class_m --> search[Semantic search]
    search --> search_res_class{Acceptable guess?}
    search_res_class --Yes--> User
    search_res_class --No--> search_no_class{Search - no class}
    search_no_class -- Acceptable --> User
    search_no_class -- No --> search_no_domain{Search - no domain}
    search_no_domain --Acceptable--> User
    search_no_domain --No--> Surrender
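
The fallback order in the scheme above could be sketched roughly as follows. This is only an illustration: the `search` callable, its `(text, domain, concept_class)` signature, and the `threshold` used to decide what counts as an "acceptable guess" are all assumptions, not existing LLettuce code.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical search signature: (text, domain, concept_class) -> [(concept_name, score), ...]
Hit = Tuple[str, float]
SearchFn = Callable[[str, Optional[str], Optional[str]], Sequence[Hit]]


def cascade_search(
    text: str,
    domain: Optional[str],
    concept_class: Optional[str],
    search: SearchFn,
    threshold: float = 0.8,  # assumed cut-off for an "acceptable" guess
) -> Tuple[str, List[Hit]]:
    """Try progressively broader searches and return (stage, acceptable hits)."""
    stages = [
        ("domain+class", domain, concept_class),  # Semantic search
        ("domain only", domain, None),            # Search - no class
        ("unrestricted", None, None),             # Search - no domain
    ]
    for stage, dom, cls in stages:
        hits = [hit for hit in search(text, dom, cls) if hit[1] >= threshold]
        if hits:
            return stage, hits
    # Nothing acceptable at any stage
    return "surrender", []
```

Each stage drops one filter, so the expensive unrestricted search only runs when the narrower ones have failed.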

Estimating the class will be less useful; the main thing will be to narrow the search down to a domain. I would guess the class is also harder for an NLP system to estimate, so we can test it, but

For this to work we will need a new pipeline:

  • Add domain estimation to the pipeline. This could use an LLM, or zero-shot labelling with embeddings
  • Update OMOP ORM models to include domain and class
  • Redesign OMOP queries to accept non-drug domains
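
As a rough idea of what zero-shot labelling with embeddings might look like for the first task: embed the input and a short description of each candidate domain, then pick the most similar one. Everything here is an assumption for illustration. The label descriptions are made up, and `toy_embed` is a bag-of-words placeholder standing in for a real sentence-embedding model, so its guesses only reflect shared words.

```python
import math
from collections import Counter

# Assumed candidate labels: a few OMOP domain names with hand-written descriptions.
DOMAIN_DESCRIPTIONS = {
    "Drug": "medication drug dose tablet prescription",
    "Measurement": "laboratory test result level score measured value",
    "Observation": "questionnaire answer history lifestyle smoking",
}


def toy_embed(text):
    """Placeholder embedding (word counts). Swap in a real embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def guess_domain(text, embed=toy_embed, labels=DOMAIN_DESCRIPTIONS):
    """Return (best domain or None, all scores) for the input text."""
    query = embed(text)
    scores = {name: cosine(query, embed(desc)) for name, desc in labels.items()}
    best = max(scores, key=scores.get)
    # No overlap with any label means there is no basis for a guess.
    return (best if scores[best] > 0 else None), scores
```

Returning `None` when nothing matches leaves room for the pipeline to fall back to a search without a domain filter.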
@kuraisle
Collaborator Author

kuraisle commented Sep 11, 2024

Using Co-connect twins as an example

The dataset

We have a dataset to use as an example for adding non-drug domains to LLettuce's use case: the TwinsUK phenobase. The part of this that's interesting for us is the "Variables" sheet of a spreadsheet I was sent. Within this sheet, two columns are of interest:

| PhenotypeName | PhenotypeDescription |
| --- | --- |
| Sensitivity to Allergens | Score of subjects' sensitivities to 112 allergen components using frozen serum. |
| Clotting factor | Results for VWF (von Willebrand factor) |
| ... | ... |

There are 8144 of these name/description pairs. In OMOP, there's a "CO-CONNECT TWINS" vocabulary. 4234 of the PhenotypeNames match a CO-CONNECT TWINS concept. I've retrieved the standard concepts for these non-standard concepts. This provides a nice example for us to test versions of LLettuce. The PhenotypeDescription is the kind of long description of something that LLettuce is well positioned to parse into standard concepts. By making this PhenotypeDescription -> PhenotypeName -> CO-CONNECT TWINS -> OMOP standard concept chain, I've made a table of:

| PhenotypeDescription | OMOP standard concepts |
| --- | --- |
| About how long did you smoke for in total? - months | ((:relationship "Maps to" :concept "Cigarette smoker")) |
| About how long did you smoke for in total? - years | ((:relationship "Maps to value" :concept "Cigarette smoker")(:relationship "Maps to" :concept "History of event")(:relationship "Maps to" :concept "Currently doesn't use tobacco or its derivatives")) |
| ... | ... |

Getting LLettuce to predict the right column from the left is what we want to test. The exact format in which the OMOP standard concepts are represented isn't important: it could be JSON or anything else, as long as it can be parsed into a set of relationships to concepts. Note that a single description can map to multiple concepts.
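
For instance, the representation used in the table above could be parsed into a set of (relationship, concept) pairs with something like the sketch below. The function name `parse_mappings` is hypothetical, and the regular expression assumes that relationship and concept names never contain double quotes.

```python
import re

# Matches one (:relationship "..." :concept "...") entry.
_PAIR = re.compile(r'\(:relationship\s+"([^"]*)"\s+:concept\s+"([^"]*)"\)')


def parse_mappings(text):
    """Return the set of (relationship, concept) pairs found in the string."""
    return set(_PAIR.findall(text))


example = '((:relationship "Maps to" :concept "Cigarette smoker"))'
print(parse_mappings(example))
```

Using a set means the order in which a model emits the mappings doesn't affect the comparison.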

Preliminary test

I fine-tuned Flan-T5-small on an 80%/10% train/test split of the dataset, leaving the remaining 10% as a validation (or holdout) set. It did OK, given the small size of the model. I calculated the precision, recall, and $F_1$ score against that validation set.
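
One way these scores could be computed when each prediction is a set of (relationship, concept) pairs is sketched below. This is not necessarily how the preliminary numbers were produced: it assumes micro-averaging, i.e. pooling true positives, false positives and false negatives over all examples, and the function name is hypothetical.

```python
def precision_recall_f1(predicted, expected):
    """Micro-averaged scores over paired lists of sets of (relationship, concept)."""
    tp = fp = fn = 0
    for pred, gold in zip(predicted, expected):
        tp += len(pred & gold)  # mappings the model got right
        fp += len(pred - gold)  # predicted but not in the reference
        fn += len(gold - pred)  # in the reference but not predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

Scoring at the level of individual mappings gives partial credit when a model finds only some of the concepts for a description.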

Future direction

A useful comparison to make will be between a fine-tuned Flan-T5 model and Llama 3.1. The steps for this will be:

  • Implement precision, recall and $F_1$ score in the evaluation framework (Evaluation framework for models and prompts #5)
  • Split off a validation set that both models can be evaluated on
  • Fine-tune a larger Flan model
  • Define a prompt Llama 3.1 can use
  • Run the fine-tuned Flan-T5 and Llama on the validation set
  • Run evaluation metrics and compare models
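
For the validation split, a reproducible shuffle-and-cut along these lines might be enough. The function name, the default fractions and the seed are assumptions for illustration; the point is only that fixing the seed gives both models the same held-out rows.

```python
import random


def split_dataset(rows, train=0.8, test=0.1, seed=0):
    """Shuffle once with a fixed seed, then cut into train/test/validation lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * train)
    n_test = int(len(rows) * test)
    # Whatever is left after train and test becomes the validation set.
    return (
        rows[:n_train],
        rows[n_train:n_train + n_test],
        rows[n_train + n_test:],
    )
```

Since Llama 3.1 would be prompted rather than fine-tuned, only the validation part matters for it, but it should be the same rows the fine-tuned Flan-T5 never saw.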
