Integrate vector search into pipeline #19

Open
1 of 2 tasks
kuraisle opened this issue Aug 5, 2024 · 0 comments
Collaborator
kuraisle commented Aug 5, 2024

There are 697 informal names in Esmond’s dataset where the LLM gave a sensible output (not blank or a “No specific drug name” response). Of these, 63 (9%) are exact matches to a concept in the RxNorm vocabulary. Of the rest, a vector search gives exactly the same answer as GPT-3 in 208 cases. Since the vector search has a far lower computational cost and can successfully answer at least 39% of queries (271 of 697), it's worth integrating into the pipeline. A little further effort might yield additional improvements.
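A quick sanity check on those figures (counts taken from the paragraph above):

```python
# Counts reported above, from Esmond's dataset
total_sensible = 697   # informal names with a sensible LLM output
exact_matches = 63     # exact matches to an RxNorm concept
vector_agrees = 208    # vector search returns the same answer as GPT-3

exact_rate = exact_matches / total_sensible
combined_rate = (exact_matches + vector_agrees) / total_sensible

print(f"{exact_rate:.0%}")     # ~9%
print(f"{combined_rate:.0%}")  # ~39%
```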

My experiment used roughly this code:

```python
from os import environ
from urllib.parse import quote_plus

import pandas as pd
import txtai
from dotenv import load_dotenv
from sqlalchemy import create_engine

# Load database credentials from a .env file
load_dotenv()

DB_HOST = environ["DB_HOST"]
DB_USER = environ["DB_USER"]
DB_PASSWORD = quote_plus(environ["DB_PASSWORD"])  # escape special characters
DB_NAME = environ["DB_NAME"]
DB_PORT = environ["DB_PORT"]
DB_SCHEMA = environ["DB_SCHEMA"]

connection_string = (
    f"postgresql://{DB_USER}:{DB_PASSWORD}@{DB_HOST}:{DB_PORT}/{DB_NAME}"
)
engine = create_engine(connection_string)

# Pull every RxNorm concept from the concept table
rxnorm_concepts = pd.read_sql(
    f"""
    SELECT concept_id, concept_name
    FROM {DB_SCHEMA}.concept
    WHERE vocabulary_id = 'RxNorm'
    """,
    con=engine,
)

# Embed the concept names with a PubMedBERT-based model and index them;
# each indexed item is a (id, text, tags) tuple
embeddings = txtai.Embeddings(path="neuml/pubmedbert-base-embeddings", content=True)
embeddings.index(
    rxnorm_concepts.apply(lambda x: (x.concept_id, x.concept_name, None), axis=1)
)
```

Then, using `embeddings.search()`, you can fetch the closest n matches for a query string.
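For illustration, the nearest-neighbour lookup that `embeddings.search()` performs amounts to ranking indexed vectors by cosine similarity to the query vector. A minimal sketch with made-up concept ids and toy 2-d vectors (not real embeddings):

```python
import numpy as np

def top_n(query_vec, index_vecs, ids, n=3):
    """Return the n ids whose vectors are most cosine-similar to the query."""
    index = np.asarray(index_vecs, dtype=float)
    q = np.asarray(query_vec, dtype=float)
    # Normalise the rows and the query so a dot product is cosine similarity
    index = index / np.linalg.norm(index, axis=1, keepdims=True)
    q = q / np.linalg.norm(q)
    scores = index @ q
    order = np.argsort(scores)[::-1][:n]
    return [(ids[i], float(scores[i])) for i in order]

# Toy example: three hypothetical concept ids with fake 2-d "embeddings"
ids = [1001, 1002, 1003]
vecs = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
print(top_n([1.0, 0.1], vecs, ids, n=2))  # 1001 ranks first, then 1002
```

In the real pipeline the vectors come from the PubMedBERT model and txtai handles the indexing and ranking for us.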

We need to
