
[Question]: knowledge base : pgvector db structure need help #17446

Open
naveenr-btc opened this issue Jan 7, 2025 · 6 comments
Labels
question Further information is requested

Comments

@naveenr-btc

Question Validation

  • I have searched both the documentation and discord for an answer.

Question

I have embeddings of text already. The embeddings are available in a table that has structure as

my_embeddings (
    id BIGSERIAL PRIMARY KEY,
    embedding vector(1536)
);

When using the knowledge base, I'm getting an error saying that certain columns are missing, so I need to understand which columns those are. It would also be helpful to have an example of what those columns should hold.
Note: my use case is that the LLMs need to understand the content from the knowledge base to produce better results, not to query the actual content from it.

@naveenr-btc naveenr-btc added the question Further information is requested label Jan 7, 2025

dosubot bot commented Jan 7, 2025

Hey @naveenr-btc! I'm here to assist you with any bugs, questions, or contributions. Let's tackle this together!

To use pgvector with LlamaIndex, your knowledge base table needs to have specific columns. The required columns are:

  • id UUID DEFAULT (gen_random_uuid())
  • embedding VECTOR({num_dimensions})
  • content TEXT
  • metadata JSONB

If tenant_aware is set to True, you also need:

  • tenant_id UUID

Your current table structure is missing the content and metadata columns, which are necessary for LlamaIndex to function correctly. You can refer to the _create_tables method in the NileVectorStore class for more details on the table structure [1].

Additionally, here's an example of setting up a knowledge base table using SQLAlchemy and pgvector:

from pgvector.sqlalchemy import Vector
from sqlalchemy import insert, create_engine, String, Integer
from sqlalchemy.orm import declarative_base, mapped_column

# The connection string is a placeholder; point it at your own database.
engine = create_engine("postgresql+psycopg2://user:password@localhost:5432/vector_db")

Base = declarative_base()

class SECTextChunk(Base):
    __tablename__ = "sec_text_chunk"

    id = mapped_column(Integer, primary_key=True)
    page_label = mapped_column(Integer)
    file_name = mapped_column(String)
    text = mapped_column(String)
    embedding = mapped_column(Vector(384))

# Recreate the table from the model above.
Base.metadata.drop_all(engine)
Base.metadata.create_all(engine)

# Insert into the database; `nodes` is assumed to be a list of LlamaIndex
# nodes whose embeddings have already been computed.
for node in nodes:
    row_dict = {
        "text": node.get_content(),
        "embedding": node.embedding,
        **node.metadata,
    }
    stmt = insert(SECTextChunk).values(**row_dict)
    with engine.connect() as connection:
        connection.execute(stmt)
        connection.commit()

This example demonstrates how to define a table schema and insert data with embeddings into a PostgreSQL database with pgvector [2].
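
Alternatively, rather than managing the schema by hand, you can let the LlamaIndex Postgres vector store create and own its table (it prefixes the configured name with data_, e.g. data_my_embeddings). The following is a minimal sketch, assuming the llama-index-vector-stores-postgres package is installed, an embedding model is configured, and the connection details below are placeholders:

from llama_index.core import Document, StorageContext, VectorStoreIndex
from llama_index.vector_stores.postgres import PGVectorStore

# All connection parameters below are placeholders for illustration.
vector_store = PGVectorStore.from_params(
    database="vector_db",
    host="localhost",
    port="5432",
    user="postgres",
    password="password",
    table_name="my_embeddings",  # the store creates and manages "data_my_embeddings"
    embed_dim=1536,              # must match the dimensionality of your embeddings
)

# Building an index this way lets LlamaIndex populate id, node_id, text,
# metadata_ and embedding in the managed table automatically.
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    [Document(text="Example ICD code description.")],  # sample document
    storage_context=storage_context,
)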


@logan-markewich
Collaborator

The db structure is here

I haven't taken the time to add options for people to "bring their own" table in the postgres vector store class.

@naveenr-btc
Author

OK, thanks for pointing that out. Can you tell me what those missing columns should hold?

@naveenr-btc
Author

naveenr-btc commented Jan 13, 2025

Hi, can you tell me what node_id and metadata_ should hold? I always get an empty response.

2025-01-13 17:58:38,971 INFO sqlalchemy.engine.Engine select pg_catalog.version()
2025-01-13 17:58:38,971 INFO sqlalchemy.engine.Engine [raw sql] {}
2025-01-13 17:58:38,999 INFO sqlalchemy.engine.Engine select current_schema()
2025-01-13 17:58:38,999 INFO sqlalchemy.engine.Engine [raw sql] {}
2025-01-13 17:58:39,009 INFO sqlalchemy.engine.Engine show standard_conforming_strings
2025-01-13 17:58:39,009 INFO sqlalchemy.engine.Engine [raw sql] {}
2025-01-13 17:58:39,020 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2025-01-13 17:58:39,021 INFO sqlalchemy.engine.Engine SELECT schema_name FROM information_schema.schemata WHERE schema_name = %(schema_name)s
2025-01-13 17:58:39,021 INFO sqlalchemy.engine.Engine [generated in 0.00036s] {'schema_name': 'public'}
2025-01-13 17:58:39,032 INFO sqlalchemy.engine.Engine COMMIT
2025-01-13 17:58:39,037 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2025-01-13 17:58:39,037 INFO sqlalchemy.engine.Engine CREATE EXTENSION IF NOT EXISTS vector
2025-01-13 17:58:39,038 INFO sqlalchemy.engine.Engine [generated in 0.00029s] {}
2025-01-13 17:58:39,047 INFO sqlalchemy.engine.Engine COMMIT
2025-01-13 17:58:39,062 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2025-01-13 17:58:39,069 INFO sqlalchemy.engine.Engine SELECT pg_catalog.pg_class.relname
FROM pg_catalog.pg_class JOIN pg_catalog.pg_namespace ON pg_catalog.pg_namespace.oid = pg_catalog.pg_class.relnamespace
WHERE pg_catalog.pg_class.relname = %(table_name)s AND pg_catalog.pg_class.relkind = ANY (ARRAY[%(param_1)s, %(param_2)s, %(param_3)s, %(param_4)s, %(param_5)s]) AND pg_catalog.pg_namespace.nspname = %(nspname_1)s
2025-01-13 17:58:39,070 INFO sqlalchemy.engine.Engine [generated in 0.00055s] {'table_name': 'data_icd_diag_code_list_embeddings_llama', 'param_1': 'r', 'param_2': 'p', 'param_3': 'f', 'param_4': 'v', 'param_5': 'm', 'nspname_1': 'public'}
2025-01-13 17:58:39,082 INFO sqlalchemy.engine.Engine COMMIT
2025-01-13 17:58:39,088 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2025-01-13 17:58:39,099 INFO sqlalchemy.engine.Engine SELECT public.data_icd_diag_code_list_embeddings_llama.id, public.data_icd_diag_code_list_embeddings_llama.node_id, public.data_icd_diag_code_list_embeddings_llama.text, public.data_icd_diag_code_list_embeddings_llama.metadata_, public.data_icd_diag_code_list_embeddings_llama.embedding <=> %(embedding_1)s AS distance
FROM public.data_icd_diag_code_list_embeddings_llama ORDER BY distance asc
LIMIT %(param_1)s
2025-01-13 17:58:39,100 INFO sqlalchemy.engine.Engine [generated in 0.00437s] {'embedding_1': '[-0.006608377210795879,0.009919406846165657,-0.01648673787713051,-0.01925048977136612,-0.015364819206297398,0.02609146013855934,-0.03855570778250694, ... (32642 characters truncated) ... 0.021056506782770157,0.011862242594361305,-0.017512883991003036,0.020646048709750175,-0.020755503326654434,0.012163245119154453,0.007135131862014532]', 'param_1': 2}
2025-01-13 17:58:39,120 INFO sqlalchemy.engine.Engine COMMIT
output : Empty Response
END TASK d50c8376-b66a-4c21-9cdc-238726c16bac :: end time : 1736771319.1257265 :: execution time : 1.2649040222167969
END PIPELINE Pipeline :: end time : 1736771319.1257265 :: execution time : 1.2649040222167969
Completed

Process finished with exit code 0

@dosu

@naveenr-btc
Author

@dosu can you help me with this issue?


dosubot bot commented Jan 13, 2025

The node_id and metadata_ columns in the PGVectorStore class serve specific purposes:

  1. node_id: This column uniquely identifies each node in the database and is used when managing and retrieving data. It is also used to deduplicate results, ensuring that no duplicate nodes are included in the final query results [1].

  2. metadata_: This column stores additional information about each data entry in a JSON or JSONB format. The metadata can include any supplementary data that might be useful for reconstructing nodes or providing context during query execution. It is retrieved and included in the query results, allowing for more detailed and context-rich data retrieval [1].

If you're getting empty responses, ensure that these columns are correctly populated in your database. The node_id should be unique for each entry, and the metadata_ should contain relevant JSON data that can be used during queries. If these columns are not populated correctly, it could lead to issues with data retrieval and processing.
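
As a concrete illustration of what these columns end up holding, here is a hedged sketch that adds pre-computed embeddings through the vector store itself, so node_id, text, metadata_ and embedding are written by LlamaIndex rather than by hand. The connection details and embedding values are placeholders, and the table name is taken from the query log above:

from llama_index.core.schema import TextNode
from llama_index.core.vector_stores import VectorStoreQuery
from llama_index.vector_stores.postgres import PGVectorStore

# Placeholder connection details; embed_dim must match your vectors.
vector_store = PGVectorStore.from_params(
    database="vector_db",
    host="localhost",
    port="5432",
    user="postgres",
    password="password",
    table_name="icd_diag_code_list_embeddings_llama",
    embed_dim=1536,
)

# Each node carries its own pre-computed embedding; id_ becomes node_id,
# text goes into the text column, and metadata is serialized into metadata_.
nodes = [
    TextNode(
        id_="icd-A00",               # stored in node_id (must be unique)
        text="A00: Cholera",         # stored in the text column
        metadata={"code": "A00"},    # stored, with node info, in metadata_
        embedding=[0.0] * 1536,      # replace with the real 1536-dim vector
    ),
]
vector_store.add(nodes)

# Query with a pre-computed query embedding.
result = vector_store.query(
    VectorStoreQuery(query_embedding=[0.0] * 1536, similarity_top_k=2)
)
print([n.get_content() for n in result.nodes])

If the rows were inserted into the table directly instead, metadata_ may not contain the node information LlamaIndex expects when reconstructing nodes, which is one possible cause of an empty response.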

