Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP Make (mostly) pgai 0.4.0 compatible #32

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Conversation

Askir
Copy link
Contributor

@Askir Askir commented Oct 28, 2024

I've loaded a dump of a timescaledb after running the quickstart and wrote some tests with the sync client.
I then tried to make those tests pass.

Changes I made:

  • Rename contents -> chunk
  • Rename id -> embedding_uuid (but configurable)
  • Add the option to configure an embedding_table_name (otherwise it is assumed to be table_name + "_store")
  • Made metadata column name configurable and optional (default off)
  • Added/Fixed the return type of search and the filter params type

Some things are a bit confusing:

  1. Are inserts/deletes/updates supposed to work? The whole idea of pgai 0.4.0 is that you don't have to manage these.
    The fact that the pgai table schema has a foreign key and the previous examples don't... makes this pretty hard to enable through the client right now.

  2. Same for index operations, these are created natively by the SQL functions right?

  3. The Vectorize test breaks because the langchain_community integration isn't updated yet to provide the new embedding_table_name. This is a bit odd as a dev workflow, I'm wondering if it makes sense to define the integrations in this package too and then just import them in the integrations? That would allow for easier testing I think.

About the embedding_table_name: Since pgai creates a view on top of the vectordb which e.g. holds metadata. I've for now made it the default that you pass in the view name and the library "assumes" the embeddings_store table. For search operations it simply uses the view. For anything else the underlying embeddings_store table (but due to problems from 1.) only deletes really work rn).

How to Review?
I think the best way to understand what works and what doesn't is too have a look at the new compatibility_test.py file. The first two tests are just validating that the import worked then a client is created and the individual methods are tested.

@Askir Askir changed the base branch from fix-typing to main October 28, 2024 23:23
@Askir Askir requested a review from cevian October 29, 2024 23:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant