WIP Make (mostly) pgai 0.4.0 compatible #32
I loaded a dump of a TimescaleDB database after running the quickstart and wrote some tests with the sync client.
I then tried to make those tests pass.
Changes I made:
- `contents` -> `chunk`
- `id` -> `embedding_uuid` (but configurable)
- `embedding_table_name` (otherwise it is assumed to be `table_name + "_store"`)

Some things are a bit confusing:
1. Are inserts/deletes/updates supposed to work? The whole idea of pgai 0.4.0 is that you don't have to manage these. The fact that the pgai table schema has a foreign key and the previous examples don't makes this pretty hard to enable through the client right now.
2. Same for index operations: these are created natively by the SQL functions, right?
3. The Vectorize test breaks because the langchain_community integration isn't updated yet to provide the new `embedding_table_name`. This is a bit odd as a dev workflow. I'm wondering if it makes sense to define the integrations in this package too and then just import them in the integrations? That would allow for easier testing, I think.

About the `embedding_table_name`: pgai creates a view on top of the vectordb which, for example, holds metadata. For now I've made it the default that you pass in the view name and the library "assumes" the embeddings_store table. Search operations simply use the view; anything else uses the underlying embeddings_store table (but due to the problems from 1., only deletes really work right now).

How to Review?
I think the best way to understand what works and what doesn't is to have a look at the new compatibility_test.py file. The first two tests just validate that the import worked; then a client is created and the individual methods are tested.
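For reviewers, here is a rough sketch of how the `embedding_table_name` fallback and the view/store split are meant to behave. `TableNames`, `store_table`, and `target_for` are hypothetical names used only for illustration and are not the client's actual API; the `"_store"` suffix is the assumed default mentioned in the change list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TableNames:
    """Hypothetical helper for illustration only, not the real client code."""

    table_name: str  # name of the pgai view passed in by the caller
    embedding_table_name: Optional[str] = None  # explicit override, if any

    def store_table(self) -> str:
        # Assumption: without an override, the store table is "<view>_store".
        return self.embedding_table_name or f"{self.table_name}_store"

    def target_for(self, operation: str) -> str:
        # Searches read from the view; every other operation goes to the
        # underlying store table.
        if operation == "search":
            return self.table_name
        return self.store_table()


names = TableNames("blog_embedding")
print(names.target_for("search"))
print(names.target_for("delete"))
```

With the default, a search targets `blog_embedding` and a delete targets `blog_embedding_store`; passing `embedding_table_name` explicitly only changes the non-search target.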