Tracking issue: metadata filtering #26
Comments
I really wish this feature could be available soon.
@asg017 +1. It looks like LangChain expects the metadata to be available as a dictionary. I have tried the integration, and this is the last remaining piece I need to migrate fully off ChromaDB.
+1
@asg017 First, thanks for making …
@charnould The PR is #124, but it needs tests and docs. I'm aiming for a public release in 3 weeks (~Nov 19th), but I'm hoping to get a beta out before then that you can try!
@asg017 Happy to beta test when you release it!
As of … tl;dr: you have a choice between storing metadata in …

Closing this issue now; feel free to file more bugs/issues if needed!
sqlite-vec doesn't have good metadata filtering as of v0.1.0. Only vector columns can be declared in the vec0 constructor. You can do pre-filtering with vec_column IN (...) queries, but that's slow and inconvenient.

I'm thinking the vec0 constructor could accept metadata columns alongside vector columns.
Columns like genre, release_date, rating, and is_3d would all be "metadata" columns, and you could filter on them in the same query as the vector search. We could capture all the WHERE clauses to ensure that all of the top 20 returned vectors match those criteria.

A few open questions:
How do we store metadata values? We could store them OLTP-fashion with the _rowids shadow tables, but that may be slow. We could store them column-oriented to match the vector column formats, but I'm unsure how much faster that would be.

How would this work with ANN indexes? 🤷
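Part of what makes the ANN question hard is the order in which filtering and ranking happen. The sketch below is a minimal, hypothetical illustration using plain Python and the standard sqlite3 module, not sqlite-vec's API: a brute-force L2 KNN over a made-up movies table, assuming embeddings are stored as little-endian float32 BLOBs. Filtering after taking the top k can return fewer than k matches, while filtering first (which is what capturing the WHERE clauses would allow) fills k whenever enough rows match. An ANN index usually produces candidates before any metadata check, so it tends to push toward the first, lossy pattern.

```python
import math
import sqlite3
import struct

# Hypothetical helpers: embeddings stored as little-endian float32 BLOBs.
def pack(vec):
    return struct.pack(f"<{len(vec)}f", *vec)

def unpack(blob):
    return struct.unpack(f"<{len(blob) // 4}f", blob)

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_post_filter(db, query, k, genre):
    # Rank every row, keep the top k, THEN drop rows that fail the metadata filter.
    rows = db.execute("SELECT id, genre, embedding FROM movies").fetchall()
    top_k = sorted(rows, key=lambda r: l2(query, unpack(r[2])))[:k]
    return [row_id for row_id, g, _ in top_k if g == genre]

def knn_pre_filter(db, query, k, genre):
    # Apply the metadata filter in SQL first, then rank only the matching rows.
    rows = db.execute(
        "SELECT id, embedding FROM movies WHERE genre = ?", (genre,)
    ).fetchall()
    top_k = sorted(rows, key=lambda r: l2(query, unpack(r[1])))[:k]
    return [row_id for row_id, _ in top_k]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE movies (id INTEGER PRIMARY KEY, genre TEXT, embedding BLOB)")
movies = [
    (1, "drama", [0.0, 0.0]),
    (2, "drama", [0.1, 0.0]),
    (3, "comedy", [0.2, 0.0]),
    (4, "drama", [0.3, 0.0]),
    (5, "comedy", [0.9, 0.0]),
    (6, "comedy", [1.0, 0.0]),
]
db.executemany(
    "INSERT INTO movies VALUES (?, ?, ?)",
    [(i, g, pack(v)) for i, g, v in movies],
)

query = [0.0, 0.0]
print(knn_post_filter(db, query, 3, "comedy"))  # [3]: only one comedy is in the unfiltered top 3
print(knn_pre_filter(db, query, 3, "comedy"))   # [3, 5, 6]
```

Both functions return results in distance order; only the position of the metadata check differs.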
What datatypes should we support? Ideally everything, and ideally STRICT. But if we go column-oriented, we'd need a strict subset, like: TEXT, INT, DOUBLE, BLOB, BOOLEAN, DATE/DATETIME.
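One way to see why a column-oriented layout pushes toward a strict type subset: if every value in a column has a fixed width, a chunk of rows can keep one packed array per column, and a filter only has to scan the columns named in the WHERE clause. The Python sketch below is purely illustrative and assumes a made-up chunk layout (the class and column names are hypothetical, and this is not how sqlite-vec stores data). INT and DOUBLE map to 8-byte slots and BOOLEAN to single bits; variable-length TEXT and BLOB values don't fit this scheme directly.

```python
from array import array

class MetadataChunk:
    """Hypothetical column-oriented chunk: one fixed-width array per metadata column."""

    def __init__(self):
        self.rating = array("d")        # DOUBLE: 8 bytes per row
        self.release_year = array("q")  # INT: 8 bytes per row (signed 64-bit)
        self.is_3d = bytearray()        # BOOLEAN: bit-packed, 8 rows per byte
        self.size = 0

    def append(self, rating, release_year, is_3d):
        i = self.size
        self.rating.append(rating)
        self.release_year.append(release_year)
        if i % 8 == 0:
            self.is_3d.append(0)  # start a new byte every 8 rows
        if is_3d:
            self.is_3d[i // 8] |= 1 << (i % 8)
        self.size += 1

    def is_3d_at(self, i):
        return bool((self.is_3d[i // 8] >> (i % 8)) & 1)

    def filter(self, min_rating, min_year, is_3d):
        # Scan only the columns the WHERE clause touches; return matching row offsets.
        return [
            i
            for i in range(self.size)
            if self.rating[i] > min_rating
            and self.release_year[i] >= min_year
            and self.is_3d_at(i) == is_3d
        ]

chunk = MetadataChunk()
rows = [(8.1, 2010, True), (6.5, 2012, False), (7.4, 2019, True), (9.0, 2021, False)]
for rating, year, is_3d in rows:
    chunk.append(rating, year, is_3d)

print(chunk.filter(min_rating=7.0, min_year=2015, is_3d=True))  # [2]
```

NULL handling (for example, a separate validity bitmap per column) is left out of this sketch.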
Maybe we could do dictionary encoding for text values? Maybe that's a column option, like genre text encoding=dictionary or something. Maybe ENUMs? And NULL/NOT NULL?
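Dictionary encoding, as floated above for text values, would store each distinct string once and keep a small integer code per row, so an equality filter like genre = 'comedy' becomes one dictionary lookup plus an integer scan. A minimal Python sketch, with a hypothetical DictionaryColumn class and an assumed 16-bit code width (at most 65,536 distinct values):

```python
from array import array

class DictionaryColumn:
    """Hypothetical dictionary-encoded TEXT column (codes are unsigned 16-bit)."""

    def __init__(self):
        self.values = []         # code -> distinct string
        self.code_of = {}        # distinct string -> code
        self.codes = array("H")  # one code per row; raises OverflowError past 65,536 distinct values

    def append(self, value):
        code = self.code_of.get(value)
        if code is None:
            code = len(self.values)
            self.values.append(value)
            self.code_of[value] = code
        self.codes.append(code)

    def get(self, row):
        return self.values[self.codes[row]]

    def rows_equal(self, value):
        # Equality filter: one dict lookup, then compare small integers instead of strings.
        code = self.code_of.get(value)
        if code is None:
            return []  # a value that was never inserted can't match any row
        return [i for i, c in enumerate(self.codes) if c == code]

genre = DictionaryColumn()
for g in ["comedy", "drama", "comedy", "horror", "drama"]:
    genre.append(g)

print(genre.values)               # ['comedy', 'drama', 'horror']
print(list(genre.codes))          # [0, 1, 0, 2, 1]
print(genre.rows_equal("drama"))  # [1, 4]
```

An ENUM type would be much the same idea, with the dictionary fixed up front in the schema instead of grown on insert.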