Add the metadata filter abstraction proposed in #256. It does not have to be connected to the ragna.core.Chat as this can be done in a follow-up PR. I just want to have it well tested before moving forward with the rest of the design.
The MetadataFilter object should have the following properties:
It needs to support the following operators:
logical and
logical or
equal
not equal
less than
less than or equal
greater than
greater than or equal
in
not in
I have surveyed the ecosystem, and this set of operators is supported by Chroma, LanceDB, Pinecone, and Vectara, which makes it a good base set.
It needs to have an "escape hatch" operator, which can be used to supply a raw filter to the source storage. This is useful when the user only uses one source storage (the production use case) and that storage supports more operations than the base set. That way, we are not restricting users to the set of operators above.
It needs to be fully JSON (de-)serializable
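To make the three properties above concrete, here is a minimal sketch of what such an object could look like. It is not the implementation from the linked gist or from ragna. The names MetadataOperator, compare, raw, and_, or_, to_primitive and from_primitive are hypothetical and chosen only for illustration, as are the metadata keys and the raw filter string in the usage part.

```python
from __future__ import annotations

import enum
import json
from typing import Any


class MetadataOperator(enum.Enum):
    # Hypothetical names; the values are what ends up in the JSON payload.
    RAW = "raw"  # escape hatch: value is passed to the source storage untouched
    AND = "and"
    OR = "or"
    EQ = "eq"
    NE = "ne"
    LT = "lt"
    LE = "le"
    GT = "gt"
    GE = "ge"
    IN = "in"
    NOT_IN = "not_in"


_LOGICAL = (MetadataOperator.AND, MetadataOperator.OR)


class MetadataFilter:
    def __init__(self, operator: MetadataOperator, key: str | None, value: Any):
        self.operator = operator
        self.key = key
        self.value = value

    @classmethod
    def raw(cls, value: Any) -> MetadataFilter:
        # The raw value must itself be JSON serializable for the round trip.
        return cls(MetadataOperator.RAW, None, value)

    @classmethod
    def and_(cls, children: list[MetadataFilter]) -> MetadataFilter:
        return cls(MetadataOperator.AND, None, list(children))

    @classmethod
    def or_(cls, children: list[MetadataFilter]) -> MetadataFilter:
        return cls(MetadataOperator.OR, None, list(children))

    @classmethod
    def compare(cls, operator: MetadataOperator, key: str, value: Any) -> MetadataFilter:
        # Covers eq, ne, lt, le, gt, ge, in and not_in.
        return cls(operator, key, value)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, MetadataFilter):
            return NotImplemented
        return (self.operator, self.key, self.value) == (
            other.operator,
            other.key,
            other.value,
        )

    def to_primitive(self) -> dict[str, Any]:
        # Logical operators hold child filters, which are converted recursively.
        if self.operator in _LOGICAL:
            value = [child.to_primitive() for child in self.value]
        else:
            value = self.value
        return {"operator": self.operator.value, "key": self.key, "value": value}

    @classmethod
    def from_primitive(cls, obj: dict[str, Any]) -> MetadataFilter:
        operator = MetadataOperator(obj["operator"])
        value = obj["value"]
        if operator in _LOGICAL:
            value = [cls.from_primitive(child) for child in value]
        return cls(operator, obj.get("key"), value)


# Usage: build a nested filter, send it through JSON, and rebuild it.
metadata_filter = MetadataFilter.and_(
    [
        MetadataFilter.compare(MetadataOperator.EQ, "author", "jane"),
        MetadataFilter.or_(
            [
                MetadataFilter.compare(MetadataOperator.GE, "year", 2020),
                MetadataFilter.compare(MetadataOperator.IN, "tag", ["a", "b"]),
            ]
        ),
        MetadataFilter.raw("year BETWEEN 2020 AND 2022"),
    ]
)
payload = json.dumps(metadata_filter.to_primitive())
restored = MetadataFilter.from_primitive(json.loads(payload))
```

In this sketch, each source storage would translate the tree into its own filter syntax and forward the value of a raw node unchanged. That translation step is left out here.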
Value and/or benefit
First building block for allowing users to have a corpus of documents instead of uploading new ones for every chat.
Anything else?
I have a complete implementation already in https://gist.github.com/pmeier/38ee90be6c30ecdf9bbec086a0dabafe. Unless there are some concerns, it can just be ported.