[Feature Request]: More complex metadata filtering #1040

Closed
KylePancamo opened this issue Aug 24, 2023 · 5 comments

KylePancamo commented Aug 24, 2023

Describe the problem

Metadata filtering currently supports only six operators. For string fields, these are not enough to express more complex filters.

Example:
Let's say you want to store emails in Chroma and do question answering over them, but also filter them by certain fields such as subject, to, or cc.

To filter on the subject, for example, you have to provide an exact match with the $eq operator. If the subject is long and verbose, the user is unlikely to recall it exactly, which makes filtering on that field pointless.

Describe the proposed solution

Add a $contains operator, or something similar:

{"subject": {"$contains": search_term}}

Example:

Document - Text containing this feature request
metadata:
- subject: "[Feature Request]: More complex metadata filtering"
- to: "[email protected]"

Possible filter terms on subject
-- "[Feature Request]:"
-- "Complex filtering"
-- "Metadata filtering"
-- "More filtering"

The search would return this document, or any document whose subject matches those keywords. This is just one example; there are certainly more use cases where this would help. Being able to do keyword or similarity search on metadata values could make collection queries more accurate.
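To make the proposed semantics concrete, here is a minimal pure-Python sketch. It is not Chroma's implementation: `contains`, `keywords_match`, and `matches` are hypothetical helpers, and case-insensitive comparison is an assumption. It suggests that a plain substring `$contains` would match only two of the four example terms, because "Complex filtering" and "More filtering" are not contiguous substrings of the subject; matching all four would need word-level matching.

```python
# Hypothetical sketch of the proposed filter semantics (not Chroma code).
SUBJECT = "[Feature Request]: More complex metadata filtering"
METADATA = {"subject": SUBJECT}
TERMS = ["[Feature Request]:", "Complex filtering", "Metadata filtering", "More filtering"]


def contains(value, term):
    """Case-insensitive substring test (case-insensitivity is an assumption)."""
    return isinstance(value, str) and term.lower() in value.lower()


def keywords_match(value, term):
    """True if every whitespace-separated word of `term` occurs in `value`."""
    if not isinstance(value, str):
        return False
    lowered = value.lower()
    return all(word in lowered for word in term.lower().split())


def matches(metadata, where):
    """Evaluate a {"field": {"$op": operand}} filter against one metadata dict."""
    for field, condition in where.items():
        value = metadata.get(field)
        for op, operand in condition.items():
            if op == "$eq":
                ok = value == operand
            elif op == "$contains":
                ok = contains(value, operand)
            else:
                raise ValueError(f"unsupported operator: {op}")
            if not ok:
                return False
    return True


# Terms that match as plain substrings vs. as sets of keywords.
substring_hits = [t for t in TERMS if matches(METADATA, {"subject": {"$contains": t}})]
keyword_hits = [t for t in TERMS if keywords_match(SUBJECT, t)]
```

With these definitions, `substring_hits` holds "[Feature Request]:" and "Metadata filtering", while `keyword_hits` holds all four terms, so the choice between substring and keyword semantics changes which of the example searches succeed.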

Alternatives considered

No response

Importance

would make my life easier

Additional Information

No response

KylePancamo added the enhancement (New feature or request) label on Aug 24, 2023
Contributor

tazarov commented Aug 24, 2023

@KylePancamo thank you for this.

I think we can easily use SQLite's LIKE to achieve this:
e.g.

SELECT * FROM table_name WHERE column_name LIKE '%substring%';
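As a rough illustration of the LIKE approach, here is a small sketch using Python's standard `sqlite3` module against an in-memory database. The table `email_metadata`, its columns, and the helpers `like_pattern` and `find_by_subject` are made up for this example and say nothing about Chroma's actual schema. The search term is passed as a bound parameter, and the LIKE wildcards `%` and `_` inside it are escaped so they match literally.

```python
import sqlite3


def like_pattern(term):
    """Wrap `term` in % wildcards, escaping LIKE metacharacters inside it."""
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return f"%{escaped}%"


def find_by_subject(conn, term):
    """Return subjects containing `term` as a literal substring."""
    rows = conn.execute(
        "SELECT subject FROM email_metadata "
        "WHERE subject LIKE ? ESCAPE '\\' ORDER BY id",
        (like_pattern(term),),
    )
    return [row[0] for row in rows]


# Hypothetical demo table; not Chroma's real metadata schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE email_metadata (id INTEGER PRIMARY KEY, subject TEXT)")
conn.executemany(
    "INSERT INTO email_metadata (subject) VALUES (?)",
    [
        ("[Feature Request]: More complex metadata filtering",),
        ("Weekly status report",),
        ("100% done",),
    ],
)

hits = find_by_subject(conn, "metadata filtering")
```

Note that SQLite's LIKE is case-insensitive for ASCII characters by default, so a search for "METADATA" would find the same row. A leading `%` in the pattern also generally prevents an ordinary index from being used, which may matter for large collections.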

@jeffchuber
Contributor

@tazarov this feels like it should be a CIP (Chroma Improvement Proposal), as it has implications for both the single-node and the distributed architecture, and we will want to think about it carefully.

@l4b4r4b4b4

That would be a good feature together with generally allowing complex metadata types.

@KylePancamo
Author

@tazarov @jeffchuber Haven't been keeping up here. Has this been further considered?

Contributor

tazarov commented Oct 12, 2023

@KylePancamo @jeffchuber I think this is addressed in #1196

@jeffchuber, regarding distributed, I'll have to look into the PR to see whether that would be a problem.

5 participants