[Feature Request]: Querying by metadata fields is not flexible enough #1195

Closed
abrhaleitela opened this issue Sep 29, 2023 · 12 comments · May be fixed by #1393
Labels
enhancement New feature or request

Comments

@abrhaleitela
abrhaleitela commented Sep 29, 2023

Describe the problem

I am weighing up the trade-off between creating thousands of chroma collections and having few collections with more complex metadata objects so that I will be able to achieve filtering/querying based on different data type operations.

Do you plan to support (in the near future) more operations and data types (mainly custom objects such as json objects) in a given metadata?

Example:

collection.add(
    documents=[doc1],
    metadatas=[{"metadata1": [{"k1": "v1"}, {"k2": "v2"}]}],
    ids=["id1"]
)

Also, do you plan to support a string $contains operation in the metadata where condition?

Example:

result = collection.query(
    query_texts=["This is sample query text"],
    where={"string_type_metadata_field": {"$contains": "substring"}}
)

Describe the proposed solution

  1. I would love for collection metadata to be able to contain fields of any type, for example list, map, set, json, etc. Today the only supported types seem to be: str, int, float or bool
  2. When querying with the where keyword, I would love to see more operations supported, such as contains for strings, lists, maps and sets. Today the only supported operations seem to be: $gt, $gte, $lt, $lte, $ne, $eq, $in, $nin
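
Until an operator like $contains exists, one possible stopgap is to fetch a broader set of entries and apply the substring check on the client. The sketch below is only an illustration: `filter_by_substring` is a hypothetical helper, not part of Chroma, and it assumes the result is a dict of parallel lists under the keys `ids`, `documents` and `metadatas`. It runs on a hand-built sample rather than a live collection.

```python
from typing import Any


def filter_by_substring(result: dict[str, Any], field: str, needle: str) -> dict[str, list]:
    """Keep entries whose metadata[field] is a string containing needle.

    Hypothetical helper: assumes `result` holds parallel lists under
    the keys "ids", "documents" and "metadatas".
    """
    kept: dict[str, list] = {"ids": [], "documents": [], "metadatas": []}
    for id_, doc, meta in zip(result["ids"], result["documents"], result["metadatas"]):
        value = (meta or {}).get(field)  # tolerate entries without metadata
        if isinstance(value, str) and needle in value:
            kept["ids"].append(id_)
            kept["documents"].append(doc)
            kept["metadatas"].append(meta)
    return kept


# Hand-built stand-in for a fetched result (not real Chroma output).
sample = {
    "ids": ["id1", "id2", "id3"],
    "documents": ["doc one", "doc two", "doc three"],
    "metadatas": [{"tags": "alpha,beta"}, {"tags": "gamma"}, None],
}
filtered = filter_by_substring(sample, "tags", "beta")
```

The obvious cost is that filtering happens after retrieval, so a query has to over-fetch to be sure enough matching entries survive the filter.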

Alternatives considered

No response

Importance

would make my life easier

Additional Information

No response

@abrhaleitela abrhaleitela added the enhancement New feature or request label Sep 29, 2023
@nielscs
nielscs commented Oct 18, 2023

I fully support the proposed features.
One addition:

Besides $contains, I would also appreciate $regex (as in MongoDB: @Link)

Thanks for the excellent work so far!

@jelena-sarajlic

I also needed more filtering possibilities, so I went to investigate what can be done. The collections are implemented as SQL databases, so I don't think supporting more complex metadata would be possible (correct me if I'm wrong :)).

However, additional operators, such as the $regex @nielscs mentioned and something similar to the $contains @abrhaleitela mentioned, can be implemented, and they can also serve as a workaround for not having more complex metadata.

For example, it would be ideal for my use case to have a list in one metadata field and then filter the database based on what is in the list. I implemented the $like operator for the where operation and the $regex operator for both the where and where_document operations, and I was able to simulate the behavior I needed using these.
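
A minimal sketch of how a LIKE-style match can simulate list membership, using plain SQLite from the Python standard library. The table, column names and `|` separator are assumptions made for illustration; this is not Chroma's schema and not the code from the pull request. Each list is stored as one string with a separator on both sides of every element, so that a search for `cat` does not also match `catalog`.

```python
import sqlite3

SEP = "|"  # assumed separator; it must never occur inside an element


def encode_list(values):
    # Wrap every element in separators: ["cat", "dog"] -> "|cat|dog|"
    return SEP + SEP.join(values) + SEP


def like_pattern(value):
    # Matches only a whole element, thanks to the surrounding separators.
    return f"%{SEP}{value}{SEP}%"


conn = sqlite3.connect(":memory:")
# Hypothetical table used only for this demonstration.
conn.execute("CREATE TABLE metadata (id TEXT PRIMARY KEY, keywords TEXT)")
conn.executemany(
    "INSERT INTO metadata VALUES (?, ?)",
    [
        ("id1", encode_list(["cat", "dog"])),
        ("id2", encode_list(["catalog"])),
        ("id3", encode_list(["bird", "cat"])),
    ],
)

rows = conn.execute(
    "SELECT id FROM metadata WHERE keywords LIKE ? ORDER BY id",
    (like_pattern("cat"),),
).fetchall()
matching_ids = [row[0] for row in rows]
print(matching_ids)
```

Two caveats apply to this approach: SQLite's LIKE is case-insensitive for ASCII letters by default, and `%` or `_` inside the searched value act as wildcards unless they are escaped.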

I created a pull request with these changes (#1393) ; hopefully that helps!

@LazyAIEnjoyer
LazyAIEnjoyer commented Jan 8, 2024

I am confused by your examples. You are saying that you want to apply this filtering on list metadata, but looking at your examples I don't see lists as metadata but just strings. I have the same problem, so I guess I have to make my list metadata into a string and then apply the like operator to see if the string contains my substring?

@pevogam
pevogam commented Jan 10, 2024

Not having a simple $like operator, as in most SQL-based databases, is almost a deal breaker for me, and I only realized the option was missing after setting up a lot of code to use Chroma. Even if Chroma cannot offer something as powerful as $regex, at least $contains (LIKE '%string%') would be greatly appreciated.

@tazarov
Contributor

tazarov commented Jan 10, 2024

@pevogam, we have a pending PR on this #1196. Adding these operators is not that difficult, but the team is mindful of adding operators that might be difficult to carry over to distributed/hosted version of Chroma.

@pevogam
pevogam commented Jan 11, 2024

> @pevogam, we have a pending PR on this #1196. Adding these operators is not that difficult, but the team is mindful of adding operators that might be difficult to carry over to distributed/hosted version of Chroma.

Thanks @tazarov for linking the PR here for those who end up searching the issues first. If the functionality has to be disabled in some deployments but can easily be made available in others, perhaps we can simply detect the type of use and disable it there? But I will check the details in the PR now.

@valentin-fngr

The author mentioned adding lists to metadata. Is that something that might happen eventually?
The way I store lists is by doing str(my_list).

@btonasse

I wholly support this. I find it a bit silly that metadata can only be strings. In my use case, for example, I have a list of documents extracted from a pdf, where each document is a page. That document contains an outline and an index, and I would love to add a list of keywords (extracted from the outline and/or index) to the metadata of each page. But right now the best I can do is save the list as a string and then split the string again when I need to consume the metadata, which is a silly extra step that is also error prone if you're not careful about your separators.
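
One way to make the "list stored as a string" workaround less fragile is to serialize with JSON instead of joining on a hand-picked separator or calling `str(my_list)`, so that elements containing commas or quotes survive the round trip. This is a sketch only; the helper names and the `keywords` field are hypothetical and nothing here is Chroma-specific.

```python
import json


def encode_keywords(keywords):
    # Serialize the list into one string, a type that metadata accepts.
    return json.dumps(keywords)


def decode_keywords(meta, field="keywords"):
    # Restore the list; fall back to an empty list if the field is absent.
    raw = meta.get(field)
    return json.loads(raw) if isinstance(raw, str) else []


# An element containing a comma would break naive split(",") parsing.
keywords = ["intro, overview", "chapter 1", "index"]
metadata = {"page": 3, "keywords": encode_keywords(keywords)}

restored = decode_keywords(metadata)
```

The encoded value is still opaque to the where filter, so this only addresses the parsing step, not querying by list content.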

@Raj725
Raj725 commented Jul 3, 2024

Complex metadata support is much needed.

@armouti
armouti commented Aug 11, 2024

Yes please, at least a string-contains query in metadata would go a long way

@owquresh

Complex metadata support, such as a list of strings, is needed ASAP, especially when other vector DBs (Qdrant) support this.

@itaismith
Contributor

Tracked in #3415 and #3416
