Having one vector column for multiple text columns on Qdrant #3755

vhdmoradi · 2024-03-03T11:14:27Z

I have a products table with many columns. The ones that matter for our search are:

Title 1 to Title 6 (title in 6 different languages)
Brand name (in 6 different languages)
Category name (in 6 different languages)
Product attributes like size, color, etc. (in 6 different languages)

We plan to use Qdrant vector search for fast vector queries. The problem is that the data that matters for search is spread across different columns, and I do not think (correct me if I am wrong) that generating separate vector embeddings for every column is the best solution.

My idea is to combine the columns and create separate collections. I arrived at this because the title, category, brand, and attribute columns hold essentially the same content, just in different languages.

I also use the "BAAI/bge-m3" model, a multilingual text embedding model that supports more than 100 languages.

In short, I created one collection per language. Each collection has a single vector per product, generated from the combined text of title, brand, color, and category in that language. Because we already know the website's language at search time, we search only that language's collection.
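
The per-language layout described above can be sketched roughly as follows. This is a minimal illustration, not the actual code behind the question: the field names, the language codes, the `products_<lang>` naming scheme, and the helpers `combined_text` and `collection_for` are all assumptions, and the German strings are made-up translations. The embedding step is left out; each resulting string would be passed to the embedding model and the vector stored in the matching collection.

```python
from typing import Dict, List

# Assumed field order, matching the "title category color brand" examples in this post.
FIELDS: List[str] = ["title", "category", "color", "brand"]


def combined_text(product: Dict[str, Dict[str, str]], lang: str) -> str:
    """Join the searchable fields of one product for one language.

    `product` maps field name -> {language code -> text}. Missing or empty
    values are skipped so the result has no stray spaces.
    """
    parts = [product.get(field, {}).get(lang, "").strip() for field in FIELDS]
    return " ".join(p for p in parts if p)


def collection_for(lang: str, prefix: str = "products") -> str:
    """Name of the per-language collection (the naming scheme is hypothetical)."""
    return f"{prefix}_{lang}"


# Illustrative product; only two of the six languages are shown.
product = {
    "title": {"en": "Koala patterned hoodie", "de": "Hoodie mit Koala-Muster"},
    "category": {"en": "children", "de": "Kinder"},
    "color": {"en": "blue", "de": "blau"},
    "brand": {"en": "Bubito", "de": "Bubito"},
}

for lang in ("en", "de"):
    # One string per language; each would be embedded and stored separately.
    print(collection_for(lang), "<-", combined_text(product, lang))
```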

Now the question: is this a valid method, and what are its pros and cons? I do know that once the fields are combined, I cannot give different weights to different parts of the vector. For example, the combined text of title, category, color, and brand may look like this:

"Koala patterned hoodie children blue Bubito"

or something like:

"Striped t-shirt men navy blue Zara"

Now a user may search for "blue hoodie for men", but because the combined vector is unweighted, it will not retrieve the best results.
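
To make the weighting concern concrete: a single combined vector yields one similarity score per product, so there is nothing to weight. If each field had its own vector instead, per-field similarities could be merged with weights. The sketch below shows only that arithmetic on toy two-dimensional vectors; the function names, the weights, and the vectors are invented for illustration and say nothing about how well this would work on real embeddings.

```python
import math
from typing import Dict, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def weighted_score(
    query_vec: Sequence[float],
    field_vecs: Dict[str, Sequence[float]],
    weights: Dict[str, float],
) -> float:
    """Weighted average of the query's similarity to each field vector."""
    total = sum(weights.values())
    return sum(w * cosine(query_vec, field_vecs[f]) for f, w in weights.items()) / total


# Toy vectors: the query matches the title direction exactly and the brand not at all.
query = [1.0, 0.0]
fields = {"title": [1.0, 0.0], "brand": [0.0, 1.0]}

title_heavy = weighted_score(query, fields, {"title": 3.0, "brand": 1.0})
equal = weighted_score(query, fields, {"title": 1.0, "brand": 1.0})
print(title_heavy, equal)
```

With these toy inputs, weighting the title three times as heavily as the brand raises the product's score relative to equal weights, which is exactly the control that a single combined vector does not offer.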

I may be wrong and this may be one of the best results, but please tell me more about the pros and cons of this method, and if you can, give me a better idea.

It is important to note that we currently have more than 300,000 (300K) products, and this will grow to more than 1,000,000 (1M) in the near future.

generall · 2024-03-04T12:38:03Z

Maintainer

Overall, mixing different columns into one vector field should be fine, at least for the different titles. But embedding the category name might produce a lot of noise from duplicated vectors, which is not good for the vector index.
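
One possible reading of the duplication concern, which is an interpretation rather than something stated here, is that a category name embedded on its own produces the same text, and therefore the same vector, for every product in that category. The sketch below counts repeated texts for a handful of products; the first two come from the examples in the question and the last two are made up.

```python
from collections import Counter
from typing import Dict, List

products: List[Dict[str, str]] = [
    {"title": "Koala patterned hoodie", "category": "children", "color": "blue", "brand": "Bubito"},
    {"title": "Striped t-shirt", "category": "men", "color": "navy blue", "brand": "Zara"},
    # The two products below are invented for illustration.
    {"title": "Plain t-shirt", "category": "men", "color": "white", "brand": "Zara"},
    {"title": "Zip hoodie", "category": "men", "color": "blue", "brand": "Zara"},
]


def duplicate_count(texts: List[str]) -> int:
    """Number of texts that repeat an earlier text.

    Identical input text gives an identical embedding, so this is also
    the number of duplicated vectors the index would hold.
    """
    return sum(n - 1 for n in Counter(texts).values())


category_only = [p["category"] for p in products]
combined = [" ".join([p["title"], p["category"], p["color"], p["brand"]]) for p in products]

print("category only:", duplicate_count(category_only))
print("combined text:", duplicate_count(combined))
```

In this toy set the category-only texts collide while the combined strings stay distinct, because the title keeps each string unique.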

vhdmoradi Mar 4, 2024
Author

I need to change a few small details in my question. I doubt they change much, but I want to be sure.
The updated situation is this:
For each product, we have the fields "title, category, brand, color, size" in 6 different languages. We know the website language from the API call, so I thought using a separate collection for each language might not be a bad idea. For each language, I combined the "title, category, brand, color, size" columns into one string and generated the embedding for that string. The reason is that user queries usually contain these keywords, and creating separate vector columns would mean tokenizing the search query and separating its keywords by field, which is a challenge of its own. So we used this text combination method.
Now that I have elaborated on the situation, I would appreciate it if you could update your advice if needed.
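
The query side of this setup might look like the sketch below: the language from the API call selects the collection, and the whole query string is embedded as is, with no keyword splitting. The `products_<lang>` naming scheme, the language codes, and `fake_embed` are placeholders, not part of the described system. The path and body follow the shape of Qdrant's REST points search endpoint, but treat the exact request as an assumption to check against the documentation for your Qdrant version.

```python
import json
from typing import Callable, Dict, List, Tuple

# Hypothetical language codes; the real site has six languages.
SUPPORTED_LANGS = {"en", "de", "fr"}


def build_search_request(
    query: str,
    lang: str,
    embed: Callable[[str], List[float]],
    limit: int = 10,
) -> Tuple[str, Dict[str, object]]:
    """Return the REST path and JSON body for a search in one language's collection."""
    if lang not in SUPPORTED_LANGS:
        raise ValueError(f"unsupported language: {lang}")
    path = f"/collections/products_{lang}/points/search"
    body = {"vector": embed(query), "limit": limit, "with_payload": True}
    return path, body


def fake_embed(text: str) -> List[float]:
    """Stand-in for the real embedding model; returns a tiny deterministic vector."""
    return [float(len(text)), float(text.count(" "))]


path, body = build_search_request("blue hoodie for men", "en", fake_embed, limit=5)
print(path)
print(json.dumps(body))
```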

vhdmoradi · 2024-03-10T05:48:02Z

Author

I changed and clarified the question. Any idea is welcome.
