
Advanced Search Plugin: Semantic Search #35

Open
toBeOfUse opened this issue Dec 19, 2024 · 1 comment

Comments

@toBeOfUse

Typesense has had good support for storing and retrieving vector embeddings for documents for a while, which enables semantic search over them. This makes it possible to show customers products related to their search even when there is no textual match for the query. For example, if a customer searches for "apples", a semantic search will also match related things like oranges and pears. This is particularly useful when a search returns no (or very few) results; embeddings can still be used to suggest some kind of product, even if you don't have exactly what the customer was looking for.

To enable this, a new field can be added to the Typesense schema; the Typesense docs give a great example of this. Then, queries can be made using "query_by": "embedding" to match on that field.
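As a rough sketch of what that could look like with the Typesense JS client (collection and field names here are illustrative, and the embedding model config follows the auto-embedding example in the Typesense docs, not anything in this plugin yet):

```ts
import Typesense from 'typesense';

const client = new Typesense.Client({
  nodes: [{ host: 'localhost', port: 8108, protocol: 'http' }],
  apiKey: 'xyz',
});

// Collection with an auto-embedding field: Typesense generates the vector
// from the listed source fields at index time.
await client.collections().create({
  name: 'products',
  fields: [
    { name: 'name', type: 'string' },
    { name: 'description', type: 'string' },
    {
      name: 'embedding',
      type: 'float[]',
      embed: {
        from: ['name', 'description'],
        model_config: { model_name: 'ts/e5-small' },
      },
    },
  ],
});

// "Semantic search" mode: Typesense embeds the query string on the fly
// and runs a nearest-neighbour search against the embedding field.
const results = await client
  .collections('products')
  .documents()
  .search({
    q: 'apples',
    query_by: 'embedding',
    exclude_fields: 'embedding', // keep the response payload small
  });
```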

@toBeOfUse
Author

I'm currently implementing this feature myself as a custom plugin that relies on the TypesenseService and the AdvancedSearchService, and one complication I'd like to surface here is that Typesense can generate embeddings for search queries on the fly (the "semantic search" mode in the docs), but it does not (yet) cache the vector embeddings it generates for those queries. You can cache whole sets of search results, but not per-query embeddings specifically, and a cache of whole result sets eventually needs to be invalidated, whereas cached embeddings do not. This has a huge impact on query performance, especially if the embedding is generated by an API call: using Typesense's built-in support for OpenAI's text embedding models, queries take around 500ms, versus around 70ms if you supply the embedding to Typesense directly (its "nearest neighbor" mode).
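For reference, the two query shapes look roughly like this (reusing the `client` from the sketch above; the vector values are placeholders and the timings are just the rough numbers mentioned above):

```ts
// Semantic-search mode: Typesense calls the embedding model for every query,
// so each search pays the embedding-generation latency (~500ms with OpenAI here).
await client.collections('products').documents().search({
  q: 'apples',
  query_by: 'embedding',
});

// Nearest-neighbour mode: we supply a pre-computed vector ourselves (~70ms here),
// so Typesense only performs the vector search. The docs use the multi_search
// endpoint for this, since embedding vectors can be too long for a GET URL.
const queryVector = [0.12, -0.04, 0.33]; // really 384/1536/... dimensions, depending on the model
await client.multiSearch.perform({
  searches: [
    {
      collection: 'products',
      q: '*',
      vector_query: `embedding:([${queryVector.join(',')}], k:20)`,
    },
  ],
});
```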

To some extent, this is an inevitable consequence of this mode of search, but it seems like it could be ameliorated by using Vendure's new CacheService to cache embeddings for tokens (although I believe this would raise the minimum Vendure version needed to use this plugin), and even pre-generating and caching embeddings for a set of known likely search queries.
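A minimal sketch of what that caching could look like, assuming Vendure's CacheService get/set API and the official openai client; the service name, cache-key scheme, model choice, and TTL are purely illustrative and not part of the plugin:

```ts
import { Injectable } from '@nestjs/common';
import { CacheService } from '@vendure/core';
import OpenAI from 'openai';

@Injectable()
export class QueryEmbeddingService {
  private openai = new OpenAI(); // uses OPENAI_API_KEY from the environment

  constructor(private cacheService: CacheService) {}

  /** Returns a cached embedding for the query if present, otherwise generates and caches one. */
  async getEmbedding(query: string): Promise<number[]> {
    const key = `semantic-search:embedding:${query.trim().toLowerCase()}`;
    const cached = await this.cacheService.get<number[]>(key);
    if (cached) {
      return cached;
    }
    const response = await this.openai.embeddings.create({
      model: 'text-embedding-3-small',
      input: query,
    });
    const embedding = response.data[0].embedding;
    // Embeddings for a given query never need invalidating, so a long TTL is fine.
    await this.cacheService.set(key, embedding, { ttl: 1000 * 60 * 60 * 24 * 30 });
    return embedding;
  }
}
```

The resulting vector could then be passed to Typesense via vector_query as in the nearest-neighbour example above, and the same service could be used to pre-warm the cache for a list of known likely search queries.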
