
Create new “How to choose an embedder” guide #3040

Open
guimachiavelli opened this issue Nov 11, 2024 · 3 comments · May be fixed by #3058

@guimachiavelli
Member

Recent customer feedback indicates users are struggling to move beyond the basic AI-powered search tutorial and implement hybrid search in their own projects.

Main points to address:

  • Define the main differences between the available embedders
  • Ideally we should be less “this is the best option for use case X” and more “if your app does Y, choose embedders with high A”
    • pinging @dureuill: do you think it is possible to create general guidelines as described above? What are the things users should look at when trying to decide which embedder to choose?
@dureuill
Contributor

dureuill commented Nov 12, 2024

Hey @guimachiavelli 👋

I guess we're in a bit of a situation where "if you have to ask, just use OpenAI".

A very rough outline could be (see the configuration sketch after the list):

  • If unsure, use OpenAI
  • If your app has a feature to search by images, audio, or anything that is not text, you need to embed these media separately and use the user-provided embedder.
  • If your app relies on a specific model or embedder, or you are already using a specific embedding provider (Azure, Mistral, etc.), then use the REST embedder.
    • If the remote embedder is an ollama server, prefer the ollama embedder instead.
  • Otherwise, use OpenAI.
  • If you really cannot use OpenAI or any other embedding provider, consider the Hugging Face embedder. The Hugging Face embedder is best suited when you have a small number of documents (on the order of 10k) and don't intend to update them often.
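
To make the options above concrete, here is a rough sketch of how each might map to Meilisearch's embedder settings. This is a sketch only: all model names, URLs, and keys are placeholders, and the exact fields (especially the `rest` source's request/response templates) vary by Meilisearch version.

```python
# Sketch only: assumes a local Meilisearch instance with vector search
# enabled. All model names, URLs, and keys below are placeholders.
import requests

MEILI_URL = "http://localhost:7700"
HEADERS = {
    "Authorization": "Bearer MEILISEARCH_ADMIN_KEY",  # placeholder key
    "Content-Type": "application/json",
}

embedder_options = {
    # Default choice: OpenAI
    "openAi": {
        "source": "openAi",
        "apiKey": "OPENAI_API_KEY",  # placeholder
        "model": "text-embedding-3-small",
        "documentTemplate": "A document titled {{doc.title}}",
    },
    # Non-textual media, or any model Meilisearch cannot call itself:
    # you compute vectors yourself and ship them with documents and queries
    "userProvided": {
        "source": "userProvided",
        "dimensions": 512,  # must match your model's output size
    },
    # A specific provider (Azure, Mistral, etc.) reachable over HTTP;
    # the exact request/response templates depend on the provider's API
    "rest": {
        "source": "rest",
        "url": "https://example.com/v1/embeddings",  # placeholder
        "request": {"input": "{{text}}"},
        "response": {"embedding": "{{embedding}}"},
    },
    # A local Ollama server
    "ollama": {
        "source": "ollama",
        "url": "http://localhost:11434/api/embeddings",
        "model": "nomic-embed-text",
    },
    # Embedding inside Meilisearch itself: small, rarely-updated datasets
    "huggingFace": {
        "source": "huggingFace",
        "model": "BAAI/bge-small-en-v1.5",
    },
}

# In practice you configure one embedder, not all five:
resp = requests.patch(
    f"{MEILI_URL}/indexes/movies/settings/embedders",
    headers=HEADERS,
    json={"default": embedder_options["openAi"]},
)
print(resp.json())  # returns a task to poll until the setting is applied
```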

@guimachiavelli
Member Author

guimachiavelli commented Nov 12, 2024

Thanks for the reply, @dureuill, much appreciated!

A small follow-up regarding your second point: does the user-provided embedder suggestion apply to documents with no meaningful textual fields, to non-textual queries, or both? I have realised it's not completely clear to me how we accommodate users with non-textual documents.

@dureuill
Contributor

dureuill commented Nov 12, 2024

> does the user-provided embedder suggestion apply to documents with no meaningful textual fields, to non-textual queries, or both?

Meilisearch does not natively support non-textual fields, either in documents or in search requests: you can include an image in a document as base64, or reference it via its URL, but you cannot meaningfully search for that document using that image.

As soon as you use a user-provided embedder, you need to provide vectors both in your documents and in your semantic/hybrid search queries.

From there, any combination of textual/non-textual is possible: since image embedding models appear to generally first do image -> text and then text -> embedding, one can choose to embed either text or images, both at indexing and at search time. All the embedding operations have to be done outside of Meilisearch, though.
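
To make this flow concrete, here is a minimal sketch, assuming an embedder named "custom" configured with source "userProvided" and dimensions 512 (as in the settings sketch above). embed_image and embed_text are hypothetical stand-ins for whatever model you run outside Meilisearch.

```python
# Sketch only: assumes an embedder named "custom" with source "userProvided"
# and dimensions 512. embed_image/embed_text are hypothetical placeholders
# for a model you run yourself.
import requests

MEILI_URL = "http://localhost:7700"
HEADERS = {
    "Authorization": "Bearer MEILISEARCH_ADMIN_KEY",  # placeholder key
    "Content-Type": "application/json",
}

def embed_image(path: str) -> list[float]:
    # Placeholder: call your image model here (e.g. a CLIP-style encoder).
    return [0.0] * 512  # dummy vector; must match the embedder's dimensions

def embed_text(text: str) -> list[float]:
    # Placeholder: call a text model that shares the image model's space.
    return [0.0] * 512

# Indexing: vectors travel inside documents, under _vectors.<embedder name>.
documents = [
    {
        "id": 1,
        "title": "Sunset over the bay",
        "image_url": "https://example.com/sunset.jpg",
        "_vectors": {"custom": embed_image("sunset.jpg")},
    }
]
requests.post(
    f"{MEILI_URL}/indexes/photos/documents", headers=HEADERS, json=documents
)

# Searching: the query vector is also computed outside Meilisearch. Here the
# query is text, but an embedded image would work the same way.
search = requests.post(
    f"{MEILI_URL}/indexes/photos/search",
    headers=HEADERS,
    json={
        "q": "sunset",  # optional keyword half of the hybrid search
        "vector": embed_text("sunset at the beach"),
        "hybrid": {"embedder": "custom", "semanticRatio": 0.5},
    },
)
print(search.json())
```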

guimachiavelli linked a pull request Nov 28, 2024 that will close this issue