
Support embedders setting and other vector/hybrid search related configuration #554


Open
wants to merge 40 commits into
base: main

Conversation

CommanderStorm
Contributor

@CommanderStorm CommanderStorm commented Mar 2, 2024

Pull Request

Related issue

Fixes #541
Fixes #612
Fixes #621
Fixes #646

What does this PR do?

  • Adds the required settings

    • with_embedders uses the same "API" (not using impl AsRef for items passed) as with_synonyms, as this is the closest existing method
    • given that set_embedders has not been implemented upstream (at least when I try to PATCH the object, it does not work),
      only the {get,reset}_embedders settings have been implemented.
      Said implementation goes with the work done in Implement vector search experimental feature v2 (v1.6) meilisearch-python#924
  • adds the hybrid field to search via the vector search to add an end-to-end test of this feature with the huggingface configuration.

    userProvided seems more brittle, but we may want to change to it instead.
    Using userProvided would mean (at the cost of hardcoding values):
    => lower CPU effort
    => no higher timeout necessary
    => aligning with meilisearch/meilisearch, which only has a test for userProvided

TODO:

  • find a combination of semantic search model + configuration that does not fail the assumptions (see search testcase) spectacularly

PR checklist

Please check if your PR fulfills the following requirements:

  • Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
  • Have you read the contributing guidelines?
  • Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!

Summary by CodeRabbit

  • New Features

    • Added support for hybrid semantic search, allowing users to combine keyword and semantic search with customizable parameters.
    • Introduced the ability to provide custom embedding vectors in search queries and retrieve vectors in search results.
    • Added comprehensive configuration options for semantic search embedders, supporting multiple providers (HuggingFace, OpenAI, Ollama, REST, and user-provided).
    • Enabled management of embedders through new settings and API methods, including fetching, setting, and resetting embedder configurations.
  • Documentation

    • Added detailed usage examples and documentation for new semantic search and embedder configuration features.
  • Tests

    • Introduced new tests to verify hybrid search, vector retrieval, and embedder management functionalities.

@CommanderStorm CommanderStorm changed the title Vector search embedder [v1.6] support embedders setting Mar 2, 2024
@CommanderStorm CommanderStorm marked this pull request as ready for review March 2, 2024 19:52
@CommanderStorm CommanderStorm force-pushed the vector-search-embedder branch from 1076241 to 8ffb555 Compare March 11, 2024 11:42
@curquiza
Member

curquiza commented Apr 15, 2024

Hello @CommanderStorm

Can you rebase your branch? We made changes recently to improve the library

Sorry for the inconvenience. I will try to review your PR as soon as possible after the rebase.

@CommanderStorm CommanderStorm force-pushed the vector-search-embedder branch 2 times, most recently from 51820a9 to 8ade09d Compare April 16, 2024 03:48
@CommanderStorm
Contributor Author

CommanderStorm commented Apr 16, 2024

Can you rebase your branch?

Done ^^

Member

@irevoire irevoire left a comment


Hey @CommanderStorm,

Code-wise, the PR is very nice and well-documented; I love it!


But I’m hitting way too many timeouts to really help you; sorry 😖

I don’t think we’ll be able to merge this PR with tests that take about 10 minutes. Could you mock Meilisearch as we did here:

async fn test_get_tasks_with_params() -> Result<(), Error> {
    let mut s = mockito::Server::new_async().await;
    let mock_server_url = s.url();
    let client = Client::new(mock_server_url, Some("masterKey")).unwrap();
    let path =
        "/tasks?indexUids=movies,test&statuses=equeued&types=documentDeletion&uids=1&limit=0&from=1";

    let mock_res = s.mock("GET", path).with_status(200).create_async().await;

    let mut query = TasksSearchQuery::new(&client);
    query
        .with_index_uids(["movies", "test"])
        .with_statuses(["equeued"])
        .with_types(["documentDeletion"])
        .with_from(1)
        .with_limit(0)
        .with_uids([&1]);

    let _ = client.get_tasks_with(&query).await;

    mock_res.assert_async().await;

    Ok(())
}

That way, we simply ensure that meilisearch-rust sends a valid payload and hope meilisearch works as expected.

userProvided seems more brittle, but we may want to change to it instead

Or we could do that and actually send the payload to meilisearch.
Or do both; let me know what you prefer, but I would very much like the tests to not take that much time to run 😖

Where in the Meilisearch codebase could I find an e2e test showing how to use this feature?

I don’t think there is. I believe you’re right, and we only wrote tests for user-provided vectors

introduces the experimental-vector-search-feature flag

I don’t know if this is required, @curquiza. Could we simply expose the feature as-is instead of hiding it behind a feature flag (that will make it harder to test and use).
Since it doesn’t add any dependency, I don’t see much point in putting it behind a feature flag

@curquiza
Member

I don’t know if this is required, @curquiza. Could we simply expose the feature as-is instead of hiding it behind a feature flag (that will make it harder to test and use).

Yes, it's OK not to use a feature flag 😊

@CommanderStorm CommanderStorm force-pushed the vector-search-embedder branch from 0285e62 to 9033232 Compare April 17, 2024 15:37
@CommanderStorm
Contributor Author

CommanderStorm commented Apr 17, 2024

I did a few tests, and on my side it never took 120s, the quickest execution took 150s but most of the time I was over 200s

Time after being migrated to userProvided seems fine in CI (this is likely the slowest machine running the tests). ✅

Or we could do that and actually send the payload to meilisearch.

I think keeping the testcases from requiring an active internet connection is better, as otherwise the tests might be unnecessarily flaky in CI.

Could you mock Meilisearch as we did here

I can add a testcase where I mock the routes.
I am unsure if you actually would want to mock this for an experimental feature (= where the api might change => requiring changes)

I have tweaked a bunch with the vectors available and can't get the test_hybrid testcase to work without fully using the same dataset as in tests/search/hybrid.rs.

It seems to always return everything when I set semantic_ratio=1.0...
Is this operating as intended?

I have tried 2D vectors similar to the upstream test and went as far as using 1D vectors with considerable (1/10/1k/1m) spread (=> something that, as far as I understand embeddings, should not match)

=> I am missing something. 😅
Could you point me in the right direction?
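
One possible contributing factor, offered as a sketch and not as a confirmed diagnosis: if the engine ranks semantic matches by cosine similarity (an assumption here; nothing in this thread states the metric), then 1D vectors with the same sign all point in the same direction, however large the spread in magnitude. The helper below is hypothetical and only illustrates the arithmetic:

```rust
/// Cosine similarity between two vectors of equal length.
/// Assumption: this is only a stand-in for whatever metric the engine uses.
/// Returns NaN if either vector has zero length (norm of 0).
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    // Dot product of the two vectors.
    let dot: f32 = a.iter().zip(b.iter()).map(|(x, y)| x * y).sum();
    // Euclidean norms.
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}
```

Under that assumption, `cosine_similarity(&[1.0], &[1_000_000.0])` is as close to 1 as floating point allows, while a 2D pair such as `[1.0, 0.0]` and `[0.0, 1.0]` scores 0. Magnitude spread alone would therefore not separate 1D test vectors; direction would.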

Note

I can obviously steal the testcase from tests/search/hybrid.rs, but for longer-term maintainability I would like this testcase to not be such an oddball compared to the existing testcases in this repo ^^

@CommanderStorm CommanderStorm force-pushed the vector-search-embedder branch from 9033232 to 0e044b1 Compare April 17, 2024 16:18
@curquiza curquiza requested a review from irevoire May 14, 2024 09:25
@curquiza
Member

curquiza commented Jul 1, 2024

@CommanderStorm really sorry for this.
Can you fix the git conflicts? 😊

@CommanderStorm
Contributor Author

Thanks for the ping (GitHub does not give me notifications for this) ^^
No need to worry. If this takes a few months that is fine.

Member

@curquiza curquiza left a comment


@CommanderStorm thank you again! Can you add the rest and ollama models we added in v1.8.0?
Sorry for the late notice again!
And thank you for your involvement 🙏

@NoodleSamaChan NoodleSamaChan mentioned this pull request Jul 2, 2024
3 tasks
@irevoire
Member

irevoire commented Jul 3, 2024

Hey @CommanderStorm, since we introduced a new rest embedder, I think we could write better tests by mocking the rest embedder and ensuring it works with meilisearch; here's some example I wrote earlier this week in meilisearch: meilisearch/meilisearch@2141cb3

@Akagi201

Any update to this PR? It's very useful for AI search

@CommanderStorm
Contributor Author

From a features/tests/documentation POV nothing is blocking merging.
I suspect that meilisearch noticed that the other integrations are much more used and is allocating reviewer capacity accordingly.

In terms of docs, my last sync with their docs/features was two months ago. If there is drift, feel free to provide a PR to this PR or copy my changes into a new PR.
At this point, I have abandoned the idea of AI search because the increased latency (I don't run on fast hardware) makes for a noticeably worse experience in my use case.
Since said functionality is stable now, I am leaving this open. If the team wants some change, I think I can do that too, but only if there is a hope of this leading to merging, which at the moment does not look promising.

You can use this by overriding what meilisearch-sdk resolves to in your Cargo.toml:

- meilisearch-sdk = "1"
+ meilisearch-sdk = { git = "https://github.com/commanderstorm/meilisearch-rust.git", branch = "vector-search-embedder" }

@CommanderStorm CommanderStorm changed the title Support embedders setting Support embedders setting and other vector/hybrid search related configuration May 8, 2025
@awyl

awyl commented May 15, 2025

Hi,

I ran into this error when I try to create an index with settings provided in this PR.

Meilisearch: 1.14

Settings::new()
    .with_embedders(HashMap::from([(
        "default",
        Embedder::OpenAI(OpenAIEmbedderSettings {
            api_key: "...",
            model: Some("text-embedding-3-small".to_string()),
            dimensions: Some(1536),
            document_template_max_bytes: Some(6000),
            ..Default::default()
        }),
    )]))

Error:

Meilisearch invalid_request: unknown: Unknown value `openAI` at `.embedders.default.source`: did you mean `openAi`? expected one of `openAi`, `huggingFace`, `ollama`, `userProvided`, `rest`, `composite`. https://docs.meilisearch.com/errors#invalid_settings_embedders
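
The mismatch in the error above is consistent with how a camelCase rename rule treats a PascalCase variant name. A minimal sketch, assuming the enum derives its wire names with a rule like serde's `rename_all = "camelCase"`, which for variant names lowercases only the first character. `camel_case_variant` is a hypothetical helper written for illustration; it is not part of the SDK or of serde:

```rust
/// Mimics a camelCase rename of a PascalCase variant name:
/// the first character is lowercased, everything else is left untouched.
/// Hypothetical helper for illustration only.
fn camel_case_variant(variant: &str) -> String {
    let mut chars = variant.chars();
    match chars.next() {
        // Lowercase the first character and append the unchanged remainder.
        Some(first) => first.to_ascii_lowercase().to_string() + chars.as_str(),
        None => String::new(),
    }
}
```

With this rule, a variant spelled `OpenAI` is sent as `openAI`, which the server rejects, whereas `OpenAi` is sent as `openAi`, one of the values the error message lists as accepted.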


coderabbitai bot commented May 19, 2025

Walkthrough

The changes introduce comprehensive support for AI-powered and hybrid semantic search in the SDK. This includes new settings and API methods for configuring multiple embedders with provider-specific options, extended search query capabilities for hybrid and vector-based search, and enhanced test coverage for these features. All updates are additive and maintain existing functionality.

Changes

File(s) | Change Summary
src/search.rs | Added support for hybrid semantic search in the search query API. Introduced HybridSearch struct and extended SearchQuery with hybrid, vector, and retrieve_vectors fields. Added builder methods for these fields. Enhanced tests for vector retrieval and hybrid search, including new vector-related types and test data.
src/settings.rs | Introduced comprehensive embedder configuration support through a new Embedder enum and associated structs for HuggingFace, OpenAI, Ollama, REST, and user-provided embedders. Extended Settings with an embedders field and builder method. Added async methods on Index to get and reset embedders. Added integration tests for embedder management and detailed documentation.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant SDK
    participant Meilisearch

    Client->>SDK: Configure Settings with Embedders
    SDK->>Meilisearch: PATCH /indexes/:uid/settings (with embedders)
    Meilisearch-->>SDK: TaskInfo

    Client->>SDK: Build SearchQuery (with hybrid/vector/retrieve_vectors)
    SDK->>Meilisearch: POST /indexes/:uid/search (with hybrid and/or vector params)
    Meilisearch-->>SDK: Search results (optionally with vectors)
    SDK-->>Client: Search results
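
To make the search step of the diagram above concrete, here is a rough sketch of what the request body might look like. The field names `hybrid`, `embedder`, and `semanticRatio` are taken from the assessment in this thread; the function `hybrid_search_body` is hypothetical, and the hand-rolled string formatting is an assumption made to keep the sketch dependency-free (the SDK itself would serialize a typed query instead):

```rust
/// Builds a JSON body for a hybrid search request by hand.
/// Hypothetical helper: performs NO JSON escaping, so `q` and `embedder`
/// must not contain quotes or backslashes. Real code should use a
/// serializer such as serde_json.
fn hybrid_search_body(q: &str, embedder: &str, semantic_ratio: f32) -> String {
    format!(
        "{{\"q\":\"{}\",\"hybrid\":{{\"embedder\":\"{}\",\"semanticRatio\":{}}}}}",
        q, embedder, semantic_ratio
    )
}
```

Calling `hybrid_search_body("kitchen", "default", 0.5)` yields a body that names the embedder explicitly, matching the note in this thread that the embedder is mandatory for hybrid search.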

Assessment against linked issues

Objective Addressed Explanation
Support embedders setting in SDK, including builder and JSON structure (#541, #646)
Implement embedder variants and all subfields, including new/changed fields for REST, OpenAI, Ollama, etc. (#612, #646)
Add methods to get, set, and reset embedders via SDK methods (#646)
Add hybrid search support: hybrid param with embedder (mandatory), semanticRatio, vector, retrieveVectors (#621, #646)
Remove deprecated REST embedder fields, add new ones (request, response, headers), update tests (#612)

Poem

In the warren of search, where queries hop,
Embedders and vectors now leap to the top!
Hybrid and semantic, all settings in tow,
With Ollama and OpenAI, see results grow.
🥕 The SDK’s garden is richer today—
Hooray for the code that leads the AI way!



📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Lite
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 5d488a3 and b04d320.

📒 Files selected for processing (1)
  • src/settings.rs (13 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/settings.rs



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🔭 Outside diff range comments (2)
src/settings.rs (2)

3083-3090: ⚠️ Potential issue

Copy-paste error: test validates the wrong endpoint

test_reset_separator_tokens resets the separator tokens but then calls
get_dictionary, so it never checks whether the reset actually worked.
The test will pass even if separator-token APIs are broken.

-        let res = index.get_dictionary().await.unwrap();
-        assert_eq!(separator, res);
+        let res = index.get_separator_tokens().await.unwrap();
+        assert_eq!(separator, res);

Please adjust the assertion to target the correct getter (same problem exists a few lines below for non-separator tokens).


3113-3120: ⚠️ Potential issue

Same mismatch in non-separator-token reset test

The test resets non-separator tokens but queries the dictionary.

-        let res = index.get_dictionary().await.unwrap();
+        let res = index.get_non_separator_tokens().await.unwrap();

Without this change, the test gives a false sense of security.

♻️ Duplicate comments (1)
src/search.rs (1)

639-655: Leverage the new HybridSearch::new & clean up with_vector

-    pub fn with_hybrid<'b>(
-        &'b mut self,
-        embedder: &'a str,
-        semantic_ratio: f32,
-    ) -> &'b mut SearchQuery<'a, Http> {
-        self.hybrid = Some(HybridSearch {
-            embedder,
-            semantic_ratio,
-        });
+    pub fn with_hybrid<'b>(
+        &'b mut self,
+        embedder: &'a str,
+        semantic_ratio: f32,
+    ) -> &'b mut SearchQuery<'a, Http> {
+        self.hybrid = Some(
+            HybridSearch::new(embedder, semantic_ratio)
+                .expect("semantic_ratio must be within [0.0, 1.0]"),
+        );
         self
     }
-    /// Defines what vectors an userprovided embedder has gotten for semantic searching
-    pub fn with_vector<'b>(&'b mut self, vector: &'a [f32]) -> &'b mut SearchQuery<'a, Http> {
-        self.vector = Some(vector);
-        self
-    }
+
+    // new `with_vector` added in previous comment

This compiles the validation logic into the API surface and removes the lifetime trap.
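
The suggestion above relies on a validating constructor. A minimal sketch of what such a constructor could look like, assuming the field names from the diff; the `Result<Self, String>` return type and the error text are assumptions, since the thread only shows the call site with `.expect(...)`:

```rust
/// Hypothetical stand-in for the SDK's `HybridSearch`; field names follow the diff.
#[derive(Debug, Clone, PartialEq)]
pub struct HybridSearch<'a> {
    pub embedder: &'a str,
    pub semantic_ratio: f32,
}

impl<'a> HybridSearch<'a> {
    /// Rejects ratios outside [0.0, 1.0]; NaN is rejected too,
    /// because it is not contained in any range.
    pub fn new(embedder: &'a str, semantic_ratio: f32) -> Result<Self, String> {
        if !(0.0..=1.0).contains(&semantic_ratio) {
            return Err(format!(
                "semantic_ratio must be within [0.0, 1.0], got {semantic_ratio}"
            ));
        }
        Ok(Self { embedder, semantic_ratio })
    }
}
```

One design consideration: calling `.expect(...)` inside `with_hybrid`, as the suggested diff does, turns an invalid ratio into a panic at the call site. Returning the `Result` to the caller would be the non-panicking alternative, at the cost of changing the builder's signature.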

🧹 Nitpick comments (2)
src/settings.rs (2)

193-205: OllamaEmbedderSettings derives Default but has a required field

model: String is documented as mandatory, yet deriving Default silently
sets it to an empty string. That encourages users to write ..Default::default() and ship an invalid configuration that will only fail at runtime.

Consider one of:

-#[derive(Serialize, Deserialize, Default, Debug, Clone, Eq, PartialEq)]
+#[derive(Serialize, Deserialize, Debug, Clone, Eq, PartialEq)]

or implement a custom Default that provides a sensible model name.
This keeps the type API honest.
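
A sketch of the first option, assuming a trimmed-down struct with only the `model`, `url`, and `api_key` fields mentioned in this review. The constructor `new` and the builder `with_url` are hypothetical names, and the serde derives are omitted to keep the example dependency-free:

```rust
/// Hypothetical, reduced version of the Ollama settings struct.
/// No `Default` derive: the required `model` must be supplied explicitly.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct OllamaEmbedderSettings {
    /// Required model name.
    pub model: String,
    /// Optional endpoint URL.
    pub url: Option<String>,
    /// Optional API key.
    pub api_key: Option<String>,
}

impl OllamaEmbedderSettings {
    /// Constructs settings with the mandatory model and no optional fields.
    pub fn new(model: impl Into<String>) -> Self {
        Self { model: model.into(), url: None, api_key: None }
    }

    /// Sets the optional URL, builder-style.
    pub fn with_url(mut self, url: impl Into<String>) -> Self {
        self.url = Some(url.into());
        self
    }
}
```

With this shape, `OllamaEmbedderSettings::new("some-model")` is the only way to obtain a value, so an empty model name can no longer slip in through `..Default::default()`.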


281-288: Nit: misleading environment-variable reference

The docs say that, if url is not set, Meilisearch will read MEILI_OLLAMA_URL.
For the generic REST embedder this is confusing and probably inaccurate (left-over from copy-pasting the Ollama section).

Update the comment or remove the sentence to avoid misleading users.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 90a153c and 1c5c9f6.

📒 Files selected for processing (2)
  • src/search.rs (17 hunks)
  • src/settings.rs (13 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: integration-tests


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
src/settings.rs (1)

46-49: ⚠️ Potential issue

Fix OpenAI enum variant serialization to match upstream API

Users have reported errors when creating indices with embedders because the case in "openAI" doesn't match what Meilisearch expects ("openAi" with lowercase 'i'). While renaming the variant to OpenAi helps, using an explicit rename attribute would make the solution more obvious and maintainable.

    /// Compute embeddings inside meilisearch with models from [HuggingFace](https://huggingface.co/).
    /// You may be able to significantly improve performance by [compiling a CUDA-compatible Meilisearch binary](https://www.meilisearch.com/docs/guides/ai/computing_hugging_face_embeddings_gpu).
    /// This is a resource-intensive operation and might affect indexing performance negatively.
    HuggingFace(HuggingFaceEmbedderSettings),
    /// Use OpenAI's API to generate embeddings
    /// Depending on hardware, this is a
-    OpenAi(OpenAIEmbedderSettings),
+    /// 
+    #[serde(rename = "openAi")]
+    OpenAI(OpenAIEmbedderSettings),
🧹 Nitpick comments (5)
src/settings.rs (5)

48-49: Incomplete documentation comment

The documentation for OpenAI embedder has an incomplete sentence: "Depending on hardware, this is a". Please complete or remove this sentence to maintain high documentation quality.

/// Use OpenAI's API to generate embeddings
-/// Depending on hardware, this is a
+/// Depending on hardware, this can be more efficient than computing embeddings locally.

131-133: Naming inconsistency between enum variant and struct

The enum variant is named OpenAi but the struct is named OpenAIEmbedderSettings. This inconsistency makes the code harder to understand and maintain. Consider standardizing on one naming convention.

-#[derive(Serialize, Deserialize, Default, Debug, Clone, Eq, PartialEq)]
-#[serde(rename_all = "camelCase")]
-pub struct OpenAIEmbedderSettings {
+#[derive(Serialize, Deserialize, Default, Debug, Clone, Eq, PartialEq)]
+#[serde(rename_all = "camelCase")]
+pub struct OpenAiEmbedderSettings {

Also update all references to this struct throughout the file.


285-286: Documentation error - incorrect environment variable

The documentation for the GenericRestEmbedderSettings mentions MEILI_OLLAMA_URL which appears to be a copy-paste error from the OllamaEmbedderSettings section.

    /// Must be parseable as a URL.
-    /// If not specified, [Meilisearch](https://www.meilisearch.com/) (**not the sdk you are currently using**) will try to fetch the `MEILI_OLLAMA_URL` environment variable
+    /// If not specified, [Meilisearch](https://www.meilisearch.com/) (**not the sdk you are currently using**) will try to fetch the appropriate environment variable
    /// Example: `"http://localhost:12345/api/v1/embed"`

330-334: Incorrect documentation alignment

The documentation comment appears to be misaligned with the field it describes. The comment about maximum template size should be on the document_template_max_bytes field rather than interrupting the request field documentation.

    /// ```
-    /// The maximum size of a rendered document template.
-    //
-    // Longer texts are truncated to fit the configured limit.
-    /// Default: `400`
    #[serde(skip_serializing_if = "Option::is_none")]
    pub document_template_max_bytes: Option<usize>,
+    /// The maximum size of a rendered document template.
+    ///
+    /// Longer texts are truncated to fit the configured limit.
+    /// Default: `400`

2995-2996: Suggest additional test coverage for other embedder types

While the basic test is good, consider adding additional tests for the other embedder types to ensure they serialize and deserialize correctly. This would help catch issues like the OpenAI/OpenAi capitalization problem earlier.

#[meilisearch_test]
async fn test_set_huggingface_embedder_settings(client: Client, index: Index) {
    let hf_embedder = Embedder::HuggingFace(HuggingFaceEmbedderSettings {
        model: Some("BAAI/bge-base-en-v1.5".to_string()),
        document_template: Some("A document titled {{doc.title}}".to_string()),
        ..Default::default()
    });
    let embeddings = HashMap::from([("hf".into(), hf_embedder)]);
    let settings = Settings::new().with_embedders(embeddings.clone());

    let task_info = index.set_settings(&settings).await.unwrap();
    client.wait_for_task(task_info, None, None).await.unwrap();

    let res = index.get_embedders().await.unwrap();

    assert_eq!(embeddings, res);
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 1c5c9f6 and 430c8d1.

📒 Files selected for processing (1)
  • src/settings.rs (13 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: integration-tests
🔇 Additional comments (6)
src/settings.rs (6)

1214-1227: Proper default handling in get_embedders method

The implementation of get_embedders correctly handles the case where the server returns None by providing an empty HashMap. However, making this more explicit in the documentation would help users understand this behavior.

Consider adding a documentation comment explaining that this method will never return None, but instead return an empty HashMap if no embedders are configured.
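
The behavior described above can be sketched as follows. This is an illustration under assumptions: `embedders_or_empty` is a hypothetical helper, `raw` stands in for the already-deserialized server response, and the generic parameter `E` stands in for the SDK's `Embedder` type:

```rust
use std::collections::HashMap;

/// Maps a missing (`None`) embedders response to an empty map,
/// so callers never have to handle `None` themselves.
/// Hypothetical helper; not the SDK's actual implementation.
fn embedders_or_empty<E>(raw: Option<HashMap<String, E>>) -> HashMap<String, E> {
    raw.unwrap_or_default()
}
```

Documenting this on the public method would tell users that "no embedders configured" and "empty map" are the same observable result.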


2995-3006: Well-structured test for embedder settings

The test for setting embedder settings is well-structured and provides good coverage of the basic functionality. It creates a simple user-provided embedder, sets it using with_embedders, and then verifies that get_embedders returns the expected configuration.


655-670: Well-implemented builder method for embedders

The with_embedders method follows the same pattern as other builder methods in the SDK, ensuring consistency for users. It correctly handles the mapping of string keys and preserves the rest of the settings.


1190-1213: Good documentation with complete example for get_embedders

The documentation and example for the get_embedders method provides a clear, complete example that demonstrates how to use the method, including setting up an embedder first to make the example more realistic.


2575-2606: Good documentation with example for reset_embedders

The reset_embedders method has clear documentation and a practical example that follows the same pattern as other reset methods in the SDK, maintaining consistency.


39-58: Well-structured Embedder enum with comprehensive documentation

The Embedder enum is well-structured with appropriate serde tags for serialization/deserialization. The documentation for each variant clearly explains its purpose and any performance considerations.

@CommanderStorm
Contributor Author

@awyl could you retry please? Should be fixed now


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
src/settings.rs (1)

331-337: Documentation comment format issue

The documentation for document_template_max_bytes mixes both /// and // comment styles, which can cause inconsistencies in generated documentation.

Use consistent /// style for all documentation comments:

/// The maximum size of a rendered document template.
-//
-// Longer texts are truncated to fit the configured limit.
+///
+/// Longer texts are truncated to fit the configured limit.
/// Default: `400`
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Lite
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 4a175e0 and 5d488a3.

📒 Files selected for processing (1)
  • src/settings.rs (13 hunks)
🔇 Additional comments (12)
src/settings.rs (12)

39-60: Enum variant type matches Meilisearch API requirements

The Embedder enum correctly implements the tagged enum pattern with tag = "source" for proper serialization. Based on past review comments, the variant OpenAi with lowercase 'i' (rather than OpenAI) matches what the Meilisearch API expects, preventing deserialization errors.


76-115: Well-structured settings for HuggingFace embedders

The implementation provides comprehensive documentation and proper serialization. All fields are optional with appropriate skip_serializing_if attributes, following the codebase's established patterns.


133-176: Well-structured settings for OpenAI embedders

The implementation correctly requires an API key for OpenAI while making other fields optional. Documentation is thorough with examples and default values.


194-243: Appropriate settings for Ollama embedders

The model field is properly required while url and api_key are optional, which aligns with Ollama's configuration requirements.


456-458: Embedders field added to Settings struct

The embedders field is appropriately optional and uses a HashMap<String, Embedder> to support multiple named embedders.


656-671: Builder method for embedders follows established patterns

The with_embedders method follows the same pattern as other builder methods in the struct, allowing for a fluent interface while handling string-like keys.


1191-1228: Get embedders method with proper default handling

The get_embedders method follows the established pattern in the SDK. It correctly handles the case when no embedders are configured by returning an empty map rather than None.


2576-2607: Reset embedders method implementation

The implementation of reset_embedders follows the same pattern as other reset methods in the codebase, using the DELETE HTTP method and expecting a 202 response.


2775-2780: Test for retrieving embedders

The test confirms that get_embedders correctly returns an empty HashMap when no embedders are configured.


2808-2816: Test for resetting embedders

The test properly verifies the functionality of reset_embedders by checking that the API returns an empty HashMap after reset.


2994-3007: Comprehensive test for setting embedder settings

The test demonstrates setting a UserProvided embedder via the Settings struct and verifies that the changes are correctly applied.


1-3163:

✅ Verification successful

Implementation note: Missing dedicated set_embedders method

While you can set embedders through the general set_settings method, there's no dedicated set_embedders method like there is for other settings. According to the PR objectives, this was intentional since "set_embedders was not functional upstream at the time of development."

When the upstream API supports this functionality, you might want to add a dedicated method for consistency with the rest of the API.

You should verify that the upstream Meilisearch API now properly supports setting embedders directly before adding a dedicated method:


🏁 Script executed:

#!/bin/bash
# Check the current Meilisearch docs for embedder PATCH support
curl -s "https://www.meilisearch.com/docs/api/settings/embedders" | grep -o "PATCH" || echo "PATCH not found in embedders documentation"

Length of output: 176


Upstream API Does Not Support Direct embedders PATCH

Verified the Meilisearch documentation at https://www.meilisearch.com/docs/api/settings/embedders – there is no PATCH method listed for embedders. Until the upstream API adds a dedicated endpoint to update embedders, using the general set_settings is correct and no dedicated set_embedders method is needed.

@awyl

awyl commented May 19, 2025

@awyl could you retry please? Should be fixed now

Thank you, it works perfectly.

Labels
enhancement New feature or request
Projects
None yet
5 participants