v1.13: Add remote federated search documentation #3144

Merged
merged 14 commits into from
Feb 13, 2025
Changes from 10 commits
43 changes: 43 additions & 0 deletions .code-samples.meilisearch.yaml
@@ -1444,3 +1444,46 @@ experimental_post_logs_stream_1: |-
experimental_delete_logs_stream_1: |-
curl \
-X DELETE MEILISEARCH_URL/logs/stream
multi_search_remote_federated_1: |-
curl \
-X POST 'MEILISEARCH_URL/multi-search' \
-H 'Content-Type: application/json' \
--data-binary '{
"federation": {},
"queries": [
{
"indexUid": "movies",
"q": "batman",
"federationOptions": {
"remote": "ms-00"
}
},
{
"indexUid": "movies",
"q": "batman",
"federationOptions": {
"remote": "ms-01"
}
}
]
}'
get_network_1: |-
curl \
-X GET 'MEILISEARCH_URL/network'
update_network_1: |-
curl \
-X PATCH 'MEILISEARCH_URL/network' \
-H 'Content-Type: application/json' \
--data-binary '{
"self": "ms-00",
"remotes": {
"ms-00": {
"url": "http://INSTANCE_URL",
"searchApiKey": "INSTANCE_API_KEY"
},
"ms-01": {
"url": "http://ANOTHER_INSTANCE_URL",
"searchApiKey": "ANOTHER_INSTANCE_API_KEY"
}
}
}'
5 changes: 5 additions & 0 deletions config/sidebar-learn.json
@@ -272,6 +272,11 @@
"label": "Using multi-search to perform a federated search",
"slug": "performing_federated_search"
},
{
"source": "learn/multi_search/implement_sharding.mdx",
"label": "Implement sharding with remote federated search",
"slug": "implement_sharding"
},
{
"source": "learn/multi_search/multi_search_vs_federated_search.mdx",
"label": "Differences between multi-search and federated search",
5 changes: 5 additions & 0 deletions config/sidebar-reference.json
@@ -28,6 +28,11 @@
"label": "Multi-search",
"slug": "multi_search"
},
{
"source": "reference/api/network.mdx",
"label": "Network",
"slug": "network"
},
{
"source": "reference/api/similar.mdx",
"label": "Similar documents",
127 changes: 127 additions & 0 deletions learn/multi_search/implement_sharding.mdx
@@ -0,0 +1,127 @@
---
title: Implement sharding with remote federated search — Meilisearch documentation
description: This guide walks you through implementing a sharding strategy by activating the `/network` route, configuring the network object, and performing remote federated searches.
---

# Implement sharding with remote federated search <NoticeTag type="experimental" label="experimental" />

Sharding is the process of splitting an index containing many documents into multiple smaller indexes, often called shards. This horizontal scaling technique is useful when handling large databases. In Meilisearch, the best way to implement a sharding strategy is to use remote federated search.

If you are an enterprise Meilisearch Cloud customer, sharding costs and configuration are already included in your plan. Contact support to enable this feature in your projects.

For other Cloud customers and self-hosted users, this guide walks you through activating the `/network` route, configuring the network object, and performing remote federated searches.

<Capsule intent="tip" title="Configuring multiple instances">
To minimize issues and limit unexpected behavior, instance, network, and index configuration should be identical for all shards. This guide describes the individual steps you must take on a single instance and assumes you will replicate them across all instances.
</Capsule>

## Prerequisites

- Multiple Meilisearch projects (instances) running Meilisearch >=v1.13

## Activate the `/network` endpoint

First, activate the endpoint in the Cloud interface by navigating to your project dashboard and ticking the "Remote federated search requests" box.

Alternatively, use the `/experimental-features` route to enable `network`:

```sh
curl \
-X PATCH 'MEILISEARCH_URL/experimental-features/' \
-H 'Content-Type: application/json' \
--data-binary '{
"network": true
}'
```

Meilisearch should respond immediately, confirming the route is now accessible. Repeat this process for all instances.

## Configuring the network object

Next, you must configure the network object. It consists of the following fields:

- `remotes`: defines a list with the required information to access each remote instance
- `self`: specifies which of the configured `remotes` corresponds to the current instance

### Setting up the list of remotes

Use the `/network` route to configure the `remotes` field of the network object. `remotes` should be an object containing one or more nested objects. Each key is the name of an instance, and its value is an object with that instance's URL and an API key with search permission:

```sh
curl \
-X PATCH 'MEILISEARCH_URL/network' \
-H 'Content-Type: application/json' \
--data-binary '{
"remotes": {
"REMOTE_NAME_1": {
"url": "INSTANCE_URL_1",
"searchApiKey": "SEARCH_API_KEY_1"
},
"REMOTE_NAME_2": {
"url": "INSTANCE_URL_2",
"searchApiKey": "SEARCH_API_KEY_2"
},
"REMOTE_NAME_3": {
"url": "INSTANCE_URL_3",
"searchApiKey": "SEARCH_API_KEY_3"
}
}
}'
```

Configure the entire set of remote instances in your sharded database, making sure to send the same remotes to each instance.

### Specify the name of the current instance

Now that all instances share the same list of remotes, set the `self` field to specify which of the remotes corresponds to the current instance:

```sh
curl \
-X PATCH 'MEILISEARCH_URL/network' \
-H 'Content-Type: application/json' \
--data-binary '{
"self": "REMOTE_NAME_1"
}'
```

Meilisearch processes searches on the remote that corresponds to `self` locally instead of making a remote request.
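
The two steps above can also be prepared in one pass. The following is a minimal sketch, not an official client: it builds one `PATCH /network` body per instance, with an identical `remotes` object and a different `self` value. The helper name `build_network_payloads` is hypothetical, and the URLs and keys are the same placeholders used elsewhere in this documentation.

```python
import json


def build_network_payloads(remotes):
    """Return one PATCH /network body per instance.

    `remotes` maps each instance name to {"url": ..., "searchApiKey": ...}.
    Every body carries the same `remotes` object; only `self` differs.
    """
    payloads = {}
    for name in remotes:
        payloads[name] = {"self": name, "remotes": remotes}
    return payloads


# Placeholder values; replace them with your own instance URLs and keys.
remotes = {
    "ms-00": {"url": "http://INSTANCE_URL", "searchApiKey": "INSTANCE_API_KEY"},
    "ms-01": {
        "url": "http://ANOTHER_INSTANCE_URL",
        "searchApiKey": "ANOTHER_INSTANCE_API_KEY",
    },
}

for name, body in build_network_payloads(remotes).items():
    # Send each body to the instance of the same name with PATCH /network.
    print(name, json.dumps(body))
```

Generating every body from a single `remotes` dictionary makes it harder for instances to end up with diverging remote lists.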

### Adding or removing an instance

Changing the topology of the network involves moving some documents from one instance to another, depending on your hashing scheme.

As Meilisearch does not provide atomicity across multiple instances, you will need to either:

1. accept search downtime while migrating documents
2. accept some documents will not appear in search results during the migration
3. accept some duplicate documents may appear in search results during the migration

#### Reducing downtime

If your disk space allows, you can reduce the downtime by applying the following algorithm:

1. Create a new temporary index in each remote instance
2. Compute the new instance for each document
3. Send the documents to the temporary index of their new instance
4. Once Meilisearch has copied all documents to their destination instance, swap the new index with the previously used index
5. Delete the temporary index after the swap
6. Update network configuration and search queries across all instances
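
Steps 2 and 3 of the list above could be sketched as follows. This is an illustration under assumptions, not a Meilisearch API: `plan_migration` and `modulo_assign` are hypothetical names, and `modulo_assign` is a stand-in that you would replace with your actual hashing scheme. The sketch only groups documents by destination; sending each batch to the temporary index is left to your indexing code.

```python
from collections import defaultdict


def plan_migration(documents, primary_key, new_instances, assign):
    """Group documents by the instance they belong to in the new topology.

    `assign(key, instances)` stands for your hashing scheme.
    Returns {instance_name: [documents to send to its temporary index]}.
    """
    batches = defaultdict(list)
    for doc in documents:
        target = assign(doc[primary_key], new_instances)
        batches[target].append(doc)
    return dict(batches)


def modulo_assign(key, instances):
    # Stand-in scheme for this demo only; replace with your real hashing scheme.
    return instances[key % len(instances)]


docs = [{"id": i, "title": f"movie {i}"} for i in range(6)]
batches = plan_migration(docs, "id", ["ms-00", "ms-01", "ms-02"], modulo_assign)
for instance, batch in sorted(batches.items()):
    # Each batch would be added to the temporary index on `instance`.
    print(instance, [doc["id"] for doc in batch])
```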

## Create indexes and add documents

Create the same empty indexes with the same settings on all instances. Keeping the settings and indexes in sync is important to avoid errors and unexpected behavior, though not strictly required.

Distribute your documents across all instances. Do not send the same document to multiple instances, as this may lead to duplicate search results. Similarly, ensure all future versions of a document are sent to the same instance. To choose the instance for each document, Meilisearch recommends hashing the document's primary key using [rendezvous hashing](https://en.wikipedia.org/wiki/Rendezvous_hashing).
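
As an illustration of rendezvous hashing, the sketch below scores every instance for a given primary key and picks the highest score. It assumes SHA-256 as the hash function and a `name:key` string as its input; both are choices made for this example, not requirements of Meilisearch. The function name `pick_instance` is hypothetical.

```python
import hashlib


def pick_instance(primary_key, instance_names):
    """Return the instance with the highest hash score for this primary key."""

    def score(name):
        # Hash the (instance, key) pair and read the first 8 bytes as an integer.
        digest = hashlib.sha256(f"{name}:{primary_key}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    return max(instance_names, key=score)


instances = ["ms-00", "ms-01", "ms-02"]
for doc_id in (42, 87, 1024):
    # The same primary key always maps to the same instance.
    print(doc_id, pick_instance(doc_id, instances))
```

Because each document goes to its highest-scoring instance, removing an instance only relocates the documents that were stored on it, and adding one only relocates the documents for which the new instance scores highest.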

### Updating index settings

Changing settings in a sharded database is not fundamentally different from changing settings on a single Meilisearch instance. If the update enables a feature, such as setting filterable attributes, wait until all changes have been processed before using the `filter` search parameter in a query. Likewise, if an update disables a feature, first remove it from your search requests, then update your settings.

## Perform a search

Send your federated search request containing one query per instance:

<CodeSamples id="multi_search_remote_federated_1" />

If all instances share the same network configuration, you can send the search request to any instance. When a query with `"remote": "ms-00"` reaches the instance named `ms-00`, Meilisearch runs that query locally instead of proxying it, because `network.self` identifies the current instance.
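
When the number of instances grows, you may prefer to generate the request body rather than write it by hand. The following is a minimal sketch that builds a `/multi-search` body with one query per remote; the helper name `build_remote_federated_search` is hypothetical, and the remote names are the ones used in the sample above.

```python
import json


def build_remote_federated_search(index_uid, q, remote_names):
    """Build a /multi-search body with one query per remote instance."""
    return {
        "federation": {},
        "queries": [
            {
                "indexUid": index_uid,
                "q": q,
                "federationOptions": {"remote": name},
            }
            for name in remote_names
        ],
    }


body = build_remote_federated_search("movies", "batman", ["ms-00", "ms-01"])
# POST this body to MEILISEARCH_URL/multi-search on any instance in the network.
print(json.dumps(body, indent=2))
```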
2 changes: 2 additions & 0 deletions reference/api/keys.mdx
@@ -101,6 +101,8 @@ For security reasons, we do not recommend creating keys that can perform all act
| **`keys.create`** | Provides access to the [create key](#create-a-key) endpoint |
| **`keys.update`** | Provides access to the [update key](#update-a-key) endpoint |
| **`keys.delete`** | Provides access to the [delete key](#delete-a-key) endpoint |
| **`network.get`** | Provides access to the [get the network object](/reference/api/network#get-the-network-object) endpoint |
| **`network.update`** | Provides access to the [update the network object](/reference/api/network#update-the-network-object) endpoint |

### `indexes`

80 changes: 66 additions & 14 deletions reference/api/multi_search.mdx
@@ -139,14 +139,14 @@ There is no way to specify that two documents should be treated as the same acro

| Search parameter | Type | Default value | Description |
| :--------------------------------------------------------------------------- | :--------------- | :------------ | :-------------------------------------------------- |
| **[`federationOptions`](#federationoptions)** | Object | `null` | Configure federation settings for a specific query |
| **[`indexUid`](/learn/getting_started/indexes#index-uid)** | String | N/A | `uid` of the requested index |
| **[`federationOptions`](#federationoptions)** | Object | `null` | Configure federation settings for a specific query |
| **[`indexUid`](/learn/getting_started/indexes#index-uid)** | String | N/A | `uid` of the requested index |
| **[`q`](/reference/api/search#query-q)** | String | `""` | Query string |
| **[`offset`](/reference/api/search#offset)** | Integer | `0` | Number of documents to skip |
| **[`limit`](/reference/api/search#limit)** | Integer | `20` | Maximum number of documents returned |
| **[`hitsPerPage`](/reference/api/search#number-of-results-per-page)** | Integer | `1` | Maximum number of documents returned for a page |
| **[`page`](/reference/api/search#page)** | Integer | `1` | Request a specific page of results |
| **[`filter`](/reference/api/search#filter)** | [String](/learn/filtering_and_sorting/filter_expression_reference) | `null` | Filter queries by an attribute's value |
| **[`filter`](/reference/api/search#filter)** | String | `null` | Filter queries by an attribute's value |
| **[`facets`](/reference/api/search#facets)** | Array of strings | `null` | Display the count of matches per facet |
| **[`attributesToRetrieve`](/reference/api/search#attributes-to-retrieve)** | Array of strings | `["*"]` | Attributes to display in the returned documents |
| **[`attributesToCrop`](/reference/api/search#attributes-to-crop)** | Array of strings | `null` | Attributes whose values have to be cropped |
@@ -172,10 +172,11 @@ These options are not compatible with federated searches.
`federationOptions` must be an object. It accepts the following parameters:

- `weight`: serves as a multiplicative factor to ranking scores of search results in this specific query. If < `1.0`, the hits from this query are less likely to appear in the final results list. If > `1.0`, the hits from this query are more likely to appear in the final results list. Must be a positive floating-point number. Defaults to `1.0`
- `remote` <NoticeTag type="experimental" label="experimental" />: indicates the remote instance where Meilisearch will perform the query. Must be a string corresponding to a [remote object](/reference/api/network). Defaults to `null`

### Response

The response to `/multi-search` queries may take two shapes: federated and non-federated.
The response to `/multi-search` queries may take different shapes depending on the type of query you're making.

#### Non-federated multi-search requests

@@ -187,7 +188,7 @@ Each search result object is composed of the following fields:

| Name | Type | Description |
| :----------------------- | :--------------- | :------------------------------------------------------------------------------- |
| **`indexUid`** | String | [`uid`](/learn/getting_started/indexes#index-uid) of the requested index |
| **`indexUid`** | String | [`uid`](/learn/getting_started/indexes#index-uid) of the requested index |
| **`hits`** | Array of objects | Results of the query |
| **`offset`** | Number | Number of documents skipped |
| **`limit`** | Number | Number of documents to take |
@@ -212,24 +213,27 @@ Federated search requests return a single object and the following fields:
| **`limit`** | Number | Number of documents to take |
| **`estimatedTotalHits`** | Number | Estimated total number of matches |
| **`processingTimeMs`** | Number | Processing time of the query |
| **`facetsByIndex`** | Object | [Data for facets present in the search results](#facetsbyindex) |
| **`facetDistribution`** | Object | [Distribution of the given facets](#mergefacets) |
| **`facetStats`** | Object | [The numeric `min` and `max` values per facet](#mergefacets) |
| **`facetsByIndex`** | Object | [Data for facets present in the search results](#facetsbyindex) |
| **`facetDistribution`** | Object | [Distribution of the given facets](#mergefacets) |
| **`facetStats`** | Object | [The numeric `min` and `max` values per facet](#mergefacets) |
| **`remoteErrors`** | Object | Indicates which remote requests failed and why |

Each result in the `hits` array contains an additional `_federation` field with the following fields:

| Name | Type | Description |
| :----------------------- | :--------------- | :------------------------------------------------------------------------------- |
| **`indexUid`** | String | Index of origin for this document |
| **`queriesPosition`** | Number | Array index number of the query in the request's `queries` array |
| Name | Type | Description |
| :-------------------------- | :--------------- | :--------------------------------------------------------------------------------- |
| **`indexUid`** | String | Index of origin for this document |
| **`queriesPosition`** | Number | Array index number of the query in the request's `queries` array |
| **`remote`** | String | Remote instance of origin for this document |
| **`weightedRankingScore`** | Number | The product of the `_rankingScore` of the hit and the weight of the query of origin |

### Example

#### Non-federated multi-search

<CodeSamples id="multi_search_1" />

#### Response: `200 Ok`
##### Response: `200 Ok`

```json
{
@@ -289,7 +293,7 @@ Each result in the `hits` array contains an additional `_federation` field with

<CodeSamples id="multi_search_federated_1" />

#### Response: `200 Ok`
##### Response: `200 Ok`

```json
{
@@ -321,3 +325,51 @@ Each result in the `hits` array contains an additional `_federation` field with
"semanticHitCount": 0
}
```

#### Remote federated multi-search <NoticeTag type="experimental" label="experimental" />

<CodeSamples id="multi_search_remote_federated_1" />

##### Response: `200 Ok`

```json
{
"hits": [
{
"id": 42,
"title": "Batman returns",
"overview": …,
"_federation": {
"indexUid": "movies",
"queriesPosition": 0,
"weightedRankingScore": 1.0,
"remote": "ms-01"
}
},
{
"id": 87,
"description": …,
"title": "Batman: the killing joke",
"_federation": {
"indexUid": "movies",
"queriesPosition": 1,
"weightedRankingScore": 0.9848484848484849,
"remote": "ms-00"
}
}
],
"processingTimeMs": 35,
"limit": 5,
"offset": 0,
"estimatedTotalHits": 111,
"remoteErrors": {
"ms-02": {
"message": "error sending request",
"code": "proxy_could_not_send_request",
"type": "system",
"link": "https://docs.meilisearch.com/errors#proxy_could_not_send_request"
}
}
}
```
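
Since a remote federated search can succeed while some remotes fail, a client may want to inspect `remoteErrors` before trusting the results. The sketch below assumes a response shaped like the example above, with the `overview` and `description` fields left out; `summarize_federated_response` is a hypothetical helper name.

```python
from collections import Counter


def summarize_federated_response(response):
    """Count hits per remote and collect the error code of each failed remote."""
    hits_per_remote = Counter(
        hit["_federation"].get("remote") for hit in response.get("hits", [])
    )
    failed = {
        name: error.get("code")
        for name, error in response.get("remoteErrors", {}).items()
    }
    return dict(hits_per_remote), failed


# Trimmed copy of the example response above.
response = {
    "hits": [
        {"id": 42, "title": "Batman returns",
         "_federation": {"indexUid": "movies", "queriesPosition": 0,
                         "weightedRankingScore": 1.0, "remote": "ms-01"}},
        {"id": 87, "title": "Batman: the killing joke",
         "_federation": {"indexUid": "movies", "queriesPosition": 1,
                         "weightedRankingScore": 0.9848484848484849,
                         "remote": "ms-00"}},
    ],
    "remoteErrors": {
        "ms-02": {"message": "error sending request",
                  "code": "proxy_could_not_send_request",
                  "type": "system"},
    },
}

hits_per_remote, failed = summarize_federated_response(response)
print("hits per remote:", hits_per_remote)
print("failed remotes:", failed)
```

A non-empty `failed` dictionary means the hits are partial: documents stored on the failed remotes are missing from the results.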