Skip to content

Latest commit

 

History

History
65 lines (51 loc) · 2.84 KB

35_Replica_shards.asciidoc

File metadata and controls

65 lines (51 loc) · 2.84 KB

Replica shards

Up until now we have spoken only about primary shards, but we have another tool in our belt: replica shards. The main purpose of replicas is for failover, as discussed in [distributed-cluster]: if the node holding a primary shard dies, then a replica is promoted to the role of primary.

At index time, a replica shard does the same amount of work as the primary shard. New documents are first indexed on the primary and then on any replicas. Increasing the number of replicas does not change the capacity of the index.

However, replica shards can serve read requests. If, as is often the case, your index is search-heavy, you can increase search performance by increasing the number of replicas, but only if you also add extra hardware.

Let’s return to our example of an index with two primary shards. We increased capacity of the index by adding a second node. Adding more nodes would not help us to add indexing capacity, but we could take advantage of the extra hardware at search time by increasing the number of replicas:

POST /my_index/_settings
{
  "number_of_replicas": 1
}

Having two primary shards, plus a replica of each primary, would give us a total of four shards: one for each node.

An index with two primary shards and one replica can scale out across four nodes
Figure 1. An index with two primary shards and one replica can scale out across four nodes

Balancing load with replicas

Search performance depends on the response times of the slowest node, so it is a good idea to try to balance out the load across all nodes. If we had added just one extra node instead of two, we would end up with two nodes having one shard each, and one node doing double the work with two shards.

We can even things out by adjusting the number of replicas. By allocating two replicas instead of one, we end up with a total of 6 shards, which can be evenly divided between 3 nodes:

POST /my_index/_settings
{
  "number_of_replicas": 2
}

As a bonus, we have also increased our availability. We can now afford to lose two nodes and still have a copy of all of our data.

Adjust the number of replicas to balance the load between nodes
Figure 2. Adjust the number of replicas to balance the load between nodes
Note
The fact that node 3 holds two replicas and no primaries is not important. Replicas and primaries do the same amount of work, they just play slightly different roles. There is no need to ensure that primaries are distributed evenly across all nodes.