Commit: RFC: Improve the Scalability of TSO Service (#78)

Showing 5 changed files with 97 additions and 0 deletions.
The diff also touches an ignore file (apparently a `.gitignore`), which now contains:

    # Node.js
    node_modules/
    package-lock.json

    # Others
    .DS_Store
# Improve the Scalability of TSO Service

- RFC PR: https://github.com/tikv/rfcs/pull/78
- Tracking Issue: https://github.com/tikv/pd/issues/3149

## Summary

As the cluster size grows, some users have encountered a situation that is
extremely challenging for TSO performance. In a cluster with many TiDB
instances, a large number of PD clients request TSOs from the PD leader
concurrently, which puts a lot of CPU pressure on the PD leader because of the
Go runtime scheduling caused by gRPC connection switching. Under this CPU
pressure, TSO requests suffer from increasing average latency and long-tail
latency, which hurts TiDB's QPS. To improve the scalability of the TSO service,
this RFC proposes two enhancements to both the TSO client and server to reduce
PD CPU usage and improve QPS.
## Motivation

As mentioned above, improvement is needed in clusters that have a large number
of TiDB instances, i.e., PD clients, to reduce the CPU pressure on the PD
leader. With better TSO handling performance, the TSO service is less likely to
become the bottleneck of a big cluster and scales better.
## Detailed design

### Current TSO processing flow

Before proposing specific improvements, let us first give a brief overview of
the current PD client and TSO processing flow.
For the PD client, every time it receives a TSO request from the upper layer,
it does not send a TSO gRPC request to the PD leader immediately. Instead, it
collects as many TSO requests as it can from a channel and batches them into a
single request, reducing the number of gRPC requests it actually sends. For
example, if the PD client receives 10 TSO requests at once, it sends just one
TSO gRPC request to the PD leader, carrying the number of requested timestamps
in a `count` field inside the gRPC request body.

![TSO Client Batch - Old](../media/tso-client-batch-old.png)

The PD server simply takes the request and returns a single response containing
a unique, monotonically increasing TSO that covers the requested `count`. After
the PD client receives the response, it splits that single TSO response into 10
different TSOs and returns them to the upper-layer requesters.

During the whole processing flow, every PD client maintains a gRPC stream with
the PD leader to send and receive TSO requests efficiently.
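To make the flow above concrete, below is a minimal Go sketch of the
client-side batching logic. The types and helpers (`tsoRequest`, `sendTsoRPC`,
and so on) are simplified stand-ins for illustration, not the actual PD client
API, and error handling is omitted.

```go
package tsoclient

// tso is a simplified timestamp: a physical part plus a logical counter, as
// allocated by the PD leader.
type tso struct {
	physical int64
	logical  int64
}

// tsoRequest represents one upper-layer request waiting for a timestamp.
type tsoRequest struct {
	done chan tso // the caller blocks on this channel until a TSO is assigned
}

// batchLoop drains the request channel, sends one gRPC request per batch with
// a `count` argument, and splits the returned TSO range among the callers.
func batchLoop(requestCh chan *tsoRequest, sendTsoRPC func(count int) (tso, error)) {
	for first := range requestCh {
		batch := []*tsoRequest{first}
		// Collect everything that is already waiting in the channel.
	collect:
		for {
			select {
			case req := <-requestCh:
				batch = append(batch, req)
			default:
				break collect
			}
		}

		// One gRPC request covers the whole batch.
		resp, err := sendTsoRPC(len(batch))
		if err != nil {
			continue // error handling omitted in this sketch
		}

		// The response covers len(batch) logical timestamps ending at
		// resp.logical; hand one distinct timestamp to each caller.
		firstLogical := resp.logical - int64(len(batch)) + 1
		for i, req := range batch {
			req.done <- tso{physical: resp.physical, logical: firstLogical + int64(i)}
		}
	}
}
```

In the real implementation the request travels over the long-lived gRPC stream
mentioned above rather than through a fresh call per batch.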
### Enhancement #1: Improve the batch effect

An intuitive improvement is to strengthen the batch effect of TSO, i.e., to
reduce the number of TSO gRPC requests by increasing the size of each batch for
the same number of TSO requests. With fewer TSO gRPC requests, the CPU pressure
on the PD leader can be reduced directly.

We can introduce different batch strategies, such as waiting for a short
interval to collect more TSO requests, or more sophisticated ones such as a
dynamic minimum batch size.

![TSO Client Batch - New](../media/tso-client-batch-new.png)

As the figure above shows, we can introduce a session variable like
`@@tidb_tso_client_batch_max_wait_time` to control the maximum time the PD
client is willing to wait for more TSO requests, which can make the batch size
bigger without hurting latency.
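As an illustration, the non-blocking collection step from the earlier sketch
could be replaced with a bounded wait. The function below is a hypothetical
sketch, not the actual PD client code; it reuses the `tsoRequest` type from the
previous sketch, `maxWait` plays the role of the proposed maximum wait time,
and `targetSize` lets the client stop waiting early once the batch is big
enough.

```go
package tsoclient

import "time"

// collectWithMaxWait gathers TSO requests for one batch. After the first
// request arrives it waits at most maxWait for more requests, but returns
// early once targetSize requests have been collected, so extra latency is
// only paid when it is likely to buy a bigger batch.
func collectWithMaxWait(requestCh chan *tsoRequest, first *tsoRequest,
	maxWait time.Duration, targetSize int) []*tsoRequest {

	batch := []*tsoRequest{first}
	deadline := time.NewTimer(maxWait)
	defer deadline.Stop()

	for len(batch) < targetSize {
		select {
		case req := <-requestCh:
			batch = append(batch, req)
		case <-deadline.C:
			// The waiting window is over; send whatever we have.
			return batch
		}
	}
	return batch
}
```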
A predicting strategy is also a useful way to improve the batch effect. For
example, the PD client could collect information such as the latency and batch
sizes observed over the last few minutes, and based on that information
calculate a suitable expected batch size to predict the number of incoming TSO
requests, which makes the batch waiting more effective.
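A minimal sketch of such a predictor, assuming an exponentially weighted moving
average over recent batch sizes (the smoothing factor and method names are
hypothetical, not part of the current PD client):

```go
package tsoclient

// batchPredictor keeps an exponentially weighted moving average of recent
// batch sizes and uses it as the expected size of the next batch. A real
// predictor could also take recent request latency into account.
type batchPredictor struct {
	avg float64
}

// observe records the size of a batch that was just sent.
func (p *batchPredictor) observe(batchSize int) {
	const alpha = 0.2 // smoothing factor; a tuning knob in this sketch
	if p.avg == 0 {
		p.avg = float64(batchSize)
		return
	}
	p.avg = alpha*float64(batchSize) + (1-alpha)*p.avg
}

// expectedBatchSize can be passed as the targetSize of collectWithMaxWait, so
// the client stops waiting as soon as the predicted number of requests arrives.
func (p *batchPredictor) expectedBatchSize() int {
	if p.avg < 1 {
		return 1
	}
	return int(p.avg)
}
```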
### Enhancement #2: Use a proxy to reduce the stream number

However, as mentioned before, according to our pprof results, the main cause of
the high PD leader CPU usage is the Go runtime scheduling caused by gRPC
connection switching, and improving the batch effect does not alleviate this at
the root. To reduce connection switching, the number of gRPC streams must be
reduced. We can introduce the TSO Follower Proxy feature to achieve this.

With this feature, the PD client sends every TSO request to a randomly chosen
PD server (either the leader or a follower), and the TSO requests that arrive
at the same PD follower are batched again, with the same logic as in the PD
client, before being forwarded to the PD leader. With this implementation, the
gRPC pressure is distributed across all the PD servers; for example, with 50
TiDB instances and 5 PD instances, the PD leader only needs to maintain 4
stream connections, one with each PD follower, rather than 50 stream
connections with all the TiDB servers.
![TSO Follower Proxy](../media/tso-follower-proxy.png)

Also, this feature can be turned on for a TiDB cluster by globally enabling a
single session variable like `tidb_enable_tso_follower_proxy`.
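Below is a minimal Go sketch of the follower-side proxying described above. All
names are illustrative (the real proxy lives inside the PD server); it only
shows how requests from many clients can be merged into one request on the
follower's single stream to the leader and then split back.

```go
package tsoproxy

import "sync"

// tso mirrors the simplified timestamp from the client sketch.
type tso struct {
	physical int64
	logical  int64
}

// pendingBatch accumulates the counts requested by different clients so that
// a PD follower can forward them to the leader as a single request.
type pendingBatch struct {
	mu      sync.Mutex
	counts  []int      // count requested by each waiting client
	results []chan tso // one result channel per waiting client
}

// add registers a client request for `count` timestamps and returns the
// channel on which the first timestamp of its sub-range will be delivered.
func (b *pendingBatch) add(count int) chan tso {
	b.mu.Lock()
	defer b.mu.Unlock()
	ch := make(chan tso, 1)
	b.counts = append(b.counts, count)
	b.results = append(b.results, ch)
	return ch
}

// flush forwards the merged batch to the leader over the follower's single
// stream and splits the returned TSO range back among the waiting clients.
func (b *pendingBatch) flush(requestFromLeader func(total int) (tso, error)) error {
	b.mu.Lock()
	counts, results := b.counts, b.results
	b.counts, b.results = nil, nil
	b.mu.Unlock()

	total := 0
	for _, c := range counts {
		total += c
	}
	resp, err := requestFromLeader(total)
	if err != nil {
		return err
	}

	// resp.logical is the last logical timestamp of the allocated range; give
	// each client the first timestamp of its own slice of that range.
	next := resp.logical - int64(total) + 1
	for i, c := range counts {
		results[i] <- tso{physical: resp.physical, logical: next}
		next += int64(c)
	}
	return nil
}
```

From the leader's point of view, each follower then looks like a single client,
which is what keeps the stream count, and hence the scheduling pressure, low.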
## Drawbacks

- Increases TSO latency when the QPS is not high enough.
- Increases the CPU usage of the PD followers.
- Increases code complexity.