diff --git a/.gitignore b/.gitignore
index 504afef8..117028db 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,2 +1,6 @@
+# Node.js
 node_modules/
 package-lock.json
+
+# Others
+.DS_Store
\ No newline at end of file
diff --git a/media/tso-client-batch-new.png b/media/tso-client-batch-new.png
new file mode 100644
index 00000000..0b409491
Binary files /dev/null and b/media/tso-client-batch-new.png differ
diff --git a/media/tso-client-batch-old.png b/media/tso-client-batch-old.png
new file mode 100644
index 00000000..94c66964
Binary files /dev/null and b/media/tso-client-batch-old.png differ
diff --git a/media/tso-follower-proxy.png b/media/tso-follower-proxy.png
new file mode 100644
index 00000000..e1dde7bb
Binary files /dev/null and b/media/tso-follower-proxy.png differ
diff --git a/text/0078-improve-tso-scalability.md b/text/0078-improve-tso-scalability.md
new file mode 100644
index 00000000..be6a26e0
--- /dev/null
+++ b/text/0078-improve-tso-scalability.md
@@ -0,0 +1,93 @@
+# Improve the Scalability of TSO Service
+
+- RFC PR: https://github.com/tikv/rfcs/pull/78
+- Tracking Issue: https://github.com/tikv/pd/issues/3149
+
+## Summary
+
+As clusters grow in size, some users have encountered a situation that is extremely challenging for TSO performance. In a cluster with many TiDB instances, a large number of PD clients request TSO from the PD leader concurrently, which puts heavy CPU pressure on the PD leader because of the Go runtime scheduling triggered by gRPC connection switching. Under this CPU pressure, TSO requests suffer from increasing average latency and long-tail latency, which hurts TiDB's QPS. To improve the scalability of the TSO service, this RFC proposes two enhancements, covering both the TSO client and server, to reduce PD CPU usage and improve QPS.
+
+## Motivation
+
+As mentioned above, the improvement targets clusters with a large number of TiDB instances, i.e. PD clients, and aims to reduce the CPU pressure on the PD leader. With better TSO handling performance, the TSO service is less likely to become a bottleneck in a large cluster and scales better.
+
+## Detailed design
+
+### Current TSO processing flow
+
+Before proposing the specific improvements, here is a brief overview of the current PD client and TSO processing flow.
+
+When the PD client receives a TSO request from the upper layer, it does not send a TSO gRPC request to the PD leader immediately. Instead, it collects as many TSO requests as it can from a channel and batches them into a single request, which reduces the number of gRPC requests it actually sends. For example, if the PD client receives 10 TSO requests at once, it sends just one TSO gRPC request to the PD leader with a `count` argument inside the gRPC request body.
+
+![TSO Client Batch - Old](../media/tso-client-batch-old.png)
+
+The PD server takes this request and returns a single monotonically increasing, unique TSO response with a count of 10 to the client. After the PD client receives the response, it splits it into 10 different TSOs and returns them to the upper-layer requesters.
+
+Throughout this flow, every PD client maintains a gRPC stream with the PD leader to send and receive TSO requests efficiently.
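+To make the flow above more concrete, here is a minimal, self-contained Go sketch of the client-side batching and splitting. It is not the real pd client code: `tsoRequest`, `timestamp`, and `callLeader` are illustrative stand-ins, and channels replace the actual gRPC stream, but it shows the key point that 10 upper-level requests collapse into a single leader round trip carrying only a `count`, whose response is then split back into 10 unique timestamps.
+
+```go
+// Illustrative sketch only: simplified types stand in for the real
+// pdpb/pd-client definitions, and a plain function stands in for the
+// gRPC stream to the PD leader.
+package main
+
+import "fmt"
+
+// tsoRequest represents one upper-level TSO request waiting for its timestamp.
+type tsoRequest struct {
+    done chan timestamp
+}
+
+// timestamp is a simplified TSO: a physical part plus a logical counter.
+type timestamp struct {
+    physical, logical int64
+}
+
+// batchAndSend drains every request currently pending in reqCh, issues ONE
+// call to the leader carrying only the batch size (count), and then splits
+// the single response back into count unique timestamps for the callers.
+func batchAndSend(reqCh chan *tsoRequest, callLeader func(count int) timestamp) {
+    batch := []*tsoRequest{<-reqCh} // block until at least one request arrives
+collect:
+    for {
+        select {
+        case req := <-reqCh:
+            batch = append(batch, req)
+        default:
+            break collect // nothing else pending, stop batching
+        }
+    }
+    last := callLeader(len(batch)) // one round trip for the whole batch
+    // Split: hand out last.logical-(count-1), ..., last.logical so that
+    // every caller still observes a unique, increasing timestamp.
+    for i, req := range batch {
+        req.done <- timestamp{last.physical, last.logical - int64(len(batch)-1-i)}
+    }
+}
+
+func main() {
+    reqCh := make(chan *tsoRequest, 16)
+    // Fake leader: allocates count logical timestamps per call.
+    var next int64
+    leader := func(count int) timestamp {
+        next += int64(count)
+        return timestamp{physical: 1, logical: next}
+    }
+    // 10 upper-level requests arrive "at once"...
+    reqs := make([]*tsoRequest, 10)
+    for i := range reqs {
+        reqs[i] = &tsoRequest{done: make(chan timestamp, 1)}
+        reqCh <- reqs[i]
+    }
+    // ...but only one call to the leader is made for all of them.
+    batchAndSend(reqCh, leader)
+    for _, r := range reqs {
+        fmt.Println(<-r.done)
+    }
+}
+```
+
+The way the response is split here (decrementing the logical part from the last allocated timestamp) is an assumption of this sketch; the essential property is that the number of leader round trips depends on the batch count, not on the number of upper-level requests.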
+### Enhancement #1: Improve the batch effect
+
+An intuitive improvement is to strengthen the TSO batch effect, i.e. to reduce the number of TSO gRPC requests by increasing the size of each batch for the same number of TSO requests. With fewer TSO gRPC requests, the CPU pressure on the PD leader can be reduced directly.
+
+We can introduce different batch strategies, such as waiting for a short interval to collect more TSO requests, or even more sophisticated ones such as a dynamic minimum batch size.
+
+![TSO Client Batch - New](../media/tso-client-batch-new.png)
+
+As the figure above shows, we can introduce a session variable like `@@tidb_tso_client_batch_max_wait_time` to control the maximum time the PD client is willing to wait for more TSO requests, which makes the batch size bigger without hurting latency.
+
+A prediction strategy is another useful way to improve the batch effect. For example, the PD client could collect information such as the latency and batch sizes over the last few minutes and, based on it, calculate a suitable expected batch size to predict the number of incoming TSO requests, which makes the batch waiting more effective.
+
+### Enhancement #2: Use proxy to reduce the stream number
+
+However, as mentioned before, our pprof results show that the main cause of the high PD leader CPU usage is the Go runtime scheduling triggered by gRPC connection switching, and improving the batch effect alone does not address this at the root. To reduce the connection switching, fewer gRPC streams are needed, and we can introduce the TSO Follower Proxy feature to achieve this.
+
+With this feature, the PD client sends each TSO request to a randomly chosen PD server (the PD leader or a follower), and the TSO requests that arrive at the same PD follower are batched again, with the same logic as in the PD client, before being forwarded to the PD leader. This distributes the gRPC pressure across the PD servers: with 50 TiDB instances and 5 PD instances, the PD leader only needs to maintain 4 stream connections, one per PD follower, rather than 50 stream connections to all the TiDB servers. A rough, illustrative sketch of this re-batching is included in the appendix at the end of this RFC.
+
+![TSO Follower Proxy](../media/tso-follower-proxy.png)
+
+This can also be enabled cluster-wide in TiDB through a single session variable like `tidb_enable_tso_follower_proxy`, set globally.
+
+## Drawbacks
+
+- Increases TSO latency if the QPS is not high enough.
+- Increases the CPU usage of the PD followers.
+- Increases code complexity.
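+## Appendix: TSO Follower Proxy sketch
+
+Purely for illustration, the sketch below condenses the re-batching described in Enhancement #2 into a single round. `clientBatch`, `tsoRange`, `proxyOnce`, and `askLeader` are hypothetical names, channels stand in for the gRPC streams, and the way the returned range is split is an assumption of this sketch. The point it shows is that batches from many PD clients collapse into one request on the follower's single stream to the leader, whose response is then fanned back out per client.
+
+```go
+// Illustrative sketch only: not the actual PD follower-proxy implementation.
+package main
+
+import "fmt"
+
+// clientBatch is one already-batched TSO request arriving from a PD client:
+// it carries only how many timestamps the client needs and where to reply.
+type clientBatch struct {
+    count int
+    done  chan tsoRange
+}
+
+// tsoRange describes count consecutive logical timestamps ending at last.
+type tsoRange struct {
+    physical, last int64
+    count          int
+}
+
+// proxyOnce shows one round of the follower-proxy idea: the follower drains
+// the batches it has received from several PD clients, forwards ONE request
+// (the summed count) to the leader over its single stream, and then splits
+// the returned range back among the original clients.
+func proxyOnce(fromClients chan clientBatch, askLeader func(count int) tsoRange) {
+    pending := []clientBatch{<-fromClients}
+drain:
+    for {
+        select {
+        case b := <-fromClients:
+            pending = append(pending, b)
+        default:
+            break drain
+        }
+    }
+    total := 0
+    for _, b := range pending {
+        total += b.count
+    }
+    r := askLeader(total) // the only request the leader sees for this round
+    // Fan the range back out in arrival order; each client gets a sub-range.
+    used := 0
+    for _, b := range pending {
+        used += b.count
+        b.done <- tsoRange{physical: r.physical, last: r.last - int64(total-used), count: b.count}
+    }
+}
+
+func main() {
+    fromClients := make(chan clientBatch, 8)
+    // Fake leader: allocates count logical timestamps per call.
+    var next int64
+    leader := func(count int) tsoRange {
+        next += int64(count)
+        return tsoRange{physical: 1, last: next, count: count}
+    }
+    // Three PD clients forward their own batches (3, 5, and 2 requests) to
+    // the follower, which costs the leader only one round trip in total.
+    batches := []clientBatch{
+        {count: 3, done: make(chan tsoRange, 1)},
+        {count: 5, done: make(chan tsoRange, 1)},
+        {count: 2, done: make(chan tsoRange, 1)},
+    }
+    for _, b := range batches {
+        fromClients <- b
+    }
+    proxyOnce(fromClients, leader)
+    for _, b := range batches {
+        fmt.Println(<-b.done)
+    }
+}
+```
+
+In the real feature, this forwarding would run continuously over long-lived streams and would have to handle errors and leader changes, which are omitted here.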