fair_queue: make the fair_group token grabbing discipline more fair
The current design of `fair_group` isn't fair enough to shards. During contention, the group takes requests from shards (approximately) one by one, in round robin. This guarantees that each contender will dispatch an equal *number* of requests. This is some kind of fairness, but it's not the kind we want, probably ever.

A better kind of fairness is that, under contention, each shard should be guaranteed `1/nr_shards` of the disk's IOPS and/or `1/nr_shards` of its byte-bandwidth, whichever dimension the shard pressures more. This is needed so that each shard can be relied on to sustain a certain rate of requests: the guaranteed (lower-bound) throughput of the slowest shard usually dictates the throughput of the entire cluster.

But those two kinds of fairness are only the same if all IO requests have the same size and direction. Otherwise they can be drastically different. With the current design it's easy to create a situation where a shard receives an arbitrarily small fraction of both IOPS and bandwidth, despite being IO-bound. (Example: a node with X shards, where one shard spams only very small requests and the other shards spam only big requests.)

This is a problem in practice. In ScyllaDB, we observed IO starvation of some shards during realistic workloads. While such starvation requires some workload asymmetry to occur, even small asymmetries can cause serious unfairness. (For example, a shard which receives 6% more database queries than other shards can be starved to less than 50% of its fair share of IOPS and/or bandwidth, because each of its 1 kiB queries is "fairly" matched with 16x costlier 128 kiB low-priority batch IO requests on other shards.)

To improve this, `fair_group` needs a different queueing discipline. There are many possible ways, but this patch chooses the one most similar to the current one.
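To make the starvation example concrete, here is a minimal sketch of the request-fair discipline under a simplified model. It is not Seastar code: `shard_model`, `request_round_robin` and `capacity_share` are hypothetical names, every shard is assumed to have an unbounded backlog of identical requests, and a request's cost is a single abstract token count (1 for a small request, 16 for a big one, borrowing the "16x costlier" figure above).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model (not Seastar code): each shard has an unbounded backlog
// of identical requests, each costing a fixed number of capacity tokens.
struct shard_model {
    unsigned request_cost;          // tokens consumed by one request
    unsigned long tokens_used = 0;  // tokens consumed so far
};

// Request-fair discipline: in every round, each contending shard dispatches
// exactly one request, regardless of how many tokens that request costs.
inline void request_round_robin(std::vector<shard_model>& shards, unsigned rounds) {
    for (unsigned r = 0; r < rounds; ++r) {
        for (auto& s : shards) {
            s.tokens_used += s.request_cost;
        }
    }
}

// Fraction of all consumed tokens (i.e. of disk capacity) that went to shard i.
inline double capacity_share(const std::vector<shard_model>& shards, std::size_t i) {
    assert(i < shards.size());
    unsigned long total = 0;
    for (const auto& s : shards) {
        total += s.tokens_used;
    }
    return total == 0 ? 0.0 : static_cast<double>(shards[i].tokens_used) / total;
}
```

With four shards whose request costs are `{1, 16, 16, 16}`, 1000 rounds give the small-request shard 1000 of 49000 tokens, about 2% of capacity, even though its fair share would be 25%.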
The main idea is that we still rely on the "approximate round robin" of the token queue as the basis for fairness, but each shard reserves a fixed-size batch of *tokens* at a time, rather than a fixed-size (i.e. 1) batch of *requests* at a time. This turns the discipline from approximately request-fair to approximately token-fair, which is what we want. The implementation details are non-trivial, though, and should be carefully reviewed.
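The effect of reserving a fixed batch of tokens per turn can be illustrated with a deficit-round-robin-style sketch. This is an assumption-laden stand-in, not the patch's implementation: `shard_model`, `token_batch_round_robin`, `capacity_share` and the `batch` parameter are hypothetical, every shard is assumed to stay backlogged, and unspent tokens simply carry over to the shard's next turn.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model (not Seastar code): each shard has an unbounded backlog
// of identical requests, each costing a fixed number of capacity tokens.
struct shard_model {
    unsigned request_cost;          // tokens consumed by one request
    unsigned long balance = 0;      // reserved but not yet spent tokens
    unsigned long tokens_used = 0;  // tokens consumed so far
};

// Token-fair discipline (sketch): on its turn, each shard reserves a fixed
// batch of tokens and dispatches requests while its balance covers them.
// Leftover tokens carry over, so a shard with expensive requests waits a few
// turns instead of being charged the same as a shard with cheap ones.
inline void token_batch_round_robin(std::vector<shard_model>& shards,
                                    unsigned rounds, unsigned batch) {
    assert(batch > 0);
    for (unsigned r = 0; r < rounds; ++r) {
        for (auto& s : shards) {
            s.balance += batch;
            while (s.balance >= s.request_cost) {
                s.balance -= s.request_cost;
                s.tokens_used += s.request_cost;
            }
        }
    }
}

// Fraction of all consumed tokens (i.e. of disk capacity) that went to shard i.
inline double capacity_share(const std::vector<shard_model>& shards, std::size_t i) {
    assert(i < shards.size());
    unsigned long total = 0;
    for (const auto& s : shards) {
        total += s.tokens_used;
    }
    return total == 0 ? 0.0 : static_cast<double>(shards[i].tokens_used) / total;
}
```

With the same request costs `{1, 16, 16, 16}` and a batch of 16 tokens, 1000 rounds give every shard 16000 tokens, i.e. an equal 25% share of capacity, even though the small-request shard dispatches 16 times as many requests as the others.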
3 changed files with 126 additions and 60 deletions.