Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support timestamp bounded read-only transactions on data replicas #19817

Closed
zz-jason opened this issue Sep 6, 2020 · 3 comments
Closed

support timestamp bounded read-only transactions on data replicas #19817

zz-jason opened this issue Sep 6, 2020 · 3 comments
Labels
feature/accepted This feature request is accepted by product managers type/feature-request Categorizes issue or PR as related to a new feature.

Comments

@zz-jason
Copy link
Member

zz-jason commented Sep 6, 2020

Feature Request

Is your feature request related to a problem? Please describe:

There is a scenario that in a multi-region application, mostly all the database transactions are read-only, only some of them are read-write and write-only transactions. Applications can tolerate high read-write and write-only transaction durations. But the throughout of read-only transactions on each Region should be maximized, the QPS should be as high as possible, while the latency should be as low as possible. For read-only transactions, it's OK for the applications that a timestamp bounded stale data is returned.

A solution for these applications is to gather all the leader replicas to one Region, deploy learner replicas on other Regions to maximize the read throughput.

TiDB has supported Follower Read since release 4.0.0. At present, it has these limitations in this scenario:

  • It only provides strong-read to guarantee to see all the effects of all transactions that have committed before the start of the read. As mentioned in the document: "To achieve strongly consistent reads, the follower node currently needs to request the current execution progress from the leader node (that is ReadIndex), which causes an additional network request overhead.". So the read latency and read QPS can not be maximized due to the cross-region network round-trip which is brought by the ReadIndex operation.

  • Only the follower replica can provide reads. In order to support more Regions or AZs, we need to add more follower replicas, which influences the raft quorum, introduces more cross-region network round-trips on write transactions.

Describe the feature you'd like:

Support read-only transactions on follower and learner replicas with a bounded timestamp.

Like Cloud Spanner Timestamp bounds, there are three kinds of stale read:

  • strong: which is support by the current Follower Read in TiDB.
  • bounded staleness: chooses the newest timestamp within the staleness bound that allows execution of the reads at the closest available leader, follower, or learner replica. ReadIndex can be optimized if the current
  • exact staleness: "These timestamp bounds execute reads at a user-specified timestamp. Reads at a timestamp are guaranteed to see a consistent prefix of the global transaction history: they observe modifications done by all transactions with a commit timestamp less than or equal to the read timestamp, and observe none of the modifications done by transactions with a larger commit timestamp.". it can also be achieved by the time-travel queries described in Support "INSERT INTO … SELECT … AS OF timestamp" Query #18672

BTW, I found we can utilize the syntax added in pingcap/parser#610 to support time travel read-only queries.

  1. exact staleness
-- start a read-only transaction which reads the exact snapshot 5 seconds ago:
START TRANSACTION READ ONLY WITH TIMESTAMP BOUND EXACT STALENESS '00:00:05';

-- start a read-only transaction which reads the exact snapshot at "2020-09-06 00:00:00":
START TRANSACTION READ ONLY WITH TIMESTAMP BOUND READ TIMESTAMP '2020-09-06 00:00:00';
  1. time-bounded staleness
-- start a read-only transaction which reads the data no stale than 10 seconds ago:
START TRANSACTION READ ONLY WITH TIMESTAMP BOUND MAX STALENESS '00:00:10'

-- start a read-only transaction which reads the data no stale than '2019-11-04 00:00:00':
START TRANSACTION READ ONLY WITH TIMESTAMP BOUND MIN READ TIMESTAMP '2019-11-04 00:00:00';

But it's better to support these modifiers to the begin statements as well.

Describe alternatives you've considered:

N/A

Teachability, Documentation, Adoption, Migration Strategy:

N/A

@zz-jason zz-jason added the type/feature-request Categorizes issue or PR as related to a new feature. label Sep 6, 2020
@ghost
Copy link

ghost commented Sep 6, 2020

Relates to #18672

I think it's useful to have this as part of a START TRANSACTION statement because then all queries that follow will not need additional syntax. From the comment here, may I suggest including a pseudo-function to use the most up to date consistent snapshot from followers:

START TRANSACTION READ ONLY WITH TIMESTAMP BOUND MAX STALENESS follower_read_timestamp();

@zz-jason
Copy link
Member Author

zz-jason commented Sep 7, 2020

a pseudo-function to use the most up to date consistent snapshot from followers

The staleness maybe is not acceptable to the applications. Is there any use case of this behavior?

@zz-jason
Copy link
Member Author

zz-jason commented Sep 7, 2020

This feature request can be regarded as a subtask of #18033

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature/accepted This feature request is accepted by product managers type/feature-request Categorizes issue or PR as related to a new feature.
Projects
None yet
Development

No branches or pull requests

2 participants