
controller: fix limiter cannot work well in high concurrency scenario #8436

Merged: 5 commits into tikv:master on Jul 24, 2024

Conversation

@nolouch (Contributor) commented Jul 24, 2024

What problem does this PR solve?

Issue Number: Close #8435

What is changed and how does it work?

controller: Fix the limiter not working correctly in high-concurrency scenarios
- In high-concurrency scenarios, time can appear to roll back because the `now` value is captured outside the limiter's lock; heavy mutex contention makes reservations more likely to be processed out of order.
- A time rollback lets the limiter advance extra tokens, which causes the issue and can even leave the controller effectively unlimited.
- Fix the problem by preventing time rollback within the limiter, rather than re-reading the clock inside the lock, since acquiring the current time that frequently would be costly (a sketch of this approach follows below).
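
The idea behind the fix, as a minimal Go sketch (illustrative only, not the actual pd controller code; the `limiter` type and its fields are made up for the example): clamp the caller-supplied timestamp once inside the lock so the elapsed duration can never go negative.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// limiter is a hypothetical token bucket used only to illustrate the fix;
// it is not the real pd controller limiter.
type limiter struct {
	mu     sync.Mutex
	last   time.Time // timestamp of the last processed reservation
	tokens float64   // currently available tokens
	limit  float64   // tokens replenished per second
	burst  float64   // maximum tokens the bucket can hold
}

// advance returns the token count at `now`. The caller captures `now` before
// taking the lock, so under heavy contention a goroutine may arrive with a
// timestamp older than l.last. Clamping it avoids a negative elapsed
// duration, which previously let the limiter hand out extra tokens.
func (l *limiter) advance(now time.Time) (time.Time, float64) {
	if now.Before(l.last) {
		now = l.last // avoid time rollback instead of calling time.Now() under the lock
	}
	tokens := l.tokens + l.limit*now.Sub(l.last).Seconds()
	if tokens > l.burst {
		tokens = l.burst
	}
	return now, tokens
}

// reserveN consumes n tokens and reports how long the caller must wait.
func (l *limiter) reserveN(now time.Time, n float64) time.Duration {
	l.mu.Lock()
	defer l.mu.Unlock()
	now, tokens := l.advance(now)
	tokens -= n
	l.last, l.tokens = now, tokens
	if tokens >= 0 {
		return 0
	}
	// The deficit is repaid at `limit` tokens per second.
	return time.Duration(-tokens / l.limit * float64(time.Second))
}

func main() {
	l := &limiter{limit: 4000, burst: 4000, last: time.Now()}
	fmt.Println("wait:", l.reserveN(time.Now(), 1000))
}
```

With this clamp, a reservation that arrives late (holding a stale `now`) simply sees zero elapsed time instead of crediting the bucket with extra tokens, and the clock is still read only once per request, outside the lock.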

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
| Concurrency | reserveN | Limit (RU/s) | Need Wait | Actual QPS | Actual RU Cost | Pass |
| --- | --- | --- | --- | --- | --- | --- |
| 10000 | 1000 | 4000000 | 2.500s | 4000.00 | 4000000.00 | |
| 5000 | 1000 | 4000000 | 1.250s | 4001.00 | 4001000.00 | |
| 3840 | 1000 | 4000000 | 959.974ms | 4004.00 | 4004000.00 | |
| 1000 | 1000 | 4000000 | 249.516ms | 4003.00 | 4003000.00 | |
| 1000 | 200 | 4000000 | 49.280ms | 19995.00 | 3999000.00 | |
| 1000 | 5000 | 4000000 | 1.251s | 799.00 | 3995000.00 | |
| 10000 | 50 | 400000 | 1.250s | 8004.00 | 400200.00 | |
| 5000 | 50 | 400000 | 624.458ms | 8008.00 | 400400.00 | |
| 3840 | 50 | 400000 | 479.007ms | 7999.00 | 399950.00 | |
| 1000 | 50 | 400000 | 124.289ms | 8006.00 | 400300.00 | |
| 500 | 50 | 400000 | 61.506ms | 8006.00 | 400300.00 | |
| 200 | 50 | 400000 | 24.196ms | 7997.00 | 399850.00 | |
| 100 | 50 | 400000 | 12.593ms | 8006.00 | 400300.00 | |
| 1000 | 10 | 400000 | 24.047ms | 39988.00 | 399880.00 | |
| 1000 | 250 | 400000 | 624.222ms | 1599.00 | 399750.00 | |
| 10000 | 500 | 400000 | 12.515s | 799.00 | 399500.00 | |
| 5000 | 500 | 400000 | 6.257s | 798.00 | 399000.00 | |
| 3840 | 500 | 400000 | 4.804s | 798.00 | 399000.00 | |
| 1000 | 500 | 400000 | 1.250s | 798.00 | 399000.00 | |
| 1000 | 100 | 200000 | 500.261ms | 2001.00 | 200100.00 | |
| 1000 | 100 | 400000 | 250.175ms | 4003.00 | 400300.00 | |
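
In every row, Need Wait ≈ Concurrency × reserveN / Limit and Actual QPS ≈ Limit / reserveN, i.e. the limiter throttles as configured. A harness along the following lines (not the author's actual test script; it substitutes Go's `golang.org/x/time/rate` limiter for pd's controller limiter and mirrors the `Concurrency=1000, reserveN=100, limit=400000` row) reproduces such numbers:

```go
package main

import (
	"fmt"
	"sync"
	"time"

	"golang.org/x/time/rate"
)

func main() {
	const (
		concurrency = 1000   // concurrent reservations
		reserveN    = 100    // tokens requested per reservation
		limit       = 400000 // tokens replenished per second
	)

	// Burst equals reserveN so the bucket grants no significant head start,
	// which keeps the result comparable to the table above.
	lim := rate.NewLimiter(rate.Limit(limit), reserveN)

	var (
		mu      sync.Mutex
		maxWait time.Duration
		wg      sync.WaitGroup
	)
	for i := 0; i < concurrency; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			now := time.Now() // captured outside the limiter's lock, as in the controller
			d := lim.ReserveN(now, reserveN).DelayFrom(now)
			mu.Lock()
			if d > maxWait {
				maxWait = d
			}
			mu.Unlock()
		}()
	}
	wg.Wait()

	totalTokens := float64(concurrency * reserveN)
	fmt.Printf("need wait %v, implied QPS %.2f\n",
		maxWait, totalTokens/reserveN/maxWait.Seconds())
}
```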

Real workload: [metrics screenshot]

Release note

None.

Signed-off-by: nolouch <[email protected]>
@ti-chi-bot bot added the release-note-none, dco-signoff: yes, and size/M labels on Jul 24, 2024
@nolouch nolouch requested review from HuSharp and JmPotato July 24, 2024 08:29
Signed-off-by: nolouch <[email protected]>
Signed-off-by: nolouch <[email protected]>
@ti-chi-bot bot added the size/L label and removed the size/M label on Jul 24, 2024
@ti-chi-bot bot added the needs-1-more-lgtm and approved labels on Jul 24, 2024

codecov bot commented Jul 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 77.31%. Comparing base (ca179e6) to head (b209858).
Report is 20 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8436      +/-   ##
==========================================
+ Coverage   77.24%   77.31%   +0.06%     
==========================================
  Files         471      471              
  Lines       61696    61925     +229     
==========================================
+ Hits        47660    47879     +219     
- Misses      10454    10467      +13     
+ Partials     3582     3579       -3     
| Flag | Coverage Δ |
| --- | --- |
| unittests | 77.31% <100.00%> (+0.06%) ⬆️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

@ti-chi-bot added the needs-cherry-pick-release-7.5 label on Jul 24, 2024
@ti-chi-bot ti-chi-bot bot added the lgtm label Jul 24, 2024
ti-chi-bot bot (Contributor) commented Jul 24, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: HuSharp, JmPotato

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot bot removed the needs-1-more-lgtm label on Jul 24, 2024
ti-chi-bot bot (Contributor) commented Jul 24, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-07-24 08:54:40.714057491 +0000 UTC m=+1035302.704998961: ☑️ agreed by JmPotato.
  • 2024-07-24 11:48:00.106614887 +0000 UTC m=+1045702.097556358: ☑️ agreed by HuSharp.

ti-chi-bot bot (Contributor) commented Jul 24, 2024

@nolouch: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-integration-realcluster-test | 83023f6 | link | false | /test pull-integration-realcluster-test |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@ti-chi-bot ti-chi-bot bot merged commit 83f32e9 into tikv:master Jul 24, 2024
20 of 21 checks passed
@ti-chi-bot (Member) commented:
In response to a cherrypick label: new pull request created to branch release-7.5: #8437.

ti-chi-bot bot pushed a commit that referenced this pull request Jul 24, 2024
…#8436) (#8437)

close #8435

controller: Fix the limiter not working correctly in high-concurrency scenarios
- In high-concurrency scenarios, time can appear to roll back because the `now` value is captured outside the limiter's lock; heavy mutex contention makes reservations more likely to be processed out of order.
- A time rollback lets the limiter advance extra tokens, which causes the issue and can even leave the controller effectively unlimited.
- Fix the problem by preventing time rollback within the limiter, rather than re-reading the clock inside the lock, since acquiring the current time that frequently would be costly.

Signed-off-by: nolouch <[email protected]>

Co-authored-by: nolouch <[email protected]>
@nolouch nolouch deleted the fix-limit-timefallback branch July 24, 2024 15:35
@HuSharp added the needs-cherry-pick-release-7.1 and needs-cherry-pick-release-8.1 labels on Jul 25, 2024
@ti-chi-bot (Member) commented:
In response to a cherrypick label: new pull request created to branch release-8.1: #8438.

@ti-chi-bot (Member) commented:
In response to a cherrypick label: new pull request created to branch release-7.1: #8439.

ti-chi-bot bot pushed a commit that referenced this pull request Jul 25, 2024
…#8436) (#8438)

close #8435

controller: Fix the limiter not working correctly in high-concurrency scenarios
- In high-concurrency scenarios, time can appear to roll back because the `now` value is captured outside the limiter's lock; heavy mutex contention makes reservations more likely to be processed out of order.
- A time rollback lets the limiter advance extra tokens, which causes the issue and can even leave the controller effectively unlimited.
- Fix the problem by preventing time rollback within the limiter, rather than re-reading the clock inside the lock, since acquiring the current time that frequently would be costly.

Signed-off-by: nolouch <[email protected]>

Co-authored-by: nolouch <[email protected]>
ti-chi-bot bot pushed a commit that referenced this pull request Jul 25, 2024
…#8436) (#8439)

close #8435

controller: Fix the limiter not working correctly in high-concurrency scenarios
- In high-concurrency scenarios, time can appear to roll back because the `now` value is captured outside the limiter's lock; heavy mutex contention makes reservations more likely to be processed out of order.
- A time rollback lets the limiter advance extra tokens, which causes the issue and can even leave the controller effectively unlimited.
- Fix the problem by preventing time rollback within the limiter, rather than re-reading the clock inside the lock, since acquiring the current time that frequently would be costly.

Signed-off-by: nolouch <[email protected]>

Co-authored-by: nolouch <[email protected]>
Successfully merging this pull request may close these issues:

  • Resource Controller Limiter may not work well (#8435)

4 participants