client: support backoff mechanism for memberLoop #6978

Merged
12 commits merged into tikv:release-6.5 from release-6.5-add_ready on Aug 29, 2023

Member

@HuSharp HuSharp commented Aug 23, 2023

Signed-off-by: husharp [email protected]

What problem does this PR solve?

Issue Number: ref #6556

What is changed and how does it work?

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

PR Summary

  1. Add a ready signal for the response
  • The reconnectMemberLoop goroutine calls updateMember periodically. When a caller signals the ScheduleCheckMemberChanged channel, it now waits until the goroutine has updated the members and reports ready, or until the wait times out.
  2. Add a backoff mechanism
  • While waiting for the goroutine to update, the expo (exponential) function is used to back off: it sleeps between attempts when an error is encountered.

Reproduce Steps

  1. Enable a failpoint to simulate a condition such as gRPC throttling, where PD cannot read from etcd.
    curl -X PUT -d 'return(10)' http://tc-pd-1.tc-pd-peer.csn-simulator-big-cluster-vd62g.svc:2379/pd/api/v1/fail/github.com/tikv/pd/pkg/etcdutil/SlowEtcdKVGet

  2. Simulate PD losing its leader.
    curl -X PUT -d 'return("2346857576170797299")' http://tc-pd-1.tc-pd-peer.csn-simulator-big-cluster-vd62g.svc:2379/pd/api/v1/fail/github.com/tikv/pd/server/exitCampaignLeader
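
For scripting these two steps, the same requests can be built in Go. This is only a sketch: newFailpointRequest is a hypothetical helper, the address http://127.0.0.1:2379 is a placeholder for the PD endpoint, and the example prints the requests instead of sending them. The failpoint paths and bodies are the ones used in the curl commands above.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// newFailpointRequest builds the PUT request that the curl commands send:
// PUT <pdAddr>/pd/api/v1/fail/<failpoint> with the failpoint action as the body.
func newFailpointRequest(pdAddr, failpoint, action string) (*http.Request, error) {
	url := strings.TrimRight(pdAddr, "/") + "/pd/api/v1/fail/" + failpoint
	return http.NewRequest(http.MethodPut, url, strings.NewReader(action))
}

func main() {
	pdAddr := "http://127.0.0.1:2379" // placeholder, replace with the real PD address
	steps := []struct{ failpoint, action string }{
		{"github.com/tikv/pd/pkg/etcdutil/SlowEtcdKVGet", "return(10)"},
		{"github.com/tikv/pd/server/exitCampaignLeader", `return("2346857576170797299")`},
	}
	for _, s := range steps {
		req, err := newFailpointRequest(pdAddr, s.failpoint, s.action)
		if err != nil {
			panic(err)
		}
		// To actually apply the failpoint, pass req to http.DefaultClient.Do.
		fmt.Println(req.Method, req.URL.String())
	}
}
```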

Reproduce Result

The number of gRPC GetMember requests stays high:
image

The TiKV side shows:

image

PR Effect

The number of gRPC GetMember calls dropped from 3.2k to 170. The remaining calls scale with the number of TiDB instances and with the client requests that trigger checkLeader.

For a cluster with 20 TiDB, 3 PD, and 50 TiKV instances:
170 = (50 * 3 / 3 / 3 [TiKV side] + 20 * 2 [TiDB side]) * 3 [PD count]
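
The estimate can be checked with a few lines of Go. This is only a sketch: the function and parameter names are made up for illustration, and the constants 3, 3, 3 and 2 are copied from the formula as written without interpreting them.

```go
package main

import "fmt"

// expectedGetMemberCalls evaluates (tikv * 3 / 3 / 3 + tidb * 2) * pd.
// Floating-point arithmetic is used on purpose: with integer division the
// TiKV term would truncate to 16 and the result would be 168 instead of 170.
func expectedGetMemberCalls(tidb, pd, tikv int) float64 {
	tikvSide := float64(tikv) * 3 / 3 / 3
	tidbSide := float64(tidb) * 2
	return (tikvSide + tidbSide) * float64(pd)
}

func main() {
	// Cluster from the PR description: 20 TiDB, 3 PD, 50 TiKV.
	fmt.Printf("%.0f\n", expectedGetMemberCalls(20, 3, 50))
}
```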

More tests are needed to ensure that no further issues arise.

image

Release note

None.

@ti-chi-bot
Contributor

ti-chi-bot bot commented Aug 23, 2023

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • JmPotato
  • nolouch

@ti-chi-bot
Contributor

ti-chi-bot bot commented Aug 23, 2023

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/cherry-pick-not-approved do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Aug 23, 2023
@ti-chi-bot ti-chi-bot bot requested review from JmPotato and nolouch August 23, 2023 09:46
@ti-chi-bot ti-chi-bot bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Aug 23, 2023
@HuSharp HuSharp force-pushed the release-6.5-add_ready branch 5 times, most recently from 9ad1517 to f4541bf Compare August 24, 2023 08:43
@ti-chi-bot ti-chi-bot bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Aug 24, 2023
@HuSharp HuSharp marked this pull request as ready for review August 24, 2023 09:16
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 24, 2023
@HuSharp HuSharp requested a review from rleungx August 24, 2023 09:16
@codecov

codecov bot commented Aug 24, 2023

Codecov Report

Patch coverage: 96.87% and project coverage change: -0.12% ⚠️

Comparison is base (75bb796) 75.75% compared to head (7fc588f) 75.64%.

Additional details and impacted files
@@               Coverage Diff               @@
##           release-6.5    #6978      +/-   ##
===============================================
- Coverage        75.75%   75.64%   -0.12%     
===============================================
  Files              329      330       +1     
  Lines            33600    33627      +27     
===============================================
- Hits             25453    25436      -17     
- Misses            5988     6015      +27     
- Partials          2159     2176      +17     
Flag         Coverage Δ
unittests    75.64% <96.87%> (-0.12%) ⬇️

Files Changed                Coverage Δ
client/keyspace_client.go    62.22% <ø> (ø)
client/base_client.go        83.09% <85.71%> (+0.32%) ⬆️
client/client.go             67.49% <100.00%> (+0.03%) ⬆️
client/retry/backoff.go      100.00% <100.00%> (ø)
server/server.go             74.37% <100.00%> (-1.32%) ⬇️

... and 18 files with indirect coverage changes

@ti-chi-bot ti-chi-bot bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Aug 25, 2023
@ti-chi-bot ti-chi-bot bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 25, 2023
@ti-chi-bot ti-chi-bot added the cherry-pick-approved Cherry pick PR approved by release team. label Aug 28, 2023
Signed-off-by: husharp <[email protected]>
Signed-off-by: husharp <[email protected]>
Signed-off-by: husharp <[email protected]>
@ti-chi-bot ti-chi-bot bot added the status/LGT1 Indicates that a PR has LGTM 1. label Aug 29, 2023
}

// Only used for test.
var testBackOffExecuteFlag = false
Contributor

can it be removed?

Member Author

I want to make sure the backoff is actually executed.

Comment on lines -400 to +413
mustExec([]string{"-u", pdAddr, "scheduler", "config", "balance-hot-region-scheduler"}, &conf1)
// After upgrading, we should not use query.
expected1["read-priorities"] = []interface{}{"query", "byte"}
re.NotEqual(expected1, conf1)
expected1["read-priorities"] = []interface{}{"key", "byte"}
re.Equal(expected1, conf1)
mustExec([]string{"-u", pdAddr, "scheduler", "config", "balance-hot-region-scheduler"}, &conf1)
re.Equal(conf1["read-priorities"], []interface{}{"key", "byte"})
// cannot set qps as write-peer-priorities
echo = mustExec([]string{"-u", pdAddr, "scheduler", "config", "balance-hot-region-scheduler", "set", "write-peer-priorities", "query,byte"}, nil)
re.Contains(echo, "query is not allowed to be set in priorities for write-peer-priorities")
mustExec([]string{"-u", pdAddr, "scheduler", "config", "balance-hot-region-scheduler"}, &conf1)
re.Equal(expected1, conf1)
re.Equal(conf1["write-peer-priorities"], []interface{}{"byte", "key"})

// test remove and add
mustExec([]string{"-u", pdAddr, "scheduler", "remove", "balance-hot-region-scheduler"}, nil)
mustExec([]string{"-u", pdAddr, "scheduler", "add", "balance-hot-region-scheduler"}, nil)
re.Equal(expected1, conf1)
echo = mustExec([]string{"-u", pdAddr, "scheduler", "remove", "balance-hot-region-scheduler"}, nil)
re.Contains(echo, "Success")
echo = mustExec([]string{"-u", pdAddr, "scheduler", "add", "balance-hot-region-scheduler"}, nil)
re.Contains(echo, "Success")
Member Author

Cherry-picked #5847 to make TestScheduler stable.

Signed-off-by: husharp <[email protected]>
@ti-chi-bot ti-chi-bot bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Aug 29, 2023
@nolouch
Contributor

nolouch commented Aug 29, 2023

/merge

@ti-chi-bot
Contributor

ti-chi-bot bot commented Aug 29, 2023

@nolouch: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

@ti-chi-bot
Contributor

ti-chi-bot bot commented Aug 29, 2023

This pull request has been accepted and is ready to merge.

Commit hash: 7fc588f

@ti-chi-bot ti-chi-bot bot added the status/can-merge Indicates a PR has been approved by a committer. label Aug 29, 2023
@ti-chi-bot ti-chi-bot bot merged commit 71e8929 into tikv:release-6.5 Aug 29, 2023
16 checks passed
@HuSharp HuSharp deleted the release-6.5-add_ready branch August 29, 2023 14:02
Labels
  • cherry-pick-approved: Cherry pick PR approved by release team.
  • release-note-none: Denotes a PR that doesn't merit a release note.
  • size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.
  • status/can-merge: Indicates a PR has been approved by a committer.
  • status/LGT2: Indicates that a PR has LGTM 2.
5 participants