storelimit: fix datarace from getOrCreateStoreLimit #8254

Merged
merged 5 commits into from
Jun 5, 2024

Contributor

@lhy1024 lhy1024 commented Jun 5, 2024

What problem does this PR solve?

Issue Number: Close #8253

What is changed and how does it work?

Check List

Tests

  • Unit test
  • Manual test (add detailed scripts or steps below)

go test -timeout 120s -run ^TestConcurrentAddOperatorAndSetStoreLimit$ github.com/tikv/pd/pkg/schedule/operator -race

  • run test TestConcurrentAddOperatorAndSetStoreLimit on master
WARNING: DATA RACE
Write at 0x00c0005ae480 by goroutine 220:
  github.com/tikv/pd/pkg/core/storelimit.(*limit).Reset()
      /home/lhy1024/pd/pkg/core/storelimit/store_limit.go:151 +0x227
  github.com/tikv/pd/pkg/core/storelimit.(*StoreRateLimit).Reset()
      /home/lhy1024/pd/pkg/core/storelimit/store_limit.go:126 +0x3b
  github.com/tikv/pd/pkg/core.(*StoresInfo).ResetStoreLimit.ResetStoreLimit.func1()
      /home/lhy1024/pd/pkg/core/store_option.go:258 +0x14b
  github.com/tikv/pd/pkg/core.(*StoreInfo).Clone()
      /home/lhy1024/pd/pkg/core/store.go:112 +0x17a
  github.com/tikv/pd/pkg/core.(*StoresInfo).ResetStoreLimit()
      /home/lhy1024/pd/pkg/core/store.go:870 +0x208
  github.com/tikv/pd/pkg/schedule/operator.(*Controller).getOrCreateStoreLimit()
      /home/lhy1024/pd/pkg/schedule/operator/operator_controller.go:989 +0x2b7
  github.com/tikv/pd/pkg/schedule/operator.(*Controller).ExceedStoreLimit()
      /home/lhy1024/pd/pkg/schedule/operator/operator_controller.go:966 +0x464
  github.com/tikv/pd/pkg/schedule/operator.(*Controller).AddOperator()
      /home/lhy1024/pd/pkg/schedule/operator/operator_controller.go:355 +0x6b
  github.com/tikv/pd/pkg/schedule/operator.TestConcurrentAddOperatorAndSetStoreLimit.func1()
      /home/lhy1024/pd/pkg/schedule/operator/operator_controller_test.go:987 +0x2fb
  github.com/tikv/pd/pkg/schedule/operator.TestConcurrentAddOperatorAndSetStoreLimit.gowrap1()
      /home/lhy1024/pd/pkg/schedule/operator/operator_controller_test.go:994 +0x41

Previous read at 0x00c0005ae480 by goroutine 221:
  github.com/tikv/pd/pkg/core/storelimit.(*limit).Available()
      /home/lhy1024/pd/pkg/core/storelimit/store_limit.go:162 +0xb2
  github.com/tikv/pd/pkg/core/storelimit.(*StoreRateLimit).Available()
      /home/lhy1024/pd/pkg/core/storelimit/store_limit.go:101 +0x3a
  github.com/tikv/pd/pkg/schedule/operator.(*Controller).ExceedStoreLimit()
      /home/lhy1024/pd/pkg/schedule/operator/operator_controller.go:970 +0x4e5
  github.com/tikv/pd/pkg/schedule/operator.(*Controller).AddOperator()
      /home/lhy1024/pd/pkg/schedule/operator/operator_controller.go:355 +0x6b
  github.com/tikv/pd/pkg/schedule/operator.TestConcurrentAddOperatorAndSetStoreLimit.func1()
      /home/lhy1024/pd/pkg/schedule/operator/operator_controller_test.go:987 +0x2fb
  github.com/tikv/pd/pkg/schedule/operator.TestConcurrentAddOperatorAndSetStoreLimit.gowrap1()
      /home/lhy1024/pd/pkg/schedule/operator/operator_controller_test.go:994 +0x41
  • run test TestConcurrentAddOperatorAndSetStoreLimit with this PR
ok  	github.com/tikv/pd/pkg/schedule/operator	1.473s
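
The report above shows a write in `(*limit).Reset` racing with a read in `(*limit).Available`. A minimal sketch of that pattern with a lock-based guard follows. The `limit` type here is a simplified, hypothetical stand-in rather than the real struct in store_limit.go, and the field name `mu` and the helper `hammer` are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// limit is a simplified, hypothetical stand-in for the limiter in the race
// report; the real struct in store_limit.go carries more state.
type limit struct {
	mu         sync.RWMutex // guards ratePerSec (name assumed for this sketch)
	ratePerSec float64
}

// Reset replaces the rate. Without the lock, this write would race with the
// read in Available.
func (l *limit) Reset(ratePerSec float64) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.ratePerSec = ratePerSec
}

// Available reports whether a request of size n fits under the current rate.
func (l *limit) Available(n float64) bool {
	l.mu.RLock()
	defer l.mu.RUnlock()
	return n <= l.ratePerSec
}

// hammer runs one writer and one reader concurrently and returns the final rate.
func hammer(l *limit, rounds int) float64 {
	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		for i := 1; i <= rounds; i++ {
			l.Reset(float64(i))
		}
	}()
	go func() {
		defer wg.Done()
		for i := 0; i < rounds; i++ {
			_ = l.Available(1)
		}
	}()
	wg.Wait()
	l.mu.RLock()
	defer l.mu.RUnlock()
	return l.ratePerSec
}

func main() {
	l := &limit{}
	fmt.Println(hammer(l, 1000))
}
```

Running this sketch with `go run -race` should stay quiet; removing the lock calls from `Reset` and `Available` gives the race detector the same kind of unsynchronized write and read that the report describes.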

Release note

None.

Signed-off-by: lhy1024 <[email protected]>
Contributor

ti-chi-bot bot commented Jun 5, 2024

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • nolouch
  • rleungx

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-triage-completed release-note-none Denotes a PR that doesn't merit a release note. dco-signoff: yes Indicates the PR's author has signed the dco. labels Jun 5, 2024
@ti-chi-bot ti-chi-bot bot requested review from nolouch and rleungx June 5, 2024 02:39
@ti-chi-bot ti-chi-bot bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jun 5, 2024

codecov bot commented Jun 5, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 77.35%. Comparing base (301fabb) to head (4185a77).

Current head 4185a77 differs from pull request most recent head 0effc76

Please upload reports for the commit 0effc76 to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8254      +/-   ##
==========================================
+ Coverage   77.30%   77.35%   +0.05%     
==========================================
  Files         471      471              
  Lines       61370    61375       +5     
==========================================
+ Hits        47443    47478      +35     
+ Misses      10372    10329      -43     
- Partials     3555     3568      +13     
Flag Coverage Δ
unittests 77.35% <100.00%> (+0.05%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Signed-off-by: lhy1024 <[email protected]>
}

// Reset resets the rate limit.
func (l *limit) Reset(ratePerSec float64) {
	l.ratePerSecMutex.Lock()
Member

Will it happen in the current code or just in your testing code?

Contributor Author

I think the data race can happen in the current code. We now call getOrCreateStoreLimit in many places, such as AddOperator and tryAddOperators, where we call ExceedStoreLimit to check the limit.

If getOrCreateStoreLimit enters the ResetStoreLimit branch, a data race is possible.

Contributor Author

And, as in the test I built, a data race is also possible if the store limit is set while other code calls getOrCreateStoreLimit.
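
To illustrate the scenario in this comment, here is a hypothetical sketch, not the PR's code: one group of goroutines looks up a per-store limiter through a get-or-create helper that resets it when it is out of date, while another group changes the configured rate. The names `controller`, `storeLimit`, `setRate`, `getOrCreate`, and `run` are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// storeLimit is a hypothetical per-store limiter guarded by its own lock.
type storeLimit struct {
	mu   sync.RWMutex
	rate float64
}

func (s *storeLimit) reset(rate float64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.rate = rate
}

func (s *storeLimit) current() float64 {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.rate
}

// controller is a hypothetical holder of the configured rate and the limiters.
type controller struct {
	mu     sync.Mutex
	want   float64
	limits map[uint64]*storeLimit
}

func newController(rate float64) *controller {
	return &controller{want: rate, limits: make(map[uint64]*storeLimit)}
}

// setRate models a concurrent "set store limit" request.
func (c *controller) setRate(rate float64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.want = rate
}

// getOrCreate returns the limiter for a store, creating it if missing and
// resetting it when it no longer matches the configured rate.
func (c *controller) getOrCreate(id uint64) *storeLimit {
	c.mu.Lock()
	defer c.mu.Unlock()
	l, ok := c.limits[id]
	if !ok {
		l = &storeLimit{rate: c.want}
		c.limits[id] = l
		return l
	}
	if l.current() != c.want {
		l.reset(c.want) // the reset branch; safe only because reset locks l.mu
	}
	return l
}

// run exercises lookups and rate changes concurrently, then returns the rate
// observed after a final setRate(5).
func run(workers, rounds int) float64 {
	c := newController(1)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(2)
		go func() {
			defer wg.Done()
			for i := 0; i < rounds; i++ {
				_ = c.getOrCreate(1).current()
			}
		}()
		go func() {
			defer wg.Done()
			for i := 0; i < rounds; i++ {
				c.setRate(float64(i%3 + 1))
			}
		}()
	}
	wg.Wait()
	c.setRate(5)
	return c.getOrCreate(1).current()
}

func main() {
	fmt.Println(run(4, 200))
}
```

In this sketch the readers call `current` outside the controller lock, so the per-limiter lock is what keeps the reset branch from racing with them. Running it with `go run -race` is one way to check that.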

Member

seems it is caused by #8032

Contributor Author

> seems it is caused by #8032

Sure, this test has failed ever since #8032.

@ti-chi-bot ti-chi-bot bot added the status/LGT1 Indicates that a PR has LGTM 1. label Jun 5, 2024
@ti-chi-bot ti-chi-bot bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Jun 5, 2024
Contributor Author

lhy1024 commented Jun 5, 2024

/merge

Contributor

ti-chi-bot bot commented Jun 5, 2024

@lhy1024: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

Contributor

ti-chi-bot bot commented Jun 5, 2024

This pull request has been accepted and is ready to merge.

Commit hash: 4185a77

@ti-chi-bot ti-chi-bot bot added the status/can-merge Indicates a PR has been approved by a committer. label Jun 5, 2024
@ti-chi-bot ti-chi-bot bot merged commit 0bf9e90 into tikv:master Jun 5, 2024
15 checks passed
@lhy1024 lhy1024 deleted the fix-datarace4 branch June 5, 2024 08:51
@nolouch nolouch added the needs-cherry-pick-release-8.1 Should cherry pick this PR to release-8.1 branch. label Jun 5, 2024
Contributor

nolouch commented Jun 5, 2024

/run-cherry-picker

@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-8.1: #8258.

ti-chi-bot bot pushed a commit that referenced this pull request Jun 6, 2024
Labels
  • dco-signoff: yes (Indicates the PR's author has signed the dco.)
  • needs-cherry-pick-release-8.1 (Should cherry pick this PR to release-8.1 branch.)
  • release-note-none (Denotes a PR that doesn't merit a release note.)
  • size/M (Denotes a PR that changes 30-99 lines, ignoring generated files.)
  • status/can-merge (Indicates a PR has been approved by a committer.)
  • status/LGT2 (Indicates that a PR has LGTM 2.)
Development

Successfully merging this pull request may close these issues.

storelimit: getOrCreateStoreLimit will meet datarace
4 participants