DataMove Should Decide BulkLoading After Old DataMove Actor Has Been Cleared #11947
Open: kakaiu wants to merge 2 commits into apple:main from kakaiu:fix-bulkload-bug
+85
−5
saintstack
approved these changes
Feb 13, 2025
Seems fine to me (though I'm the first to admit I'm not too clear on how this all works)
jzhou77
approved these changes
Feb 13, 2025
This PR fixes a bulkload bug causing DD restarts.
In the bulkload data move mechanism, when a bulkload is triggered, the task is registered in a map. Any subsequent data move on the same range is converted to a "bulkload" data move, which triggers the storage server (SS) to load the data. When the bulkload data move starts, it updates the bulkload metadata. At that point, it checks that the metadata shows the correct phase (the metadata should not indicate that the move has been completed). If the task has already been marked as completed, the data move triggers a DD restart.
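The mechanism described above can be sketched as a small standalone C++ model. This is only an illustration, not FoundationDB code: `BulkLoadTaskMap`, `BulkLoadPhase`, and `UnexpectedPhase` are hypothetical names, ranges are simplified to exact-match strings, and the exception stands in for the condition that makes DD restart.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical phases for illustration; the real metadata has its own states.
enum class BulkLoadPhase { Submitted, Running, Complete };

// Stand-in for the unexpected-phase condition that triggers a DD restart.
struct UnexpectedPhase : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Simplified task map: one task per range, with ranges keyed by an
// exact-match string (an assumption made to keep the sketch short).
class BulkLoadTaskMap {
public:
    // A triggered bulkload registers its task on the map.
    void registerTask(const std::string& range) {
        phases_[range] = BulkLoadPhase::Submitted;
    }

    // A later data move on a registered range becomes a "bulkload" data move.
    bool isBulkLoadRange(const std::string& range) const {
        return phases_.count(range) != 0;
    }

    // Starting the bulkload data move updates the metadata, but only if the
    // task has not already been marked as completed.
    void startTask(const std::string& range) {
        BulkLoadPhase& phase = phases_.at(range);
        if (phase == BulkLoadPhase::Complete) {
            throw UnexpectedPhase("bulkload task already marked as completed");
        }
        phase = BulkLoadPhase::Running;
    }

    void completeTask(const std::string& range) {
        phases_.at(range) = BulkLoadPhase::Complete;
    }

    BulkLoadPhase phase(const std::string& range) const {
        return phases_.at(range);
    }

private:
    std::map<std::string, BulkLoadPhase> phases_;
};
```

In this model, calling `startTask` on a range whose task is already `Complete` throws, which corresponds to the unexpected state described in this PR.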
The problem occurs when a previous data move for the bulkload task is cancelled by a new data move for the same task. If the previous data move marks the bulkload task as completed before the new data move starts the same task, then the new data move finds that the bulkload metadata has already been marked as completed when it tries to start the task. This is unexpected and triggers a DD restart.
The fix is for the new data move to decide whether to do bulk loading only after the old data move has been cancelled. If the bulkload task metadata has been marked as completed by the time the cancellation finishes, the new data move does not do bulk loading.
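The ordering change can be illustrated with a minimal C++ sketch. It is not the code from this PR: `TaskMetadata`, `decideBeforeCancel`, and `decideAfterCancel` are hypothetical names, and the cancellation of the old data move is modelled as a plain callback that may mark the task as completed as a side effect.

```cpp
#include <cassert>
#include <functional>

// Hypothetical, simplified view of the bulkload task metadata.
enum class TaskPhase { Running, Complete };

struct TaskMetadata {
    TaskPhase phase = TaskPhase::Running;
};

// Models cancelling the old data move and waiting until it is cleared.
// The old data move may mark the task as completed before it goes away.
using CancelOldMove = std::function<void(TaskMetadata&)>;

// Old ordering: the decision is taken first, so it can be stale by the
// time the old data move has been cleared.
bool decideBeforeCancel(TaskMetadata& meta, const CancelOldMove& cancelOldMove) {
    bool doBulkLoad = (meta.phase != TaskPhase::Complete);
    cancelOldMove(meta);
    return doBulkLoad;
}

// Ordering described by the fix: cancel first, then read the metadata,
// and skip bulk loading if the task is already marked as completed.
bool decideAfterCancel(TaskMetadata& meta, const CancelOldMove& cancelOldMove) {
    cancelOldMove(meta);
    return meta.phase != TaskPhase::Complete;
}
```

When the cancellation callback marks the task as completed, `decideBeforeCancel` still reports that bulk loading should happen, while `decideAfterCancel` reports that it should be skipped and the move proceeds without bulk loading.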
100K bulkload tests:
20250213-174727-zhewang-aa06c0d4d7bf86be compressed=True data_size=36849755 duration=8769146 ended=100000 fail=2 fail_fast=10 max_runs=100000 pass=99998 priority=100 remaining=0 runtime=1:11:07 sanity=False started=100000 stopped=20250213-185834 submitted=20250213-174727 timeout=5400 username=zhewang
100K bulkdump tests:
20250213-190617-zhewang-a7491d9a121a5380 compressed=True data_size=36849352 duration=11450458 ended=100000 fail=1 fail_fast=10 max_runs=100000 pass=99999 priority=100 remaining=0 runtime=1:22:49 sanity=False started=100000 stopped=20250213-202906 submitted=20250213-190617 timeout=5400 username=zhewang
100K correctness tests:
20250213-213207-zhewang-0e91c2ddace17da4 compressed=True data_size=36812798 fail_fast=10 max_runs=100000 priority=100 sanity=False submitted=20250213-213207 timeout=5400 username=zhewang