-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature] Support Cluster Snapshot Backup: deletion control (part4) #54980
Conversation
} | ||
return valid; | ||
} | ||
|
||
public TClusterSnapshotJobsResponse getAllJobsInfo() { | ||
TClusterSnapshotJobsResponse response = new TClusterSnapshotJobsResponse(); | ||
for (Map.Entry<Long, ClusterSnapshotJob> entry : historyAutomatedSnapshotJobs.entrySet()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The most risky bug in this code is:
Incorrect conversion of time units leading to potential logic errors.
You can modify the code like this:
valid = (alterJob.getFinishedTimeMs() < getValidDeletionTimeSecByAutomatedSnapshot() * 1000L);
Explanation: The multiplication of getValidDeletionTimeSecByAutomatedSnapshot()
by 1000 should explicitly use a long literal (1000L
) to ensure the calculation handles large numbers properly, matching the type of getFinishedTimeMs()
which likely returns milliseconds as a long.
@@ -635,7 +651,7 @@ private synchronized void disableRecoverPartitionWithSameName(long dbId, long ta | |||
continue; | |||
} | |||
partitionInfo.setRecoverable(false); | |||
idToRecycleTime.replace(partitionInfo.getPartition().getId(), 0L); | |||
idToRecycleTime.replace(partitionInfo.getPartition().getId(), System.currentTimeMillis()); | |||
break; | |||
} | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The most risky bug in this code is:
Changing the recycle time to System.currentTimeMillis()
may prevent immediate deletion of non-recoverable tables or partitions, which should be set to zero.
You can modify the code like this:
// Keep original behavior for non-recoverable items by setting their recycle time to 0.
idToRecycleTime.put(table.getId(), !recoverable ? 0 : System.currentTimeMillis());
// Update other relevant parts where non-recoverable state should set time to 0 similarly.
if (idToRecycleTableInfo.get(id) != null) { | ||
recoverable = idToRecycleTableInfo.get(id).isRecoverable(); | ||
} else if (idToPartition.get(id) != null) { | ||
recoverable = idToPartition.get(id).isRecoverable(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
duplicated get()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
@@ -52,6 +53,8 @@ public class ClusterSnapshotMgr implements GsonPostProcessable { | |||
@SerializedName(value = "historyAutomatedSnapshotJobs") | |||
private TreeMap<Long, ClusterSnapshotJob> historyAutomatedSnapshotJobs = new TreeMap<>(); | |||
|
|||
private long previousAutomatedSnapshotCreatedTimsMs = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not needed ? there is a createdTimeMs in snapshot
|
||
boolean findFirstSuccess = false; | ||
long previousAutomatedSnapshotCreatedTimsMs = 0; | ||
for (Long key : historyAutomatedSnapshotJobs.descendingKeySet()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using descendingMap is better ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
21bf08a
to
4683647
Compare
7109613
to
3ec6546
Compare
if (!idToRecycleTime.containsKey(id)) { | ||
return true; | ||
} | ||
return idToRecycleTime.get(id) < GlobalStateMgr.getCurrentState() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
containsKey() and get() are duplicated looking up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
@@ -105,6 +105,10 @@ public boolean isUnFinishedState() { | |||
state == ClusterSnapshotJobState.FINISHED; | |||
} | |||
|
|||
public boolean isSuccess() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public boolean isSuccess() { | |
public boolean isFinished() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
return 0; | ||
} | ||
|
||
return idToRecycleTime.get(id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if a id not in idToRecycleTime, it will throw exception ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the original logic, the caller will ensure that the key should be existed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could simply handle this exception ? just return 0 is ok ?
c66d6ee
to
3f4cbce
Compare
Signed-off-by: srlch <[email protected]>
Signed-off-by: srlch <[email protected]>
Signed-off-by: srlch <[email protected]>
Signed-off-by: srlch <[email protected]>
Signed-off-by: srlch <[email protected]>
Signed-off-by: srlch <[email protected]>
Signed-off-by: srlch <[email protected]>
3f4cbce
to
676ade4
Compare
Signed-off-by: srlch <[email protected]>
Signed-off-by: srlch <[email protected]>
Quality Gate passedIssues Measures |
[Java-Extensions Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
[FE Incremental Coverage Report]✅ pass : 82 / 86 (95.35%) file detail
|
[BE Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
historyAutomatedSnapshotJobs.pollFirstEntry(); | ||
} | ||
historyAutomatedSnapshotJobs.put(job.getId(), job); | ||
} | ||
|
||
public synchronized long getValidDeletionTimeMsByAutomatedSnapshot() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public synchronized long getValidDeletionTimeMsByAutomatedSnapshot() { | |
public synchronized long getSafeDeletionTimeMs() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better to add another function:
boolean isDeletiionSafeToExecute(long deletionCreatedTimeMs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will fix in next pr
return previousAutomatedSnapshotCreatedTimsMs; | ||
} | ||
|
||
public synchronized boolean checkValidDeletionForTableFromAlterJob(long tableId) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public synchronized boolean checkValidDeletionForTableFromAlterJob(long tableId) { | |
public synchronized boolean isTableSafeToDeleteTablet(long tableId) { ? |
return 0; | ||
} | ||
|
||
return idToRecycleTime.get(id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could simply handle this exception ? just return 0 is ok ?
https://github.com/Mergifyio backport branch-3.4 |
✅ Backports have been created
|
…54980) Signed-off-by: srlch <[email protected]> (cherry picked from commit 5cc4e6a)
…backport #54980) (#55206) Co-authored-by: srlch <[email protected]>
What I'm doing:
This is part 4 for Support Cluster Snapshot Backup
In this pr, we support deletion control to keep the files in BE/CN side needed by the
current cluster snapshot. The basic idea is following:
cluster snapshot, called
ValidDeletionTimeMs
ValidDeletionTimeMs
for vacuum.ValidDeletionTimeMs
to filter for partition/table in recycleBinValidDeletionTimeMs
to filter alter jobFixes #53867
#53867
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check: