fix: add max retry mechanism for backup creation #3442
Conversation
Walkthrough

This pull request introduces a new utility, TimedCounter, and uses it in the backup controller to cap how many times backup creation is retried before the backup is marked as errored.
Sequence Diagram

```mermaid
sequenceDiagram
    participant BC as BackupController
    participant TC as TimedCounter
    participant BM as Backup Monitor
    BC->>TC: Initialize creationRetryCounter
    BC->>BM: Check Engine Readiness
    alt Engine Not Ready
        BC->>TC: Increase Retry Count
        BC->>BM: Retry Check
        TC->>BC: Track Retry Attempts
    end
    alt Max Retries Exceeded
        BC->>BC: Set Backup State to Error
    end
    BC->>TC: Periodic Garbage Collection
```
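To make the walkthrough concrete, here is a minimal sketch of what a counter utility along these lines could look like. Only the type name and the methods referenced in this review (GetCount, IncreaseCount, DeleteEntry, RunGC) come from the PR; the constructor, fields, and internals below are assumptions and may differ from the actual util/timed_counter.go.

```go
package util

import (
	"sync"
	"time"
)

// counterEntry tracks how many times a key has been counted and when it was
// last touched, so stale entries can be garbage collected.
type counterEntry struct {
	count          int
	lastUpdateTime time.Time
}

// TimedCounter is a mutex-protected map of named counters whose entries
// expire after a configurable duration.
type TimedCounter struct {
	lock            sync.Mutex
	entries         map[string]*counterEntry
	expiredDuration time.Duration
}

func NewTimedCounter(expiredDuration time.Duration) *TimedCounter {
	return &TimedCounter{
		entries:         map[string]*counterEntry{},
		expiredDuration: expiredDuration,
	}
}

// GetCount returns the current count for key, or 0 if the key is unknown.
func (c *TimedCounter) GetCount(key string) int {
	c.lock.Lock()
	defer c.lock.Unlock()
	if entry, ok := c.entries[key]; ok {
		return entry.count
	}
	return 0
}

// IncreaseCount bumps the count for key and refreshes its last-update time.
func (c *TimedCounter) IncreaseCount(key string) {
	c.lock.Lock()
	defer c.lock.Unlock()
	entry, ok := c.entries[key]
	if !ok {
		entry = &counterEntry{}
		c.entries[key] = entry
	}
	entry.count++
	entry.lastUpdateTime = time.Now()
}

// DeleteEntry removes the counter for key, if any.
func (c *TimedCounter) DeleteEntry(key string) {
	c.lock.Lock()
	defer c.lock.Unlock()
	delete(c.entries, key)
}

// gc drops entries that have not been updated within expiredDuration.
func (c *TimedCounter) gc() {
	c.lock.Lock()
	defer c.lock.Unlock()
	now := time.Now()
	for key, entry := range c.entries {
		if now.Sub(entry.lastUpdateTime) > c.expiredDuration {
			delete(c.entries, key)
		}
	}
}

// RunGC periodically garbage collects expired entries until stopCh is closed.
func (c *TimedCounter) RunGC(gcDuration time.Duration, stopCh chan struct{}) {
	ticker := time.NewTicker(gcDuration)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			c.gc()
		case <-stopCh:
			return
		}
	}
}
```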
Force-pushed from 3d44089 to f34c8a0
@mergify backport v1.8.x v1.7.x
✅ Backports have been created
Force-pushed from 22935f8 to 157e7e8
Actionable comments posted: 1
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- controller/backup_controller.go (7 hunks)
- util/timed_counter.go (1 hunks)
- util/timed_counter_test.go (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Summary
🔇 Additional comments (4)
util/timed_counter.go (1)

1-88: Implementation of TimedCounter is correct and thread-safe.

The TimedCounter provides a thread-safe mechanism for counting and managing time-based entries. All methods correctly use synchronization primitives to ensure safe concurrent access. Initialization, incrementing, retrieval, deletion, and garbage collection of entries are properly handled.
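As a quick illustration of how a caller might drive that API (using the counter sketched earlier in this walkthrough; the backup name and retry threshold below are made up, not taken from the PR):

```go
package util

import (
	"fmt"
	"time"
)

// ExampleTimedCounter shows hypothetical call sites for the counter sketched
// above; the key and threshold are placeholders, not values from the PR.
func ExampleTimedCounter() {
	counter := NewTimedCounter(30 * time.Minute)

	// Each failed attempt bumps the per-backup count.
	counter.IncreaseCount("backup-demo")
	counter.IncreaseCount("backup-demo")

	// Callers can gate further retries on the accumulated count.
	if counter.GetCount("backup-demo") >= 5 {
		fmt.Println("give up and mark the backup as errored")
	}

	// On success, the entry is removed so a later backup starts from zero.
	counter.DeleteEntry("backup-demo")
	fmt.Println(counter.GetCount("backup-demo"))
	// Output: 0
}
```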
util/timed_counter_test.go (1)

1-72: Unit tests effectively cover TimedCounter functionality.

The unit tests validate key functionalities of the TimedCounter, including count increment, deletion of entries, manual garbage collection, and automatic garbage collection via RunGC. This ensures robustness and correctness of the implementation.
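A hedged sketch of how the garbage-collection behavior could be exercised, assuming the counter sketched earlier (including its unexported gc helper, which is an assumption); the actual util/timed_counter_test.go may structure its cases differently:

```go
package util

import (
	"testing"
	"time"
)

// Sketch only: exercises manual gc() and background RunGC with short
// durations so expired entries are dropped quickly in the test.
func TestTimedCounterGCSketch(t *testing.T) {
	c := NewTimedCounter(10 * time.Millisecond)

	c.IncreaseCount("backup-a")
	if got := c.GetCount("backup-a"); got != 1 {
		t.Fatalf("expected count 1, got %d", got)
	}

	// Let the entry expire, then collect it manually.
	time.Sleep(20 * time.Millisecond)
	c.gc()
	if got := c.GetCount("backup-a"); got != 0 {
		t.Fatalf("expected expired entry to be removed, got count %d", got)
	}

	// Background collection: RunGC should remove expired entries on its own.
	stopCh := make(chan struct{})
	defer close(stopCh)
	go c.RunGC(10*time.Millisecond, stopCh)

	c.IncreaseCount("backup-b")
	time.Sleep(50 * time.Millisecond)
	if got := c.GetCount("backup-b"); got != 0 {
		t.Fatalf("expected RunGC to remove expired entry, got count %d", got)
	}
}
```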
controller/backup_controller.go (2)

891-892: Ensure cleanup of retry counters after successful backup creation.

After successfully creating the backup monitor, the creationRetryCounter entry for the backup is deleted:

```go
// backup creation is succeeded, remove it from the counter
bc.creationRetryCounter.DeleteEntry(backup.Name)
```

This is correct to prevent stale entries from persisting in the counter.
44-46: Consistent definition of error messages and constants.

The new error messages and constants are well-defined and provide clear descriptions for failures related to engine and snapshot readiness:

- FailedWaitingForSnapshotMessage
- FailedWaitingForEngineMessage
- creationRetryCounterExpiredDuration
- creationRetryCounterGCDuration
- maxCreationRetry

Also applies to: 56-59
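For reference, declarations of this shape would match the names called out above; the message wording and the numeric values here are placeholders and not necessarily what the PR uses:

```go
package controller

import "time"

// Placeholder declarations only; the message texts and the values are
// illustrative, not copied from controller/backup_controller.go.
const (
	FailedWaitingForSnapshotMessage = "failed waiting for the snapshot %v to be ready before enabling backup monitor"
	FailedWaitingForEngineMessage   = "failed waiting for the engine %v to be running before enabling backup monitor"
)

const (
	creationRetryCounterExpiredDuration = 1 * time.Hour    // entries older than this are garbage collected
	creationRetryCounterGCDuration      = 10 * time.Minute // how often RunGC sweeps the counter
	maxCreationRetry                    = 5                // attempts before the backup is marked errored
)
```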
controller/backup_controller.go (outdated)
```go
if bc.creationRetryCounter.GetCount(backup.Name) >= maxCreationRetry {
	backup.Status.Error = fmt.Sprintf(FailedWaitingForEngineMessage, engine.Name)
	backup.Status.State = longhorn.BackupStateError
	backup.Status.LastSyncedAt = metav1.Time{Time: time.Now().UTC()}
	bc.creationRetryCounter.DeleteEntry(backup.Name)
	return nil, fmt.Errorf("failed waiting for the engine %v to be running before enabling backup monitor", engine.Name)
}
backup.Status.State = longhorn.BackupStatePending
backup.Status.Messages[MessageTypeReconcileInfo] = fmt.Sprintf(WaitForEngineMessage, engine.Name)
bc.creationRetryCounter.IncreaseCount(backup.Name)
return nil, fmt.Errorf("waiting for the engine %v to be running before enabling backup monitor", engine.Name)
```
Shared retry counter may conflate retries for different conditions.
The creationRetryCounter is used to track retries for multiple conditions in the checkMonitor function, specifically:
- Waiting for the engine to be ready (lines 825-835).
- Waiting to retrieve the snapshot (lines 841-854).
- Waiting for the snapshot to be ready (lines 855-869).
Using a single counter for different conditions can lead to unintended behavior where retries for one condition consume the retry counts for another. This may cause the backup process to fail prematurely if one condition incurs more retries, affecting the other conditions' ability to retry adequately.
Consider using separate counters for each condition to independently track retries. This ensures that the retry mechanisms for waiting for the engine, retrieving the snapshot, and waiting for the snapshot to be ready are functioning correctly without impacting each other. For example, introduce individual TimedCounter instances or maintain a map of counters keyed by condition and backup name.
Also applies to: 841-854, 855-869
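One way to act on this suggestion, sketched under the assumption that the PR's BackupController keeps its single creationRetryCounter field: widen the counter key with a condition label so each wait path gets its own retry budget. The condition names, key format, and helper below are made up for illustration.

```go
// Sketch only: per-condition retry tracking with a single TimedCounter by
// widening the key. Condition names and the key format are assumptions.
const (
	retryConditionEngineReady   = "engine-ready"
	retryConditionSnapshotFetch = "snapshot-fetch"
	retryConditionSnapshotReady = "snapshot-ready"
)

func retryKey(backupName, condition string) string {
	return backupName + "/" + condition
}

// retryExceeded reports whether the given condition has used up its retry
// budget for this backup; otherwise it records one more attempt. Retries for
// one condition no longer consume the budget of the others.
func (bc *BackupController) retryExceeded(backupName, condition string) bool {
	key := retryKey(backupName, condition)
	if bc.creationRetryCounter.GetCount(key) >= maxCreationRetry {
		bc.creationRetryCounter.DeleteEntry(key)
		return true
	}
	bc.creationRetryCounter.IncreaseCount(key)
	return false
}
```

With this shape, the success path would delete the entries for all known conditions for the backup rather than the single per-backup entry deleted today.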
Prevent the backup CR from being stuck in the pending state forever. Backup creation will only be retried up to the max retry count.

longhorn-10090

Signed-off-by: Phan Le <[email protected]>
LGTM. Thanks @PhanLe1010
Prevent the backup CR from being stuck in the pending state forever. Backup creation will only be retried up to the max retry count.

longhorn/longhorn#10090