-
Notifications
You must be signed in to change notification settings - Fork 147
OCPEDGE-2088, OCPEDGE-1885: Updated state transitions & tests for TNF setup job #1481
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCPEDGE-2088, OCPEDGE-1885: Updated state transitions & tests for TNF setup job #1481
Conversation
|
@jaypoulz: This pull request references OCPEDGE-2088 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set. This pull request references OCPEDGE-1885 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
WalkthroughAdds condition-driven orchestration: JobController and runJobController gain a conditions parameter; DefaultConditions and AllConditions are introduced; starter picks setup conditions based on ExternalEtcd completion. Multiple tests and callsites updated to pass and verify condition sets. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (1 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro Cache: Disabled due to data retention organization setting Knowledge base: Disabled due to 📒 Files selected for processing (6)
💤 Files with no reviewable changes (1)
🚧 Files skipped from review as they are similar to previous changes (1)
🔇 Additional comments (11)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
/payload-job periodic-ci-openshift-release-master-nightly-4.20-e2e-metal-ovn-two-node-fencing-upgrade |
|
@eggfoobar: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/4ea6b850-8dc8-11f0-9ac8-d6af57292fea-0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
pkg/tnf/pkg/jobs/jobcontroller.go (1)
79-83: Add missingconditionsargument to all NewJobController calls
Update the two‐arg calls in pkg/tnf/operator/starter.go:237 and pkg/tnf/pkg/jobs/jobcontroller_test.go (lines 359 and 489) to include the newconditions []stringparameter (for example,jobs.DefaultConditions) before the existing args.pkg/tnf/operator/starter.go (1)
231-259: Harden job mutation hook to avoid panics on nil labels/short command.Manifest assumptions can drift; guard for safety.
- []jobs.JobHookFunc{ + []jobs.JobHookFunc{ func(_ *operatorv1.OperatorSpec, job *batchv1.Job) error { if nodeName != nil { job.Spec.Template.Spec.NodeName = *nodeName } job.SetName(jobType.GetJobName(nodeName)) - job.Labels["app.kubernetes.io/name"] = jobType.GetNameLabelValue() + if job.Labels == nil { + job.Labels = map[string]string{} + } + job.Labels["app.kubernetes.io/name"] = jobType.GetNameLabelValue() - job.Spec.Template.Spec.Containers[0].Image = os.Getenv("OPERATOR_IMAGE") - job.Spec.Template.Spec.Containers[0].Command[1] = jobType.GetSubCommand() + containers := job.Spec.Template.Spec.Containers + if len(containers) == 0 { + return fmt.Errorf("job template has no containers") + } + containers[0].Image = os.Getenv("OPERATOR_IMAGE") + // Ensure command has at least two args: ["ceo", <subcommand>] + if len(containers[0].Command) < 2 { + containers[0].Command = []string{"ceo", jobType.GetSubCommand()} + } else { + containers[0].Command[1] = jobType.GetSubCommand() + } return nil }}...,
🧹 Nitpick comments (3)
pkg/tnf/pkg/jobs/jobcontroller.go (1)
29-33: Document immutability expectations for DefaultConditions (or return a copy).Exporting a slice invites accidental mutation. At minimum, document “copy before append.” Optionally, change to a constructor that returns a fresh slice.
Minimal doc tweak:
-// By default, we only set the progressing and degraded conditions for TNF setup. +// By default, we only set the progressing and degraded conditions for TNF setup. +// Note: treat DefaultConditions as read-only; copy before modification: +// conds := append([]string(nil), jobs.DefaultConditions...)Alternative (safer API):
-var DefaultConditions = []string{opv1.OperatorStatusTypeProgressing, opv1.OperatorStatusTypeDegraded} +// DefaultConditions returns the baseline conditions managed by TNF jobs. +func DefaultConditions() []string { + return []string{opv1.OperatorStatusTypeProgressing, opv1.OperatorStatusTypeDegraded} +}If you adopt the function, update call sites accordingly.
pkg/tnf/pkg/jobs/jobcontroller_test.go (2)
310-318: Unused helper: remove findConditionThis isn’t referenced; drop it to keep the test lean.
-// Helper function to find a condition by type -func findCondition(conditions []opv1.OperatorCondition, conditionType string) *opv1.OperatorCondition { - for i := range conditions { - if conditions[i].Type == conditionType { - return &conditions[i] - } - } - return nil -} + +// (removed) findCondition was unused
453-507: Mark helper and minor safetyMark newTestContext as a test helper for clearer failure locations.
-func newTestContext(test testCase, t *testing.T, conditions []string) *testContext { +func newTestContext(test testCase, t *testing.T, conditions []string) *testContext { + t.Helper()
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to Reviews > Disable Knowledge Base setting
📒 Files selected for processing (4)
pkg/tnf/operator/starter.go(7 hunks)pkg/tnf/operator/starter_test.go(3 hunks)pkg/tnf/pkg/jobs/jobcontroller.go(3 hunks)pkg/tnf/pkg/jobs/jobcontroller_test.go(9 hunks)
🔇 Additional comments (13)
pkg/tnf/operator/starter_test.go (2)
9-10: LGTM on adding testify/require.Appropriate choice for concise assertions.
40-41: LGTM on importing jobs.DefaultConditions.Keeps tests aligned with runtime behavior.
pkg/tnf/pkg/jobs/jobcontroller.go (1)
162-184: Degraded wiring is correct with WithSyncDegradedOnError.Condition gating via c.conditions and error propagation is consistent.
pkg/tnf/operator/starter.go (3)
73-78: Initial-setup gate: LGTM.Clear separation of initial setup toggles job-condition behavior.
112-114: Passing explicit conditions for per-node jobs: LGTM.Keeps Auth/AfterSetup scoped to non-Available conditions.
154-167: isInitialSetup implementation: LGTM.Accurately derives mode from UnsupportedConfigOverrides via ceohelpers.
pkg/tnf/pkg/jobs/jobcontroller_test.go (7)
54-56: Available left unmanaged under DefaultConditions — OKThis preserves the existing Available=True and only asserts Progressing=True on initial sync. Matches the intent when Available isn’t managed by the controller.
Also applies to: 62-62
73-76: Completed job: Progressing=False while preserving preexisting Available — OKExpectations align with controller not managing Available in this path.
Also applies to: 83-86
95-98: Running job: Progressing=True while preserving Available — OKConsistent with unmanaged Available semantics.
Also applies to: 105-105
138-138: Wiring DefaultConditions into the test context — OKPlumbs the new parameter as expected.
208-308: Available-managed scenarios read wellThe three cases (completed/running/failed) assert the expected flips when Available is managed. Once the aliasing fix above is in, these look solid.
367-367: Passing DefaultConditions to NewJobController — OKNote: correctness here depends on fixing the aliasing in TestSyncWithAvailableCondition.
496-497: Forwarding conditions into controller — OKMatches the updated NewJobController signature.
|
/payload-job periodic-ci-openshift-release-master-nightly-4.20-e2e-metal-ovn-two-node-fencing-upgrade |
|
@eggfoobar: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/13ee3530-8e0e-11f0-90cc-54c3740fac23-0 |
a12bddf to
c7621a9
Compare
|
/payload-job periodic-ci-openshift-release-master-nightly-4.20-e2e-metal-ovn-two-node-fencing-upgrade |
|
@jaypoulz: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/8301eb60-8e59-11f0-985f-cd7768c4f20c-0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (3)
pkg/tnf/pkg/jobs/jobcontroller_test.go (3)
109-131: Assert Degraded on job failure.Add an expectation that Degraded is set true when the job fails.
expectedObjects: testObjects{ job: makeJob( withJobGeneration(1), withJobStatus(0, 0, 1), withJobFailed()), operator: makeFakeOperatorInstance( withGenerations(1), - withTrueConditions(conditionAvailable), + withTrueConditions(conditionAvailable, conditionDegraded), withFalseConditions(conditionProgressing)), },Outside this hunk, define the missing symbol:
// near lines 43-46 (var block) conditionDegraded = controllerName + opv1.OperatorStatusTypeDegraded
252-273: Also assert Degraded in the “failed with available condition” case.Mirror the failure semantics here too.
expectedObjects: testObjects{ job: makeJob( withJobGeneration(1), withJobStatus(0, 0, 1), withJobFailed()), operator: makeFakeOperatorInstance( withGenerations(1), - withFalseConditions(conditionAvailable), + withTrueConditions(conditionDegraded), + withFalseConditions(conditionAvailable), withFalseConditions(conditionProgressing)), },
204-207: Clone before appending to DefaultConditions.append(DefaultConditions, ...) can mutate its backing array and make tests order-dependent. Clone first.
- conditionsWithAvailable := append(DefaultConditions, opv1.OperatorStatusTypeAvailable) + // Clone to avoid mutating DefaultConditions + conditionsWithAvailable := append([]string(nil), DefaultConditions...) + conditionsWithAvailable = append(conditionsWithAvailable, opv1.OperatorStatusTypeAvailable)
🧹 Nitpick comments (2)
pkg/tnf/operator/starter_test.go (1)
309-324: Use the cancellable ctx you already created.You construct ctx with cancel above but discard it here; pass that ctx through to allow test-time cancellation.
return args{ - ctx: context.Background(), + ctx: ctx,pkg/tnf/pkg/jobs/jobcontroller_test.go (1)
310-318: Unused helper: findCondition()Not referenced; remove to keep the test focused.
-// Helper function to find a condition by type -func findCondition(conditions []opv1.OperatorCondition, conditionType string) *opv1.OperatorCondition { - for i := range conditions { - if conditions[i].Type == conditionType { - return &conditions[i] - } - } - return nil -}
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to Reviews > Disable Knowledge Base setting
📒 Files selected for processing (4)
pkg/tnf/operator/starter.go(7 hunks)pkg/tnf/operator/starter_test.go(4 hunks)pkg/tnf/pkg/jobs/jobcontroller.go(3 hunks)pkg/tnf/pkg/jobs/jobcontroller_test.go(9 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/tnf/pkg/jobs/jobcontroller.go
🔇 Additional comments (2)
pkg/tnf/operator/starter_test.go (1)
202-210: Good fix: cloned DefaultConditions and used Equalf.The copy-before-append prevents slice aliasing, and Equalf formats the message correctly.
Also applies to: 219-221
pkg/tnf/operator/starter.go (1)
126-134: Nice: safe clone before adding Available.Copy + append avoids mutating the shared DefaultConditions backing array.
fb3ee86 to
08c640a
Compare
|
/payload-job periodic-ci-openshift-release-master-nightly-4.20-e2e-metal-ovn-two-node-fencing-upgrade |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (2)
pkg/tnf/pkg/jobs/jobcontroller_test.go (2)
109-131: Failure path mentions Degraded but doesn’t assert it — verify coverage exists.Comment says “should trigger degraded status via error” but the test doesn’t check a Degraded condition. Please confirm another test asserts Degraded=true on job error, or add it here.
#!/bin/bash # Find tests that assert Degraded updates for this controller rg -nP -C2 '\b(TestJobController|controllerName)\s*\+\s*opv1\.OperatorStatusTypeDegraded\b|OperatorStatusTypeDegraded' -- pkg/tnf
204-207: Slice aliasing bug: append(DefaultConditions, ...) can mutate DefaultConditions.Clone before append to avoid flakiness across tests.
- conditionsWithAvailable := append(DefaultConditions, opv1.OperatorStatusTypeAvailable) + // Clone to avoid mutating DefaultConditions' backing array + conditionsWithAvailable := append([]string(nil), DefaultConditions...) + conditionsWithAvailable = append(conditionsWithAvailable, opv1.OperatorStatusTypeAvailable)
🧹 Nitpick comments (4)
pkg/tnf/operator/starter_test.go (2)
201-217: Simplify: clone once, append condition, and avoid the extra temp slice.Same behavior, fewer moving parts; still avoids aliasing.
- // Verify setup conditions - conditionsWithAvailable := make([]string, len(jobs.DefaultConditions)) - copy(conditionsWithAvailable, jobs.DefaultConditions) - conditionsWithAvailable = append(conditionsWithAvailable, operatorv1.OperatorStatusTypeAvailable) - - setupConditions := jobs.DefaultConditions - if initialTransition { - setupConditions = conditionsWithAvailable - } - - hasAvailableCondition := false - for _, condition := range setupConditions { - if condition == operatorv1.OperatorStatusTypeAvailable { - hasAvailableCondition = true - break - } - } + // Verify setup conditions + setupConditions := append([]string(nil), jobs.DefaultConditions...) + if initialTransition { + setupConditions = append(setupConditions, operatorv1.OperatorStatusTypeAvailable) + } + hasAvailableCondition := false + for _, c := range setupConditions { + if c == operatorv1.OperatorStatusTypeAvailable { + hasAvailableCondition = true + break + } + }
287-294: Don’t cancel informers before the test uses them. Use t.Cleanup and return ctx.Deferring cancel() inside getArgs stops informers as the function returns. Prefer t.Cleanup(cancel) and pass the same ctx out.
- ctx, cancel := context.WithCancel(context.Background()) - defer cancel() + ctx, cancel := context.WithCancel(context.Background()) + t.Cleanup(cancel) @@ - ctx: context.Background(), + ctx: ctx,Also applies to: 308-314
pkg/tnf/pkg/jobs/jobcontroller_test.go (2)
310-318: Remove unused helper.findCondition isn’t used; drop it to keep the test lean.
-// Helper function to find a condition by type -func findCondition(conditions []opv1.OperatorCondition, conditionType string) *opv1.OperatorCondition { - for i := range conditions { - if conditions[i].Type == conditionType { - return &conditions[i] - } - } - return nil -}
453-507: Mark test helper.Add t.Helper() at the top of newTestContext to improve failure locations.
-func newTestContext(test testCase, t *testing.T, conditions []string) *testContext { +func newTestContext(test testCase, t *testing.T, conditions []string) *testContext { + t.Helper()
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to Reviews > Disable Knowledge Base setting
📒 Files selected for processing (4)
pkg/tnf/operator/starter.go(7 hunks)pkg/tnf/operator/starter_test.go(4 hunks)pkg/tnf/pkg/jobs/jobcontroller.go(3 hunks)pkg/tnf/pkg/jobs/jobcontroller_test.go(9 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- pkg/tnf/pkg/jobs/jobcontroller.go
- pkg/tnf/operator/starter.go
🔇 Additional comments (2)
pkg/tnf/operator/starter_test.go (2)
57-71: Good approach: wrapper cleanly overrides Informer while delegating all other methods.Embedding v1helpers.StaticPodOperatorClient and overriding Informer() to return a mock is correct and keeps tests hermetic.
Also applies to: 73-81
191-199: Nice: no aliasing + formatted assertions + no shadowing.Copying DefaultConditions before append, using require.Equalf, and avoiding name shadowing read well and prevent subtle bugs.
Also applies to: 219-221
|
@jaypoulz: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/e320ee80-8e66-11f0-85e2-83c336f84fb7-0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/tnf/pkg/jobs/jobcontroller_test.go (1)
109-131: Assert Degraded status in “job failed” test
The “job failed” case must verify that OperatorStatusTypeDegraded is set to true on error.Apply in pkg/tnf/pkg/jobs/jobcontroller_test.go:
@@ Lines 109-131 operator: makeFakeOperatorInstance( withGenerations(1), - withTrueConditions(conditionAvailable)), + withTrueConditions(conditionAvailable, conditionDegraded)), @@ operator: makeFakeOperatorInstance( withGenerations(1), - withTrueConditions(conditionAvailable), + withTrueConditions(conditionAvailable, conditionDegraded), withFalseConditions(conditionProgressing)),Add the missing constant alongside the others:
var ( conditionAvailable = controllerName + opv1.OperatorStatusTypeAvailable conditionProgressing = controllerName + opv1.OperatorStatusTypeProgressing + conditionDegraded = controllerName + opv1.OperatorStatusTypeDegraded )
♻️ Duplicate comments (1)
pkg/tnf/pkg/jobs/jobcontroller_test.go (1)
204-207: Fix slice aliasing: clone before appending to DefaultConditionsappend can reuse the underlying array and mutate DefaultConditions, causing order-dependent flakiness.
-conditionsWithAvailable := append(DefaultConditions, opv1.OperatorStatusTypeAvailable) +conditionsWithAvailable := append([]string(nil), DefaultConditions...) +conditionsWithAvailable = append(conditionsWithAvailable, opv1.OperatorStatusTypeAvailable)
🧹 Nitpick comments (3)
pkg/tnf/pkg/jobs/jobcontroller_test.go (3)
232-249: Confirm intent: Available=false while job is running when Available is managedThis differs from TestSync’s “job running” case (Available=true). If intentional (controller manages Available only when explicitly enabled), consider a brief comment to prevent future confusion.
310-318: Remove unused helper findCondition (or use it in targeted assertions)Currently unused; delete to reduce noise.
-// Helper function to find a condition by type -func findCondition(conditions []opv1.OperatorCondition, conditionType string) *opv1.OperatorCondition { - for i := range conditions { - if conditions[i].Type == conditionType { - return &conditions[i] - } - } - return nil -}
377-381: Avoid brittle exact match on error messageUse substring match to decouple from wording changes.
-expectedErrorMsg := "job spec was modified, old job is deleted" -if err.Error() != expectedErrorMsg { - t.Errorf("First sync returned unexpected error message: got %q, want %q", err.Error(), expectedErrorMsg) -} +if !strings.Contains(err.Error(), "job spec was modified") { + t.Errorf("First sync returned unexpected error message: %q", err.Error()) +}
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to Reviews > Disable Knowledge Base setting
📒 Files selected for processing (4)
pkg/tnf/operator/starter.go(7 hunks)pkg/tnf/operator/starter_test.go(4 hunks)pkg/tnf/pkg/jobs/jobcontroller.go(3 hunks)pkg/tnf/pkg/jobs/jobcontroller_test.go(9 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- pkg/tnf/pkg/jobs/jobcontroller.go
- pkg/tnf/operator/starter_test.go
- pkg/tnf/operator/starter.go
🔇 Additional comments (4)
pkg/tnf/pkg/jobs/jobcontroller_test.go (4)
54-56: LGTM: correct Available/Progressing expectations on initial syncThe setup and expected transitions look consistent with the controller behavior.
Also applies to: 62-62
73-76: LGTM: completed/running cases align with expected status transitionsThe assertions for Available staying true and Progressing toggling are sound.
Also applies to: 95-98, 105-106
138-139: LGTM: DefaultConditions correctly wired into controller constructionGood coverage to exercise default condition handling.
366-367: LGTM: New NewJobController signature is correctly plumbed with conditionsThe tests exercise both DefaultConditions and augmented conditions.
Also applies to: 453-507
08c640a to
1eba52d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (3)
pkg/tnf/pkg/jobs/jobcontroller_test.go (3)
109-131: Add Degraded=true to failure path and define conditionDegradedFailing jobs should surface Degraded=true. Also add a local constant for the controller-scoped Degraded type.
var ( conditionAvailable = controllerName + opv1.OperatorStatusTypeAvailable conditionProgressing = controllerName + opv1.OperatorStatusTypeProgressing + conditionDegraded = controllerName + opv1.OperatorStatusTypeDegraded )operator: makeFakeOperatorInstance( withGenerations(1), + withTrueConditions(conditionDegraded), withTrueConditions(conditionAvailable), withFalseConditions(conditionProgressing)),Also applies to: 43-46
204-207: Fix slice aliasing: append(DefaultConditions, ...) can mutate DefaultConditionsClone before appending to avoid test-order flakiness.
- conditionsWithAvailable := append(DefaultConditions, opv1.OperatorStatusTypeAvailable) + // Clone to avoid mutating DefaultConditions' backing array + conditionsWithAvailable := append([]string(nil), DefaultConditions...) + conditionsWithAvailable = append(conditionsWithAvailable, opv1.OperatorStatusTypeAvailable)
252-273: Set Degraded=true for “job failed with available condition”Failure should flip Degraded to true alongside Available/Progressing updates.
operator: makeFakeOperatorInstance( withGenerations(1), + withTrueConditions(conditionDegraded), withFalseConditions(conditionAvailable), withFalseConditions(conditionProgressing)),
🧹 Nitpick comments (7)
pkg/tnf/pkg/jobs/jobcontroller.go (3)
140-158: Doc says “ignored” but code accumulates errors; align comment.Either truly ignore invalid values or update the comment to reflect that invalid conditions produce an error aggregated at build time. I’d update the comment for accuracy.
-// WithConditions sets the operational conditions under which the JobController will operate. -// Only 'Available', 'Progressing' and 'Degraded' are valid conditions; other values are ignored. +// WithConditions sets which Operator conditions this controller manages. +// Only 'Available', 'Progressing', and 'Degraded' are valid; other values are recorded as errors +// and returned by ToController() (NewJobController currently panics on such errors). func (c *JobController) WithConditions(conditions ...string) *JobController {
297-311: Gate error returns on Degraded being enabled to avoid noisy retries.Currently, failures return an error even if Degraded isn’t tracked, which can cause unnecessary error noise. Only return an error when Degraded is requested; otherwise, emit an event and continue.
- // return an error for reporting degraded status! - // setting a condition manually, similar to available and progressing, doesn't work - if isFailed(job) { - return fmt.Errorf("Job failed") - } + // Return an error only when Degraded is tracked; otherwise warn and continue. + if isFailed(job) { + if slices.Contains(c.conditions, opv1.OperatorStatusTypeDegraded) { + return fmt.Errorf("job failed") + } + syncContext.Recorder().Warningf("JobFailed", "Job %s/%s failed", job.Namespace, job.Name) + }
29-33: Document DefaultConditions as read-only
Add a note above the exported slice to warn callers not to mutate it and illustrate copying before use:// By default, we only set the progressing and degraded conditions for TNF setup. +// NOTE: DefaultConditions is read-only. Copy before modifying, e.g.: +// conds := append([]string(nil), DefaultConditions...) var DefaultConditions = []string{ opv1.OperatorStatusTypeProgressing, opv1.OperatorStatusTypeDegraded, }pkg/tnf/operator/starter_test.go (1)
201-209: Simplify setupConditions construction.Build from a copied base and append conditionally; removes the extra temp slice and avoids future aliasing surprises.
- conditionsWithAvailable := make([]string, len(jobs.DefaultConditions)) - copy(conditionsWithAvailable, jobs.DefaultConditions) - conditionsWithAvailable = append(conditionsWithAvailable, operatorv1.OperatorStatusTypeAvailable) - - setupConditions := jobs.DefaultConditions - if initialTransition { - setupConditions = conditionsWithAvailable - } + setupConditions := append([]string(nil), jobs.DefaultConditions...) + if initialTransition { + setupConditions = append(setupConditions, operatorv1.OperatorStatusTypeAvailable) + }pkg/tnf/operator/starter.go (1)
122-138: Streamline setupConditions construction.Minor simplification; copy base and append Available when initialTransition is true.
- // Make a copy of the default conditions and add the available condition - conditionsWithAvailable := make([]string, len(jobs.DefaultConditions)) - copy(conditionsWithAvailable, jobs.DefaultConditions) - conditionsWithAvailable = append(conditionsWithAvailable, operatorv1.OperatorStatusTypeAvailable) - - setupConditions := jobs.DefaultConditions - if initialTransition { - setupConditions = conditionsWithAvailable - } + setupConditions := append([]string(nil), jobs.DefaultConditions...) + if initialTransition { + setupConditions = append(setupConditions, operatorv1.OperatorStatusTypeAvailable) + }pkg/tnf/pkg/jobs/jobcontroller_test.go (2)
310-318: Remove unused helper findConditionIt’s not referenced; drop it to reduce noise.
-// Helper function to find a condition by type -func findCondition(conditions []opv1.OperatorCondition, conditionType string) *opv1.OperatorCondition { - for i := range conditions { - if conditions[i].Type == conditionType { - return &conditions[i] - } - } - return nil -}
378-381: Make error assertion resilient to message wordingUse substring match to avoid brittle failures on minor wording changes.
- expectedErrorMsg := "job spec was modified, old job is deleted" - if err.Error() != expectedErrorMsg { - t.Errorf("First sync returned unexpected error message: got %q, want %q", err.Error(), expectedErrorMsg) - } + expectedErrorMsg := "job spec was modified, old job is deleted" + if !strings.Contains(err.Error(), expectedErrorMsg) { + t.Errorf("First sync returned unexpected error message: got %q, want to contain %q", err.Error(), expectedErrorMsg) + }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to Reviews > Disable Knowledge Base setting
📒 Files selected for processing (4)
pkg/tnf/operator/starter.go(7 hunks)pkg/tnf/operator/starter_test.go(4 hunks)pkg/tnf/pkg/jobs/jobcontroller.go(3 hunks)pkg/tnf/pkg/jobs/jobcontroller_test.go(9 hunks)
🔇 Additional comments (14)
pkg/tnf/pkg/jobs/jobcontroller.go (2)
177-184: Good: Degraded is only wired when requested.Conditionally enabling WithSyncDegradedOnError based on presence of Degraded is the right coupling.
69-103: Verify valid condition values at all NewJobController call sites
- pkg/tnf/pkg/jobs/jobcontroller_test.go (lines 359, 489): confirm the
conditionsslice in both tests uses only"Available","Progressing", and"Degraded".- pkg/tnf/operator/starter.go (line 252): ensure the
conditionsargument passed intorunJobController(from your flag or config) is constrained to those three values.pkg/tnf/operator/starter_test.go (4)
43-56: LGTM: Stable fake informer.Mocked Run/HasSynced keeps tests deterministic without needing goroutines.
57-71: LGTM: Safe wrapper for fake client with mock informer.Keeps the test’s cache-sync expectations simple.
175-200: LGTM: Clear assertion of initialTransition semantics.Covers both externalEtcd modes and validates the expected Available inclusion.
245-250: LGTM: Tests use mock operator client in getArgs.Removes flakiness from informer startup in unit tests.
pkg/tnf/operator/starter.go (4)
65-74: Good: Informers started correctly and caches synced with the right informers.Non-blocking Run() and explicit sync of Infrastructures + operator client are correct.
83-88: LGTM: Clear separation of initialTransition detection.Encapsulating external-etcd detection in isCEOManagingEtcd makes intent obvious.
169-181: LGTM: isCEOManagingEtcd returns intuitive boolean and errors are wrapped.Straightforward and testable.
246-260: LGTM: Propagating conditions into JobController.The extra parameter cleanly scopes condition management per job type.
pkg/tnf/pkg/jobs/jobcontroller_test.go (4)
232-250: Confirm intent: Available=false while job is runningThis test expects Available to be explicitly set false when the job is running. Please confirm this matches the product requirement for TNF setup (and not just “leave Available untouched” when not managing it).
453-497: NewJobController wiring with conditions slice is cleanGood parameter threading; tests cleanly control managed condition set.
Also applies to: 366-369
94-106: Condition transition expectations look correctProgressing toggles appropriately for “running” and clears on “completed” while Available remains consistent with initial state.
Also applies to: 70-76
294-305: Status comparison via sanitize + DeepEqual is solidStripping transient fields and sorting ensures stable assertions.
20fd225 to
2a22f3f
Compare
|
/payload-job periodic-ci-openshift-release-master-nightly-4.20-e2e-metal-ovn-two-node-fencing-upgrade |
|
@jaypoulz: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/1bf3c670-8e89-11f0-852d-8c73c90ff9fb-0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
pkg/tnf/operator/starter.go (1)
257-265: Guard access to Command[1] to prevent panics on manifest drift.Indexing Command[1] will panic if job.yaml ever provides <2 args. Safer to check length and append if needed.
- job.Spec.Template.Spec.Containers[0].Command[1] = jobType.GetSubCommand() + cmd := job.Spec.Template.Spec.Containers[0].Command + if len(cmd) >= 2 { + cmd[1] = jobType.GetSubCommand() + } else { + // ensure at least one base command exists + if len(cmd) == 0 { + cmd = []string{"/usr/bin/tnf"} + } + cmd = append(cmd, jobType.GetSubCommand()) + } + job.Spec.Template.Spec.Containers[0].Command = cmdpkg/tnf/pkg/jobs/jobcontroller_test.go (2)
760-778: Bug: in-place slice deletion inside range can skip elements and/or panic.Removing elements from status.Conditions while ranging forward over the same slice is unsafe. Filter instead (or iterate backward).
- // Remove conditions that are not in the testedConditions array - for i := range status.Conditions { - if !slices.Contains(testedConditions, status.Conditions[i].Type) { - status.Conditions = append(status.Conditions[:i], status.Conditions[i+1:]...) - } - } + // Keep only testedConditions; preserve prior sort order + filtered := status.Conditions[:0] + for _, c := range status.Conditions { + if slices.Contains(testedConditions, c.Type) { + filtered = append(filtered, c) + } + } + status.Conditions = filtered
172-196: “job not yet created” case never invokes Sync
The test’sinitialObjects.job == nilso theif test.initialObjects.job != nilguard skipsSync, making the case a no-op. Either remove that guard (callcontroller.Sync(...)unconditionally and assert no Job is created) or rename the case to “no-op without Sync”.
🧹 Nitpick comments (4)
pkg/tnf/operator/starter.go (1)
83-88: Optional: log initialTransition decision.Add a single Info/V(2) with the externalEtcd mode and chosen setup conditions to ease debugging.
pkg/tnf/pkg/jobs/jobcontroller.go (1)
29-34: Avoid exposing mutable slices; return copies or document immutability.DefaultConditions/AllConditions are exported and mutable. Consider exposing helper funcs that return copies (or document “do not mutate”) to avoid accidental in-place changes.
-var DefaultConditions = []string{opv1.OperatorStatusTypeProgressing, opv1.OperatorStatusTypeDegraded} -var AllConditions = []string{opv1.OperatorStatusTypeAvailable, opv1.OperatorStatusTypeProgressing, opv1.OperatorStatusTypeDegraded} +// DefaultConditions returns a fresh copy of the default tracked conditions. +func DefaultConditions() []string { + return []string{opv1.OperatorStatusTypeProgressing, opv1.OperatorStatusTypeDegraded} +} +// AllConditions returns a fresh copy of all supported conditions. +func AllConditions() []string { + return []string{opv1.OperatorStatusTypeAvailable, opv1.OperatorStatusTypeProgressing, opv1.OperatorStatusTypeDegraded} +}And update call sites accordingly.
pkg/tnf/pkg/jobs/jobcontroller_test.go (2)
48-51: Use the official constant for Degraded for consistency.Prefer controllerName + opv1.OperatorStatusTypeDegraded over a raw "Degraded" literal.
- conditionDegraded = controllerName + "Degraded" + conditionDegraded = controllerName + opv1.OperatorStatusTypeDegraded
73-81: Avoid adding an empty Job object to the fake client/informer.Creating & indexing a zero-valued Job (no name/namespace) can lead to odd informer keys. Prefer nil (and call Sync) or a properly constructed Job via makeJob().
- initialObjects: testObjects{ - job: &batchv1.Job{}, + initialObjects: testObjects{ + // Leave job nil and still call Sync; or use makeJob() if you need a pre-existing oneAnd call Sync unconditionally.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to Reviews > Disable Knowledge Base setting
📒 Files selected for processing (4)
pkg/tnf/operator/starter.go(7 hunks)pkg/tnf/operator/starter_test.go(4 hunks)pkg/tnf/pkg/jobs/jobcontroller.go(3 hunks)pkg/tnf/pkg/jobs/jobcontroller_test.go(15 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/tnf/operator/starter_test.go
🔇 Additional comments (5)
pkg/tnf/operator/starter.go (3)
65-74: Starting informers + syncing caches looks correct.Good: Start() the config informers, run the operator informer in a goroutine, and wait on Infrastructures + operator client. This fixes the earlier blocking/run issues.
126-133: Good: avoids slice aliasing by not appending to DefaultConditions.Selecting between DefaultConditions and AllConditions without mutation prevents shared-backing-array surprises.
95-139: Informer startup verified; no action required.controlPlaneNodeInformeris already started viago controlPlaneNodeInformer.Run(ctx.Done())inpkg/operator/starter.go(line 624).Likely an incorrect or invalid review comment.
pkg/tnf/pkg/jobs/jobcontroller.go (2)
70-83: New conditions parameter plumbs through cleanly.The builder correctly validates/dedupes, and Degraded wiring depends on inclusion via WithSyncDegradedOnError. Looks good.
Also applies to: 92-93
248-271: Condition-setting logic aligns with spec.Available/Progressing statuses map correctly to job phases; failures surface via returned error (Degraded) only when tracked.
Also applies to: 273-296, 307-313
2a22f3f to
7a524fa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/tnf/operator/starter.go (1)
52-63: Update all callers to include missingdynamicClientparameter.
The signature now requires 11 arguments, but calls inpkg/operator/starter.go:619andpkg/tnf/operator/starter_test.go:92only pass 10. Add thedynamicClientas the final argument to match the updated function definition.
♻️ Duplicate comments (5)
pkg/tnf/pkg/etcd/etcd.go (3)
24-33: Use UpdateStaticPodStatus/UpdateStaticPodConditionFn for StaticPodOperatorClient.The current code uses
UpdateStatusandUpdateConditionFn, which are for non-staticOperatorClient. ForStaticPodOperatorClient, you must useUpdateStaticPodStatusandUpdateStaticPodConditionFnto match the interface signature.Apply this diff:
- _, _, err := v1helpers.UpdateStatus(ctx, operatorClient, v1helpers.UpdateConditionFn(operatorv1.OperatorCondition{ + _, _, err := v1helpers.UpdateStaticPodStatus(ctx, operatorClient, v1helpers.UpdateStaticPodConditionFn(operatorv1.OperatorCondition{ Type: OperatorConditionExternalEtcdReadyForTransition, Status: operatorv1.ConditionTrue, Reason: "PacemakerConfiguredForEtcdTransition", Message: "pacemaker's resource agent is ready to takeover the etcd container", - })) + }))
46-72: Add robustness checks: revision bump and non-empty NodeStatuses.The current implementation can return success prematurely in two scenarios:
- When
NodeStatusesis empty (the loop never executes,removedstaystrue)- When
CurrentRevisionalready equalsLatestAvailableRevisionbefore CEO creates a new revisionThese conditions can cause the wait to succeed without CEO actually removing the container.
Apply this diff to add baseline revision tracking and validate NodeStatuses:
func waitForStaticContainerRemoved(ctx context.Context, operatorClient v1helpers.StaticPodOperatorClient) error { klog.Info("Wait for static etcd removed") + // Capture baseline to ensure CEO actually created a new revision + initialLatest := int32(0) + if _, s, _, err := operatorClient.GetStaticPodOperatorState(); err == nil { + initialLatest = s.LatestAvailableRevision + } else { + klog.Warningf("Could not read initial LatestAvailableRevision: %v", err) + } + isRemoved := func(context.Context) (done bool, err error) { _, status, _, err := operatorClient.GetStaticPodOperatorState() if err != nil { klog.Error(err, "Failed to get Etcd, but will ignore error for now...") return false, nil } + if len(status.NodeStatuses) == 0 { + klog.Info("static etcd not removed yet: no node statuses reported") + return false, nil + } + if status.LatestAvailableRevision <= initialLatest { + klog.Infof("static etcd not removed yet: waiting for new revision > %d (current=%d)", initialLatest, status.LatestAvailableRevision) + return false, nil + } + removed := true for _, nodeStatus := range status.NodeStatuses { if nodeStatus.CurrentRevision == status.LatestAvailableRevision { klog.Infof("static etcd removed: node %s, current rev %v, latest rev %v", nodeStatus.NodeName, nodeStatus.CurrentRevision, status.LatestAvailableRevision) } else { klog.Infof("static etcd not removed yet: node %s, current rev %v, latest rev %v", nodeStatus.NodeName, nodeStatus.CurrentRevision, status.LatestAvailableRevision) removed = false } } return removed, nil }Also update tests to simulate a revision bump in the fake StaticPodOperatorClient.
74-85: Use UpdateStaticPodStatus/UpdateStaticPodConditionFn for transition completion update.Same issue as lines 24-33: must use
UpdateStaticPodStatusandUpdateStaticPodConditionFnforStaticPodOperatorClient.Apply this diff:
- _, _, err = v1helpers.UpdateStatus(ctx, operatorClient, v1helpers.UpdateConditionFn(operatorv1.OperatorCondition{ + _, _, err = v1helpers.UpdateStaticPodStatus(ctx, operatorClient, v1helpers.UpdateStaticPodConditionFn(operatorv1.OperatorCondition{ Type: OperatorConditionExternalEtcdHasCompletedTransition, Status: operatorv1.ConditionTrue, Reason: "PacemakerResourceAgentIsNowRunningEtcd", Message: "pacemaker's resource agent is now running the etcd container", - })) + }))pkg/tnf/pkg/jobs/jobcontroller_test.go (1)
305-327: Add Degraded=true expectation for job failure case.The "job failed with available condition" test case expects
Available=falsebut does not expectDegraded=true. When a job fails, the operator should report a degraded state.Apply this diff:
operator: makeFakeOperatorInstance( withGenerations(1), + withTrueConditions(conditionDegraded), withFalseConditions(conditionAvailable), ),pkg/tnf/operator/starter.go (1)
11-12: Critical: Duplicate import still present (compile blocker).Lines 11-12 import the same package path
github.com/openshift/client-go/config/informers/externalversions/config/v1under two different aliases (configv1informersandv1). Go prohibits this and the code will not compile.Apply this diff to remove the duplicate:
import ( "bytes" "context" "fmt" "os" "sync" operatorv1 "github.com/openshift/api/operator/v1" configv1informers "github.com/openshift/client-go/config/informers/externalversions/config/v1" - v1 "github.com/openshift/client-go/config/informers/externalversions/config/v1" operatorv1informers "github.com/openshift/client-go/operator/informers/externalversions/operator/v1"Then update line 59 and 219 to use
configv1informers.NetworkInformerinstead ofv1.NetworkInformer.
🧹 Nitpick comments (11)
pkg/operator/starter.go (2)
452-458: Wiring infra informer into EtcdMembersController looks correct.Ctor call matches new signature and uses Config().V1().Infrastructures(). No functional issues spotted. Consider confirming RBAC covers list/watch on infrastructures for this controller.
618-631: HandleDualReplicaClusters: good informer choice; add a brief sync guard.Passing the infrastructures informer aligns with the new API. Since configInformers are started earlier, this is fine. For robustness, consider waiting for infrastructures informer sync before the handler dereferences listers, or have the handler do it internally.
Would you like me to add a WaitForCacheSync check here or verify the handler already does so?pkg/testutils/testutils.go (1)
244-248: Avoid duplicate conditions; use SetOperatorCondition in a loop.Appending raw conditions can create duplicates and inconsistent state. Prefer SetOperatorCondition per entry.
-func WithConditions(conditions ...operatorv1.OperatorCondition) func(*operatorv1.StaticPodOperatorStatus) { - return func(status *operatorv1.StaticPodOperatorStatus) { - status.Conditions = append(status.Conditions, conditions...) - } -} +func WithConditions(conds ...operatorv1.OperatorCondition) func(*operatorv1.StaticPodOperatorStatus) { + return func(status *operatorv1.StaticPodOperatorStatus) { + for _, c := range conds { + v1helpers.SetOperatorCondition(&status.Conditions, c) + } + } +}pkg/tnf/pkg/jobs/jobcontroller.go (3)
29-34: Protect exported condition sets from accidental mutation.As slices, DefaultConditions/AllConditions can be mutated by callers. Return copies to avoid aliasing footguns.
-var DefaultConditions = []string{opv1.OperatorStatusTypeProgressing, opv1.OperatorStatusTypeDegraded} -var AllConditions = []string{opv1.OperatorStatusTypeAvailable, opv1.OperatorStatusTypeProgressing, opv1.OperatorStatusTypeDegraded} +var ( + defaultConditions = []string{opv1.OperatorStatusTypeProgressing, opv1.OperatorStatusTypeDegraded} + allConditions = []string{opv1.OperatorStatusTypeAvailable, opv1.OperatorStatusTypeProgressing, opv1.OperatorStatusTypeDegraded} +) + +func DefaultConditions() []string { return append([]string(nil), defaultConditions...) } +func AllConditions() []string { return append([]string(nil), allConditions...) }Follow-up: adjust call sites to use DefaultConditions() / AllConditions().
141-161: WithConditions validation is solid; minor UX tweak.Validation/uniqueness works. Consider aggregating unknowns into a single error to make misconfigurations clearer. Optional.
307-313: Error on job failure to drive Degraded: include job name for clarity.Returning an error is right to mark Degraded when opted-in. Include job namespace/name to aid debugging.
- if isFailed(job) { - return fmt.Errorf("Job failed") - } + if isFailed(job) { + return fmt.Errorf("job %s/%s failed", job.Namespace, job.Name) + }pkg/tnf/operator/starter_test.go (2)
56-57: initErr plumbing is unused; simplify or set it.initErr is always nil and only compared in tests. Either remove it and the related checks, or set it from the handler init path to make the test meaningful.
235-247: Check AddIndexers return and fail fast on error.Guard against indexer registration issues.
- etcdInformers.Operator().V1().Etcds().Informer().AddIndexers(etcdIndexer.GetIndexers()) + if err := etcdInformers.Operator().V1().Etcds().Informer().AddIndexers(etcdIndexer.GetIndexers()); err != nil { + t.Fatalf("failed to add indexers: %v", err) + }pkg/operator/etcdmemberscontroller/etcdmemberscontroller.go (1)
7-9: Infra informer/lister wiring: correct and complete.Constructor stores the lister and includes the informer in watches. Looks good. If the lister isn’t used in this controller yet, consider dropping the field to avoid confusion.
Also applies to: 26-29, 31-42, 47-51
pkg/tnf/operator/starter.go (2)
65-77: Good: Explicit cache sync before informer usage.The explicit
WaitForCacheSyncbefore using the Infrastructure informer is correct sinceHandleDualReplicaClustersisn't a controller with automatic sync guarantees. The early-return pattern for non-external-etcd clusters is appropriate.Consider clarifying the error message at line 72:
- klog.Errorf("failed to check if external etcd cluster is enabled: %v", err) + klog.Errorf("failed to check if cluster uses external etcd topology: %v", err)
194-211: Consider adding retry logic for bootstrap checkIn
pkg/tnf/operator/starter.go,handleNodes(around lines 160–164) logs and returns onwaitForEtcdBootstrapCompletederrors under async.Once, so TNF job controllers never start until the operator restarts. Implement a retry loop (with backoff or periodic requeue) around the bootstrap check to avoid relying solely on operator restarts.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (17)
pkg/operator/bootstrapteardown/bootstrap_teardown_controller_test.go(1 hunks)pkg/operator/ceohelpers/common_test.go(2 hunks)pkg/operator/ceohelpers/external_etcd_status.go(1 hunks)pkg/operator/ceohelpers/external_etcd_status_test.go(1 hunks)pkg/operator/clustermemberremovalcontroller/clustermemberremovalcontroller_test.go(1 hunks)pkg/operator/etcdmemberscontroller/etcdmemberscontroller.go(4 hunks)pkg/operator/etcdmemberscontroller/etcdmemberscontroller_test.go(1 hunks)pkg/operator/externaletcdsupportcontroller/external_etcd_support_controller_test.go(5 hunks)pkg/operator/periodicbackupcontroller/periodicbackupcontroller_test.go(4 hunks)pkg/operator/starter.go(2 hunks)pkg/operator/targetconfigcontroller/targetconfigcontroller_test.go(8 hunks)pkg/testutils/testutils.go(4 hunks)pkg/tnf/operator/starter.go(8 hunks)pkg/tnf/operator/starter_test.go(6 hunks)pkg/tnf/pkg/etcd/etcd.go(1 hunks)pkg/tnf/pkg/jobs/jobcontroller.go(3 hunks)pkg/tnf/pkg/jobs/jobcontroller_test.go(15 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- pkg/operator/ceohelpers/common_test.go
- pkg/operator/periodicbackupcontroller/periodicbackupcontroller_test.go
- pkg/operator/clustermemberremovalcontroller/clustermemberremovalcontroller_test.go
- pkg/operator/bootstrapteardown/bootstrap_teardown_controller_test.go
- pkg/operator/etcdmemberscontroller/etcdmemberscontroller_test.go
🔇 Additional comments (42)
pkg/operator/targetconfigcontroller/targetconfigcontroller_test.go (2)
107-148: Operator condition constants match implementation and test progression is thorough. Test cases correctly useEtcdRunningInClusterandExternalEtcdReadyForTransition(defined in pkg/tnf/pkg/etcd/etcd.go) to validate the ExternalEtcd lifecycle and container removal.
214-269: Approve getController topology integration
FakeInfrastructureLister in pkg/testutils/testutils.go (lines 563–571) correctly constructs a topology-based InfrastructureLister; no further changes required.pkg/testutils/testutils.go (3)
224-232: Nice: leverage v1helpers.SetOperatorCondition for single-condition updates.This keeps last-writer-wins semantics consistent with operators. LGTM.
562-571: FakeInfrastructureLister helper: good, minimal, and deterministic.Indexer + single object is fine for tests. LGTM.
573-587: FakeStaticPodOperatorClient: handy helper.Initialization is correct; returns a mutable status pointer as expected for tests. LGTM.
pkg/tnf/pkg/jobs/jobcontroller.go (4)
80-81: API change: passing conditions is a good improvement.Signature and builder wiring are consistent. Please note that if Degraded isn’t included, job failures still return an error but won’t set a Degraded condition. Document this at call sites to avoid surprises.
Also applies to: 92-93
178-185: Degraded-on-error only when requested.Using WithSyncDegradedOnError conditionally is correct. Good separation of behavior.
248-271: Available condition logic is correct and self-explanatory.States map cleanly to Complete/Failed/Running. LGTM.
273-296: Progressing condition logic is correct.True only while running; false for complete/failed. LGTM.
pkg/tnf/operator/starter_test.go (1)
116-179: TestSetupJobConditionsBasedOnExternalEtcd: scenario coverage looks good.Asserts Available is included only before external etcd transition completes. Concise and clear.
pkg/operator/etcdmemberscontroller/etcdmemberscontroller.go (2)
140-149: Honor external etcd completion to avoid false “no quorum” signals.This aligns with the automatic quorum recovery expectation for TNF clusters. LGTM.
160-173: Fallback path and commentary are clear.The comments explain why this branch should not trigger for automatic recovery clusters. Implementation is straightforward.
pkg/operator/externaletcdsupportcontroller/external_etcd_support_controller_test.go (3)
9-10: LGTM!Import additions correctly support the topology-driven test infrastructure and align with the new status-driven model introduced in the PR.
Also applies to: 23-26
32-76: LGTM!Test scenarios correctly model the three external etcd states: non-external cluster, external without bootstrap, and external with bootstrap completed. The topology-driven approach aligns with the infrastructure-based status model.
108-170: LGTM!Controller setup correctly wires the infrastructure lister with topology-driven configuration. The test harness properly initializes the fake infrastructure with the appropriate ControlPlaneTopology setting.
pkg/operator/ceohelpers/external_etcd_status_test.go (6)
16-59: LGTM!Test correctly validates that only DualReplicaTopologyMode indicates an external etcd cluster. Coverage of all three topology modes is comprehensive.
61-125: LGTM!Test thoroughly validates the ExternalEtcdReadyForTransition condition checking logic across multiple scenarios, including edge cases where other conditions are present.
127-180: LGTM!Test correctly validates the EtcdRunningInCluster condition checking across different states. The test structure is consistent and comprehensive.
182-256: LGTM!Test comprehensively validates the transition completion detection logic, including edge cases where multiple conditions are present but the specific completion condition is not set.
258-390: LGTM!Test comprehensively validates the composite status function across all lifecycle stages. The "full lifecycle" case correctly models the end-state where only the transition completion condition remains after cleanup.
392-523: LGTM!Excellent integration-style test that validates the complete external etcd transition lifecycle through four distinct states. The negative test at the end provides valuable validation that transition completion is not prematurely reported.
pkg/tnf/pkg/etcd/etcd.go (2)
14-18: LGTM!Operator condition constants are well-defined and provide clear semantic names for the external etcd lifecycle states.
20-21: LGTM!Function signature correctly updated to use StaticPodOperatorClient, aligning with the new status-driven transition model.
pkg/operator/ceohelpers/external_etcd_status.go (6)
13-18: LGTM!The
ExternalEtcdClusterStatusstruct provides a clean, well-documented API surface for tracking external etcd lifecycle states. The four boolean fields cover all necessary transition points.
20-35: LGTM!Function correctly determines external etcd cluster status by checking for DualReplicaTopology. Error handling and logging are appropriate.
37-53: LGTM!Function correctly checks for the ExternalEtcdReadyForTransition condition using the standard library helper. Implementation follows established patterns.
55-70: LGTM!Function correctly checks for the EtcdRunningInCluster condition to determine bootstrap completion status. Implementation is consistent with other status check functions.
72-87: LGTM!Function correctly checks for transition completion. Implementation is consistent with the established pattern across all status check functions.
89-140: LGTM!Composite status function is well-implemented with efficient state fetching (single call to
GetStaticPodOperatorState) and appropriate early-return optimization for non-external clusters. The default initialization ensures safe zero-value behavior.pkg/tnf/pkg/jobs/jobcontroller_test.go (7)
45-51: LGTM!Test constants are well-defined and clearly support the condition-driven test scenarios. The separation between
allConditionsandavailableConditionallows for targeted test validation.
144-195: LGTM! Degraded condition handling correctly implemented.The "job failed" test case now correctly expects
Degraded=true(line 164) and simulates theWithSyncDegradedOnErrorbehavior (lines 185-195) when a sync error occurs. This addresses the previous review feedback about missing degraded state assertions.
497-551: LGTM!The
newTestContextsignature correctly updated to accept aconditionsparameter, which is properly passed through toNewJobController. This aligns with the condition-driven test infrastructure.
364-463: LGTM!Test correctly validates job modification detection and recreation flow. The use of
DefaultConditions(line 410) aligns with the expected behavior for this test scenario.
760-780: LGTM!The
sanitizeInstanceStatusfunction correctly filters conditions to the tested set while preserving sort order. The reslicing pattern (filtered := status.Conditions[:0]) is safe and efficient.
196-239: LGTM!Test assertions are comprehensive, covering error handling, job state, and operator status. The sanitization logic ensures consistent comparisons across test cases.
5-5: No action needed: Go 1.21+ configuredgo.mod declares Go 1.24.0 and CI uses Golang 1.24, meeting the requirement for the slices package.
pkg/tnf/operator/starter.go (6)
29-29: LGTM: New imports support bootstrap wait logic.The
restandbootstrapteardownimports are properly used in the newwaitForEtcdBootstrapCompletedhelper function.Also applies to: 35-35
79-96: LGTM: Event handler refactored with sync.Once guard.The refactoring to use
handleAddedNodeimproves testability and readability. Thesync.Onceguard correctly ensures job controllers are created only once, even when both nodes are present on the firstAddFuncinvocation.
121-157: LGTM: Well-structured event handler with proper guards.The implementation correctly:
- Validates type assertions with early returns
- Checks for exactly 2 control plane nodes before proceeding
- Uses
sync.Onceto prevent duplicate job controller creation- Launches
handleNodesin a goroutine to avoid blocking the event handler during bootstrap wait
213-240: LGTM: Infrastructure informer threading is consistent.The signature change to accept
infrastructureInformerand pass it toNewExternalEtcdEnablerControlleris consistent with the overall refactoring in this PR.
261-289: LGTM: Conditions parameter enables dynamic condition sets.The new
conditionsparameter allows callers to specify which operator status conditions each job should monitor, supporting the PR's objective to conditionally set the Available condition based on external etcd transition status.
159-192: LGTM: Independent slice declarations ensure no aliasing between DefaultConditions and AllConditions.
| configv1informers "github.com/openshift/client-go/config/informers/externalversions/config/v1" | ||
| v1 "github.com/openshift/client-go/config/informers/externalversions/config/v1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Build-breaker: duplicate import of the same package with two aliases.
You import config/v1 informers twice with different aliases; Go forbids duplicate import paths. Use a single alias (e.g., configv1informers) and replace v1 usages.
- configv1informers "github.com/openshift/client-go/config/informers/externalversions/config/v1"
- v1 "github.com/openshift/client-go/config/informers/externalversions/config/v1"
+ configv1informers "github.com/openshift/client-go/config/informers/externalversions/config/v1"And update references:
- networkInformer v1.NetworkInformer
+ networkInformer configv1informers.NetworkInformer
…
- infrastructureInformer: configInformers.Config().V1().Infrastructures(),
+ infrastructureInformer: configInformers.Config().V1().Infrastructures(),
…
- networkInformer: networkInformer,
+ networkInformer: networkInformer,(Replace any remaining v1. identifiers with configv1informers.)
🤖 Prompt for AI Agents
In pkg/tnf/operator/starter_test.go around lines 16 to 17 there is a duplicate
import of the same package with two aliases; remove the redundant import (keep a
single alias, e.g. configv1informers), delete the other alias, and update all
code references that use the removed alias (replace any v1.xxx identifiers with
configv1informers.xxx); ensure imports are formatted (goimports/gofmt) and tests
build.
77e2932 to
7b77c22
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/tnf/pkg/jobs/jobcontroller.go (1)
73-83: Add missingconditions []stringargument to allNewJobControllercalls
- pkg/tnf/operator/starter.go (around line 267): include a
conditionsslice in the call tojobs.NewJobController- pkg/tnf/pkg/jobs/jobcontroller_test.go (around lines 403 and 533): update each
NewJobController(…)invocation with the newconditions []stringparameter
♻️ Duplicate comments (3)
pkg/tnf/pkg/etcd/etcd.go (1)
49-70: Wait for a new revision before declaring static etcd removedAs soon as we enter this poll loop, every steady cluster already has
CurrentRevision == LatestAvailableRevision, soremovedstaystrueon the very first iteration and we exit without CEO ever rolling a revision that drops the etcd container. We need to remember the baseline revision, require a strictly higher revision before succeeding, and bail out early when no node statuses have reported yet. Otherwise the job signals completion immediately and skips the actual transition.func waitForStaticContainerRemoved(ctx context.Context, operatorClient v1helpers.StaticPodOperatorClient) error { klog.Info("Wait for static etcd removed") + initialLatest := int32(0) + if _, status, _, err := operatorClient.GetStaticPodOperatorState(); err == nil { + initialLatest = status.LatestAvailableRevision + } else { + klog.Warningf("Could not read initial LatestAvailableRevision: %v", err) + } + isRemoved := func(context.Context) (done bool, err error) { _, status, _, err := operatorClient.GetStaticPodOperatorState() if err != nil { klog.Error(err, "Failed to get Etcd, but will ignore error for now...") return false, nil } - removed := true + if len(status.NodeStatuses) == 0 { + klog.Info("static etcd not removed yet: no node statuses reported") + return false, nil + } + if status.LatestAvailableRevision <= initialLatest { + klog.Infof("static etcd not removed yet: waiting for new revision > %d (current=%d)", initialLatest, status.LatestAvailableRevision) + return false, nil + } + + removed := true for _, nodeStatus := range status.NodeStatuses { if nodeStatus.CurrentRevision == status.LatestAvailableRevision { klog.Infof("static etcd removed: node %s, current rev %v, latest rev %v", nodeStatus.NodeName, nodeStatus.CurrentRevision, status.LatestAvailableRevision) } else { klog.Infof("static etcd not removed yet: node %s, current rev %v, latest rev %v", nodeStatus.NodeName, nodeStatus.CurrentRevision, status.LatestAvailableRevision)pkg/tnf/operator/starter.go (1)
11-14: Duplicate import alias for config informers (compile blocker).Same package imported twice under different aliases; Go forbids this. Use a single alias and update types accordingly.
@@ - configv1informers "github.com/openshift/client-go/config/informers/externalversions/config/v1" - v1 "github.com/openshift/client-go/config/informers/externalversions/config/v1" + configv1informers "github.com/openshift/client-go/config/informers/externalversions/config/v1" @@ - networkInformer v1.NetworkInformer, + networkInformer configv1informers.NetworkInformer, @@ - networkInformer v1.NetworkInformer, + networkInformer configv1informers.NetworkInformer,Also applies to: 59-60, 219-220
pkg/tnf/operator/starter_test.go (1)
16-17: Critical: Duplicate import still present despite previous fix claim.The same package is imported twice with different aliases (
configv1informersandv1). This is a compile error in Go—you cannot import the same package path with two different names.Apply this diff to remove the duplicate:
configv1informers "github.com/openshift/client-go/config/informers/externalversions/config/v1" - v1 "github.com/openshift/client-go/config/informers/externalversions/config/v1" operatorversionedclientfake "github.com/openshift/client-go/operator/clientset/versioned/fake"Then update the usage on line 51:
- networkInformer v1.NetworkInformer + networkInformer configv1informers.NetworkInformerNote: The past review comment claimed this was addressed in commit fb3ee86, but the duplicate import remains in the current code.
🧹 Nitpick comments (13)
pkg/operator/externaletcdsupportcontroller/external_etcd_support_controller_test.go (8)
34-39: DropexpectedErr(always nil) and assert directlySimplify the table by removing
expectedErrand usingrequire.NoError(t, err). Less noise, clearer intent.- expectedErr error ... - require.Equal(t, scenario.expectedErr, err) + require.NoError(t, err)
53-64: Make “bootstrap not completed” explicitConsider setting an explicit negative operator condition (e.g., transition not completed) so this case doesn’t rely on implicit absence; improves future-proofing if defaults change.
79-84: Avoid loop variable capture pitfallsEven without t.Parallel, prefer copying the range var inside the loop to future-proof.
- for _, scenario := range scenarios { + for _, scenario := range scenarios { + scenario := scenario t.Run(scenario.name, func(t *testing.T) {
89-89: Avoid magic string for CM nameExtract "external-etcd-pod" to a shared const (or reference the controller’s const) to prevent drift.
- etcdPodCM, err := fakeKubeClient.CoreV1().ConfigMaps(operatorclient.TargetNamespace).Get(context.TODO(), "external-etcd-pod", metav1.GetOptions{}) + const externalEtcdPodCMName = "external-etcd-pod" + etcdPodCM, err := fakeKubeClient.CoreV1().ConfigMaps(operatorclient.TargetNamespace).Get(context.TODO(), externalEtcdPodCMName, metav1.GetOptions{})
90-96: Assert NotFound specificallyTighten the negative assertion to ensure we’re failing only on “not found”.
+ import apierrors "k8s.io/apimachinery/pkg/api/errors" ... - if !scenario.expectedConfigMapExists { - require.Error(t, err) + if !scenario.expectedConfigMapExists { + require.True(t, apierrors.IsNotFound(err), "expected NotFound, got: %v", err) return }
101-101: Userequire.Lenfor clarityMinor: reads better and prints nicer diffs on failure.
- require.Equal(t, 1, len(pod.Spec.Containers)) + require.Len(t, pod.Spec.Containers, 1)
108-110: Mark helperAdd
t.Helper()at the top ofgetControllerso failures attribute to the caller.func getController( t *testing.T, staticPodStatus *operatorv1.StaticPodOperatorStatus, topology configv1.TopologyMode) (events.Recorder, v1helpers.StaticPodOperatorClient, *ExternalEtcdEnablerController, *fake.Clientset) { + t.Helper()
161-168: Consider a tiny helper to build listersThe ad-hoc indexer + lister wiring is fine; a local helper (or testutils helper) could cut boilerplate and standardize across tests.
pkg/operator/ceohelpers/external_etcd_status.go (1)
58-70: DRY the repeated operator-state reads.Three helpers and GetExternalEtcdClusterStatus all fetch opStatus and read conditions. Consider a small internal helper that returns opStatus (with nil guard) to reduce duplication and keep logging/metrics in one place.
Also applies to: 75-87, 113-128
pkg/operator/etcdmemberscontroller/etcdmemberscontroller.go (2)
149-158: Reason/message should reflect external-etcd path.When availability is granted via completed external-etcd transition, “EtcdQuorate” can be misleading. Set a distinct Reason/Message in that branch.
- _, _, updateErr := v1helpers.UpdateStatus(ctx, c.operatorClient, v1helpers.UpdateConditionFn(operatorv1.OperatorCondition{ - Type: "EtcdMembersAvailable", - Status: operatorv1.ConditionTrue, - Reason: "EtcdQuorate", - Message: memberHealth.Status(), - })) + reason := "EtcdQuorate" + msg := memberHealth.Status() + if hasExternalEtcdCompletedTransition { + reason = "ExternalEtcdActive" + msg = "External etcd transition completed; availability derived from automatic quorum recovery" + } + _, _, updateErr := v1helpers.UpdateStatus(ctx, c.operatorClient, v1helpers.UpdateConditionFn(operatorv1.OperatorCondition{ + Type: "EtcdMembersAvailable", + Status: operatorv1.ConditionTrue, + Reason: reason, + Message: msg, + }))
80-87: Consider short‑circuiting on external-etcd completion before member health.If external-etcd transition is complete, membership health may be irrelevant; today a MemberHealth error returns early and never marks Available. Evaluating the transition first (and short‑circuiting) would make availability resilient even if MemberHealth times out.
Also applies to: 140-148
pkg/tnf/operator/starter.go (1)
261-275: Defensive copy of conditions slice before passing to controller.Avoid accidental mutation of shared slices inside downstream code.
func runJobController(ctx context.Context, jobType tools.JobType, nodeName *string, controllerContext *controllercmd.ControllerContext, operatorClient v1helpers.StaticPodOperatorClient, kubeClient kubernetes.Interface, kubeInformersForNamespaces v1helpers.KubeInformersForNamespaces, conditions []string) { @@ - tnfJobController := jobs.NewJobController( + conds := append([]string(nil), conditions...) + tnfJobController := jobs.NewJobController( @@ - conditions, + conds,pkg/tnf/operator/starter_test.go (1)
85-90: Consider simplifying the initialization error check.The current condition works but can be hard to parse at a glance.
Apply this diff for clarity:
- if tt.wantHandlerInitErr && tt.args.initErr == nil || !tt.wantHandlerInitErr && tt.args.initErr != nil { + if (tt.wantHandlerInitErr && tt.args.initErr == nil) || (!tt.wantHandlerInitErr && tt.args.initErr != nil) { t.Errorf("NewDualReplicaClusterHandler handlerInitErr = %v, wantHandlerInitErr %v", tt.args.initErr, tt.wantHandlerInitErr) }Or use XOR for even more clarity:
- if tt.wantHandlerInitErr && tt.args.initErr == nil || !tt.wantHandlerInitErr && tt.args.initErr != nil { + if tt.wantHandlerInitErr != (tt.args.initErr != nil) { t.Errorf("NewDualReplicaClusterHandler handlerInitErr = %v, wantHandlerInitErr %v", tt.args.initErr, tt.wantHandlerInitErr) }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (17)
pkg/operator/bootstrapteardown/bootstrap_teardown_controller_test.go(1 hunks)pkg/operator/ceohelpers/common_test.go(2 hunks)pkg/operator/ceohelpers/external_etcd_status.go(1 hunks)pkg/operator/ceohelpers/external_etcd_status_test.go(1 hunks)pkg/operator/clustermemberremovalcontroller/clustermemberremovalcontroller_test.go(1 hunks)pkg/operator/etcdmemberscontroller/etcdmemberscontroller.go(4 hunks)pkg/operator/etcdmemberscontroller/etcdmemberscontroller_test.go(1 hunks)pkg/operator/externaletcdsupportcontroller/external_etcd_support_controller_test.go(5 hunks)pkg/operator/periodicbackupcontroller/periodicbackupcontroller_test.go(4 hunks)pkg/operator/starter.go(2 hunks)pkg/operator/targetconfigcontroller/targetconfigcontroller_test.go(8 hunks)pkg/testutils/testutils.go(4 hunks)pkg/tnf/operator/starter.go(8 hunks)pkg/tnf/operator/starter_test.go(6 hunks)pkg/tnf/pkg/etcd/etcd.go(1 hunks)pkg/tnf/pkg/jobs/jobcontroller.go(3 hunks)pkg/tnf/pkg/jobs/jobcontroller_test.go(15 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
- pkg/operator/bootstrapteardown/bootstrap_teardown_controller_test.go
- pkg/operator/ceohelpers/common_test.go
- pkg/testutils/testutils.go
- pkg/operator/periodicbackupcontroller/periodicbackupcontroller_test.go
- pkg/operator/clustermemberremovalcontroller/clustermemberremovalcontroller_test.go
- pkg/operator/starter.go
- pkg/operator/etcdmemberscontroller/etcdmemberscontroller_test.go
🔇 Additional comments (15)
pkg/tnf/pkg/jobs/jobcontroller.go (1)
29-33: LGTM! Clear separation of default vs. full condition sets.The doc comment clearly explains why
DefaultConditionsexcludesAvailableby default—the operator is only unavailable during etcd ownership transition. The separation intoDefaultConditionsandAllConditionsprovides flexibility for different use cases.pkg/tnf/pkg/jobs/jobcontroller_test.go (7)
48-50: LGTM! Clear condition constant definitions.The addition of
conditionDegraded,allConditions, andavailableConditionprovides clear test configuration for different condition scenarios.
55-71: Good addition: negative test case for no-sync scenario.This test verifies that when no job exists (
job: nil) and no sync is triggered, conditions remain unchanged. This validates the controller doesn't modify conditions unnecessarily.
258-362: Good coverage: dedicated test for Available condition lifecycle.
TestSyncWithAvailableConditionexercises the Available condition across Completed/Running/Failed states, which is important since the default tests useDefaultConditions(which excludes Available). The test properly usesAllConditionsto enable all three conditions.
334-334: Minor: AllConditions passed but only availableCondition filtered in assertions.Line 334 passes
AllConditionstonewTestContext, enabling all three conditions (Available, Progressing, Degraded). However, lines 354-355 filter the status using onlyavailableCondition, which excludes Progressing and Degraded from the comparison.Is this intentional? If the test should validate only the Available condition behavior (ignoring Progressing/Degraded), the current code is correct. Otherwise, consider using
allConditionsfor filtering:- sanitizeInstanceStatus(actualStatus, availableCondition) - sanitizeInstanceStatus(&test.expectedObjects.operator.Status, availableCondition) + sanitizeInstanceStatus(actualStatus, allConditions) + sanitizeInstanceStatus(&test.expectedObjects.operator.Status, allConditions)Clarify the test intent in a comment if the current behavior is intentional.
497-551: LGTM! Dynamic condition configuration properly threaded through test context.The
newTestContextsignature now acceptsconditions []string(line 497) and passes it toNewJobController(line 540), enabling tests to exercise different condition sets. This aligns with the API changes in the production code.
760-780: Good enhancement: condition filtering prevents test pollution.The updated
sanitizeInstanceStatusnow acceptstestedConditionsand filtersstatus.Conditions(lines 772-779) to retain only the conditions under test. This prevents unrelated conditions from causing test failures and makes test expectations explicit.
176-195: Manual Degraded simulation—manual verification required
WithSyncDegradedOnErrornot found locally; confirm the test’s Degraded condition simulation matches library-go’s actual implementation.pkg/operator/externaletcdsupportcontroller/external_etcd_support_controller_test.go (6)
9-10: LGTM: correct lister importsUsing config and operator listers here is appropriate for the fakes/indexers.
23-26: LGTM: helper imports align with new test harnessConsolidating on etcdenvvar/ceohelpers/operatorclient/testutils keeps setup consistent.
41-51: Scenario coverage: non-external path looks goodHA topology with no CM creation is the right expectation.
71-76: Verify condition meaning for “external bootstrap completed”
EtcdRunningInClusterset to True reads as “internal etcd running”. Please confirm this is the intended flag for “external etcd bootstrap completed” driving CM creation.
122-122: Seed fake client with defaults (optional)If any controller path does live GETs, consider seeding
fakeKubeClientwithdefaultObjectsto keep lister and client views consistent.- fakeKubeClient := fake.NewSimpleClientset() + fakeKubeClient := fake.NewSimpleClientset(defaultObjects...)
134-135: LGTM: topology is injected via Infra.StatusDriving behavior via
ControlPlaneTopologymatches the new flow.pkg/tnf/operator/starter_test.go (1)
180-270: LGTM: Test helper properly refactored.The
getArgsfunction correctly initializes the updatedargsstruct with the newinfrastructureInformerandinitErrfields. The use of the test utility package (u.StaticPodOperatorStatus()) is consistent with the codebase's testing patterns.
| _, opStatus, _, err := operatorClient.GetStaticPodOperatorState() | ||
| if err != nil { | ||
| klog.Errorf("failed to get static pod operator state: %v", err) | ||
| return false, err | ||
| } | ||
|
|
||
| readyForEtcdTransition := v1helpers.IsOperatorConditionTrue(opStatus.Conditions, etcd.OperatorConditionExternalEtcdReadyForTransition) | ||
| if readyForEtcdTransition { | ||
| klog.V(4).Infof("ready for etcd transition") | ||
| } | ||
|
|
||
| return readyForEtcdTransition, nil | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guard against nil operator status to avoid panic.
GetStaticPodOperatorState returns pointers; if opStatus is ever nil, dereferencing Conditions will panic. Add a nil check and return a descriptive error.
@@
func IsReadyForEtcdTransition(ctx context.Context, operatorClient v1helpers.StaticPodOperatorClient) (bool, error) {
- _, opStatus, _, err := operatorClient.GetStaticPodOperatorState()
+ _, opStatus, _, err := operatorClient.GetStaticPodOperatorState()
if err != nil {
klog.Errorf("failed to get static pod operator state: %v", err)
return false, err
}
+ if opStatus == nil {
+ return false, fmt.Errorf("operator status is nil")
+ }
@@
return readyForEtcdTransition, nil
}Also add the same guard in IsEtcdRunningInCluster, HasExternalEtcdCompletedTransition, and GetExternalEtcdClusterStatus after fetching opStatus. Include:
@@
-import (
+import (
"context"
+ "fmt"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| _, opStatus, _, err := operatorClient.GetStaticPodOperatorState() | |
| if err != nil { | |
| klog.Errorf("failed to get static pod operator state: %v", err) | |
| return false, err | |
| } | |
| readyForEtcdTransition := v1helpers.IsOperatorConditionTrue(opStatus.Conditions, etcd.OperatorConditionExternalEtcdReadyForTransition) | |
| if readyForEtcdTransition { | |
| klog.V(4).Infof("ready for etcd transition") | |
| } | |
| return readyForEtcdTransition, nil | |
| } | |
| _, opStatus, _, err := operatorClient.GetStaticPodOperatorState() | |
| if err != nil { | |
| klog.Errorf("failed to get static pod operator state: %v", err) | |
| return false, err | |
| } | |
| if opStatus == nil { | |
| return false, fmt.Errorf("operator status is nil") | |
| } | |
| readyForEtcdTransition := v1helpers.IsOperatorConditionTrue(opStatus.Conditions, etcd.OperatorConditionExternalEtcdReadyForTransition) | |
| if readyForEtcdTransition { | |
| klog.V(4).Infof("ready for etcd transition") | |
| } | |
| return readyForEtcdTransition, nil | |
| } |
🤖 Prompt for AI Agents
In pkg/operator/ceohelpers/external_etcd_status.go around lines 41 to 53,
GetStaticPodOperatorState returns pointers so opStatus may be nil and
dereferencing opStatus.Conditions can panic; after calling
GetStaticPodOperatorState check if opStatus == nil and if so log/return a
descriptive error (e.g. wrap and return fmt.Errorf("operator status is nil"))
instead of dereferencing; apply the same nil-check-and-descriptive-error pattern
immediately after fetching opStatus in the functions IsEtcdRunningInCluster,
HasExternalEtcdCompletedTransition, and GetExternalEtcdClusterStatus to prevent
panics.
| hasExternalEtcdCompletedTransition, err := ceohelpers.HasExternalEtcdCompletedTransition(context.Background(), fakeOperatorClient) | ||
| if err != nil { | ||
| t.Errorf("failed to get external etcd status: %v", err) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
Use require.NoError instead of t.Errorf for test-stopping errors.
If HasExternalEtcdCompletedTransition fails, the test should stop immediately rather than continue with potentially invalid state.
Apply this diff:
hasExternalEtcdCompletedTransition, err := ceohelpers.HasExternalEtcdCompletedTransition(context.Background(), fakeOperatorClient)
- if err != nil {
- t.Errorf("failed to get external etcd status: %v", err)
- }
+ require.NoError(t, err, "failed to get external etcd status")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| hasExternalEtcdCompletedTransition, err := ceohelpers.HasExternalEtcdCompletedTransition(context.Background(), fakeOperatorClient) | |
| if err != nil { | |
| t.Errorf("failed to get external etcd status: %v", err) | |
| } | |
| hasExternalEtcdCompletedTransition, err := ceohelpers.HasExternalEtcdCompletedTransition(context.Background(), fakeOperatorClient) | |
| require.NoError(t, err, "failed to get external etcd status") |
🤖 Prompt for AI Agents
In pkg/tnf/operator/starter_test.go around lines 154 to 157, the test currently
calls t.Errorf when HasExternalEtcdCompletedTransition returns an error which
allows the test to continue; replace that error check with require.NoError(t,
err) (or require.NoErrorf with a message) so the test stops immediately on
failure, and add an import for "github.com/stretchr/testify/require" if it's not
already present.
|
/payload-job periodic-ci-openshift-release-master-nightly-4.20-e2e-metal-ovn-two-node-fencing-upgrade |
|
@jaypoulz: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/8f4fcad0-9ed1-11f0-801f-6852a30a14cb-0 |
7b77c22 to
dfb8d43
Compare
|
/payload-job periodic-ci-openshift-release-master-nightly-4.21-e2e-metal-ovn-two-node-fencing-upgrade |
|
@jaypoulz: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/7be5c1b0-a535-11f0-9990-bfd62c799dca-0 |
dfb8d43 to
d0a164c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (2)
pkg/tnf/operator/starter.go (2)
11-12: Critical: Duplicate import causes compilation failure.The same package path is imported twice under different aliases (
configv1informersandv1). Go does not allow this and will fail to compile.Apply this diff to remove the duplicate:
import ( "bytes" "context" "fmt" "os" "sync" operatorv1 "github.com/openshift/api/operator/v1" configv1informers "github.com/openshift/client-go/config/informers/externalversions/config/v1" - v1 "github.com/openshift/client-go/config/informers/externalversions/config/v1" operatorv1informers "github.com/openshift/client-go/operator/informers/externalversions/operator/v1"Then update the function signature on line 59 to use the retained alias:
- networkInformer v1.NetworkInformer, + networkInformer configv1informers.NetworkInformer,And similarly on line 219:
- networkInformer v1.NetworkInformer, + networkInformer configv1informers.NetworkInformer,
179-183: Early return skips job setup on transient error.Returning here abandons Setup and Fencing job creation if the status read fails once. A transient API error would leave the cluster without TNF jobs.
Default to "not transitioned" on error and continue:
- hasExternalEtcdCompletedTransition, err := ceohelpers.HasExternalEtcdCompletedTransition(ctx, operatorClient) - if err != nil { - klog.Errorf("failed to get external etcd status: %v", err) - return - } + hasExternalEtcdCompletedTransition := false + if v, err := ceohelpers.HasExternalEtcdCompletedTransition(ctx, operatorClient); err != nil { + klog.Warningf("failed to get external etcd status; proceeding as not transitioned: %v", err) + } else { + hasExternalEtcdCompletedTransition = v + }
🧹 Nitpick comments (1)
pkg/tnf/pkg/jobs/jobcontroller_test.go (1)
185-195: Consider using controller's built-in degraded handling.The test manually simulates degraded condition application using
applyoperatorv1. This suggests the controller may not be configured withWithSyncDegradedOnError(). If the controller should automatically set Degraded on sync errors, consider adding that wrapper in the controller creation rather than simulating it in tests.If the JobController should automatically handle Degraded on errors, update its creation in
NewJobControllerto include:controller := factory.New(). WithSync(c.sync). WithSyncDegradedOnError(operatorClient). // Add this // ... other configurationThen remove the manual simulation from tests (lines 185-195).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (6)
pkg/operator/externaletcdsupportcontroller/external_etcd_support_controller_test.go(2 hunks)pkg/operator/targetconfigcontroller/targetconfigcontroller_test.go(0 hunks)pkg/tnf/operator/starter.go(3 hunks)pkg/tnf/operator/starter_test.go(2 hunks)pkg/tnf/pkg/jobs/jobcontroller.go(3 hunks)pkg/tnf/pkg/jobs/jobcontroller_test.go(15 hunks)
💤 Files with no reviewable changes (1)
- pkg/operator/targetconfigcontroller/targetconfigcontroller_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/tnf/operator/starter_test.go
🔇 Additional comments (5)
pkg/tnf/operator/starter.go (1)
185-191: Condition selection logic is correct.The logic correctly assigns
AllConditions(including Available) when external etcd has not completed transition, andDefaultConditions(excluding Available) after transition completes. This aligns with the PR objective that the Setup job sets Available only when ExternalEtcd hasn't transitioned yet.pkg/tnf/pkg/jobs/jobcontroller_test.go (4)
760-779: LGTM: Condition filtering preserves sort order.The updated
sanitizeInstanceStatuscorrectly filters conditions to only those being tested while preserving the sort order established earlier. The use ofslices.Containsis clean and efficient (assuming Go 1.21+ is available).
55-71: Good test coverage for no-sync scenario.The new test case "job not created without sync" properly validates that when
initialObjects.jobis nil andSync()is not called, no job is created and conditions remain unchanged. This ensures the controller doesn't spuriously create resources without being triggered.
258-362: Comprehensive Available condition lifecycle tests.
TestSyncWithAvailableConditionthoroughly exercises job lifecycle states (completed, running, failed) when Available is explicitly tracked. The test correctly usesAllConditionsfor setup andavailableConditionfor assertions, ensuring only the Available condition is validated.
5-5: Ensure CI environment uses Go ≥1.21
- go.mod target version is 1.24.0, so the
slicesimport is supported- CI configs don’t pin a Go version; verify your CI toolchain is ≥1.21
|
/unhold I'm just waiting for the previous PR to land in a nightly so that I can validate that this address upgrade failures with a payload job. I need the latest fix to be picked up by the version of the cluster that deploys initially, as well as the target of the upgrade, since the post-upgrade setup job will be checking for a condition that only exists with the previous patch. |
… setup job - Setup job now sets Available condition only when ExternalEtcd hasn't transitioned yet - Fixed jobcontroller & test setup to handle variable operator status conditions - Added test coverage for updated expectations for job controller condition transitions (Progressing, Degraded, Available) Depends on openshift#1484 Depends on openshift#1483
d0a164c to
65ec447
Compare
|
@jaypoulz: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/payload-job periodic-ci-openshift-release-master-nightly-4.21-e2e-metal-ovn-two-node-fencing-upgrade |
|
@jaypoulz: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/731afb40-a855-11f0-9d32-a40970ee508a-0 |
|
/retest-required |
|
/verified by https://prow.ci.openshift.org/view/gs/test-platform-results/logs/openshift-cluster-etcd-operator-1481-nightly-4.21-e2e-metal-ovn-two-node-fencing-upgrade/1977780295356649472 (Notice the absence of a failure related to etcd becoming unavailable during the upgrade) |
|
@jaypoulz: This PR has been marked as verified by In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jaypoulz, tjungblu The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/retest-required |
Depends on #1483, #1484