-
Notifications
You must be signed in to change notification settings - Fork 153
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(monitor): set monitors to nil after closing them #3388
Conversation
WalkthroughThe pull request introduces modifications to error handling and resource management across multiple files in the Longhorn controller. Changes include renaming the Changes
Assessment against linked issues
Possibly related PRs
Suggested reviewers
Tip CodeRabbit's docstrings feature is now available as part of our Early Access Program! Simply use the command 📜 Recent review detailsConfiguration used: .coderabbit.yaml 📒 Files selected for processing (6)
🚧 Files skipped from review as they are similar to previous changes (3)
👮 Files not reviewed due to content moderation or server errors (3)
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
@mergify backport v1.8.x |
✅ Backports have been created
|
094f527
to
9649bed
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (11)
controller/node_controller_test.go (11)
190-193
: Consider factoring out repeated condition checks.Lines 190-193 introduce repetitive checks for required packages, multipathd, kernel modules, and NFS client installation. This exact pattern also appears in other tests. Centralizing into a helper method or table-driven approach would reduce duplication and improve clarity.
278-281
: Repeated condition checks detected again.Similarly to lines 190-193, these lines reintroduce the same block of condition checks. Refactoring into a helper method would avoid repetition.
366-369
: Repeated condition checks detected again.Repeating these four lines across multiple test methods suggests consolidating them into a shared, reusable function or table-driven approach.
454-457
: Repeated condition checks detected again.This block mirrors earlier condition checks. A common function or data structure could streamline these tests.
577-580
: Repeated condition checks detected again.The same reasoning applies here; applying DRY (Don’t Repeat Yourself) principle would enhance maintainability.
730-733
: Repeated condition checks detected again.Refactor this recurring pattern into a helper method to prevent code bloat.
889-892
: Repeated condition checks detected again.Once more, consider consolidating these lines into a singular, reusable function or table-driven test to reduce duplication.
1019-1022
: Repeated condition checks detected again.These lines illustrate the same test logic used in the prior segments. A shared helper method can improve long-term maintainability.
1166-1169
: Repeated condition checks detected again.Refactoring these lines into a common helper would make future updates easier and help avoid potential inconsistencies across tests.
1283-1286
: Repeated condition checks detected again.As before, centralizing the condition checks will simplify maintenance and testing.
2123-2126
: Repeated condition checks detected again.It would be beneficial to unify this pattern with previous blocks. Consolidation ensures consistency across tests and reduces duplicate upkeep.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
controller/monitor/disk_monitor.go
(2 hunks)controller/monitor/environment_check_monitor.go
(1 hunks)controller/node_controller.go
(1 hunks)controller/node_controller_test.go
(11 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- controller/node_controller.go
- controller/monitor/environment_check_monitor.go
- controller/monitor/disk_monitor.go
9649bed
to
fb73905
Compare
Longhorn 10035 Signed-off-by: Derek Su <[email protected]>
No need to delay the environment check. Longhorn 10035 Signed-off-by: Derek Su <[email protected]>
fb73905
to
10cf797
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (7)
controller/monitor/fake_environment_check_monitor.go (2)
46-60
: Ensure clear logging upon normal PollUntilContextCancel termination
When the context finishes naturally (instead of failing with an error), consider emitting a final info log message. This helps distinguish normal shutdowns from unexpected terminations.func (m *FakeEnvironmentCheckMonitor) Start() { if err := wait.PollUntilContextCancel(m.ctx, m.syncPeriod, true, func(context.Context) (bool, error) { if err := m.run(struct{}{}); err != nil { m.logger.WithError(err).Error("Stopped monitoring environment check") } return false, nil }); err != nil { if errors.Is(err, context.Canceled) { m.logger.WithError(err).Warning("Environment check monitor is stopped") + m.logger.Info("Environment check monitor gracefully exited") } else { m.logger.WithError(err).Error("Failed to start environment check monitor") } } }
73-83
: Consider returning strongly typed data
Currently, GetCollectedData() returns an interface{} containing a slice of Conditions. For better clarity and type-safety, you might return a well-defined struct or slice type. This enhances readability and reduces type assertion steps for consumers of this method.controller/node_controller_test.go (1)
2340-2344
: Use consistent naming conventions
environmentCheckMonitor is established alongside diskMonitor in the same function. Ensure consistent naming conventions or structuring, such as grouping all monitors, for readability and maintainability.controller/monitor/fake_disk_monitor.go (1)
24-28
: Clarify usage of contexts for concurrency
Within NewFakeDiskMonitor, you’re creating a context but only returning the constructed monitor. If you plan to start a polling loop or background task later, consider making that explicit, or return an additional “Start” method. This avoids confusion about whether the monitor is active by default.controller/monitor/disk_monitor.go (3)
28-28
: Make DiskMonitorSyncPeriod configurable
Hardcoding 30-second intervals is convenient for now, but consider a configuration mechanism if production usage requires dynamic intervals or environment-based overrides.
72-76
: Defer the goroutine start until after construction
NewDiskMonitor launches m.Start() immediately via go routine. Instead, consider exposing separate “Start” and “Stop” methods so that users can manage when the monitor is launched and ensure all prerequisites are ready.
Line range hint
396-422
: Review block vs filesystem branching
Multiple places switch on diskType. Validate that new disk types or future expansions follow the same logic or error out properly. Consider a shared function that handles unknown disk types in a single location for DRY compliance.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
controller/monitor/disk_monitor.go
(12 hunks)controller/monitor/environment_check_monitor.go
(2 hunks)controller/monitor/fake_disk_monitor.go
(1 hunks)controller/monitor/fake_environment_check_monitor.go
(1 hunks)controller/node_controller.go
(1 hunks)controller/node_controller_test.go
(19 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- controller/monitor/environment_check_monitor.go
- controller/node_controller.go
🔇 Additional comments (21)
controller/monitor/fake_environment_check_monitor.go (1)
85-106
: Validate handle for missing or deleting node
The run() method relies on fetching the node object and logs an error if it cannot be retrieved. If the node is in the process of being deleted, consider handling that scenario explicitly. Otherwise, your logs might fill with repeated errors if the node stays unavailable.
controller/node_controller_test.go (17)
297-298
: Same comment as lines 211-212
383-384
: Same comment as lines 211-212
469-470
: Same comment as lines 211-212
623-624
: Same comment as lines 211-212
770-771
: Same comment as lines 211-212
927-928
: Same comment as lines 211-212
1049-1050
: Same comment as lines 211-212
1195-1196
: Same comment as lines 211-212
1289-1290
: Same comment as lines 211-212
1383-1384
: Same comment as lines 211-212
1472-1473
: Same comment as lines 211-212
1561-1562
: Same comment as lines 211-212
1650-1651
: Same comment as lines 211-212
1739-1740
: Same comment as lines 211-212
1834-1835
: Same comment as lines 211-212
211-212
: Verify environmentCheckMonitor is non-nil before invocation
Prior to calling environmentCheckMonitor.RunOnce(), consider verifying that the monitor is initialized. This avoids potential panics in edge cases if the controller’s monitor is unintentionally nil.
2334-2337
: Check concurrency assumptions for diskMonitor
diskMonitor is assigned here after instantiation. Validate that no goroutine starts the monitor prior to this assignment, preventing a potential race condition if diskMonitor’s methods are called before fully assigned.
✅ Verification successful
Let me analyze the code to check for potential race conditions by searching for any goroutines that might access the diskMonitor before its assignment.
Let me search for more context about how the controller and monitor are initialized and used.
Let me try one more search to understand the controller's initialization and monitor usage pattern.
Let me try one final search to understand the monitor initialization and usage pattern.
Based on the gathered information, I can now provide a final response about the concurrency safety of the diskMonitor initialization.
No race condition risk with diskMonitor initialization
The monitor is safely initialized before use. The code shows that:
- The monitor is initialized using
NewFakeDiskMonitor
and immediately assigned tonc.diskMonitor
- The monitor uses a baseMonitor struct with proper synchronization primitives (mutex)
- The only usage of diskMonitor in tests is through explicit
RunOnce()
calls, which happen after initialization - No evidence of background goroutines starting before the monitor assignment
🏁 Scripts executed
The following scripts were executed for the analysis:
Error: No shell scripts found
Script:
#!/bin/bash
# Search for the diskMonitor usage in the test file and related files
rg -A 5 "diskMonitor" controller/node_controller_test.go
# Search for goroutine starts or go keyword usage near diskMonitor
rg -B 5 -A 5 "go \w+.*diskMonitor" controller/node_controller_test.go
# Look for the NewFakeDiskMonitor implementation to understand its behavior
ast-grep --pattern 'func NewFakeDiskMonitor($$$) ($_, $_) {
$$$
}'
Length of output: 4863
Script:
#!/bin/bash
# Search for controller initialization and monitor usage
rg -B 10 -A 5 "nc.diskMonitor = diskMonitor" controller/node_controller_test.go
# Look for any Start or Run methods that might start goroutines
ast-grep --pattern 'func ($v *NodeController) Start($$$) {
$$$
}'
# Search for monitor start/run patterns
rg -B 5 -A 5 "Start\(\)|Run\(\)" controller/monitor/fake_disk_monitor.go
Length of output: 720
Script:
#!/bin/bash
# Search for the test setup and controller initialization
rg -B 20 -A 10 "NewFakeNodeController" controller/node_controller_test.go
# Look for any monitor interface or base implementation
ast-grep --pattern 'type $_ interface {
$$$
}'
# Search for RunOnce implementation
rg -B 5 -A 5 "func.*RunOnce" controller/monitor/fake_disk_monitor.go
Length of output: 65789
Script:
#!/bin/bash
# Search for monitor initialization and Start/Run methods
rg -B 5 -A 10 "type.*Monitor struct" controller/monitor/
# Look for any goroutine starts in the controller
rg -B 5 -A 5 "go \w+|Start\(\)" controller/node_controller_test.go
# Search for RunOnce usage context
rg -B 5 -A 5 "RunOnce\(\)" controller/node_controller_test.go
Length of output: 10995
controller/monitor/disk_monitor.go (3)
221-221
: Validate concurrency in collectDiskData
collectDiskData calls m.newDiskServiceClients() and closes them with defer, which is correct for concurrency. Just ensure that no other threads reference the clients after they’ve been closed, so there isn’t a race condition.
97-108
: Confirm partial failures logging
The Start method logs an error and continues looping. If partial failures occur repeatedly, consider capturing them differently or adding a backoff strategy, so logs aren’t flooded.
136-139
: Guard against node deletion
If the node is being deleted or is otherwise absent, the disk monitor logs an error. Repeatedly calling run() under those circumstances can lead to spammy logs. You may want to short-circuit gracefully when node deletion is detected.
Set monitors to nil after closing them, so the controller can recreate them when the node is back. Longhorn 10035 Signed-off-by: Derek Su <[email protected]>
10cf797
to
6ebe54b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@mergify backport v1.7.x v1.6.x |
✅ Backports have been created
|
Which issue(s) this PR fixes:
Issue longhorn/longhorn#10035
What this PR does / why we need it:
Special notes for your reviewer:
Additional documentation or context