fix: potential race condition in ProcessNewChainLock
#6924
base: develop
✅ No Merge Conflicts Detected
This PR currently has no conflicts with other open PRs.
Greptile Overview
Greptile Summary
This PR fixes a classic time-of-check-to-time-of-use (TOCTOU) race condition in the chainlock processing logic. The ProcessNewChainLock method temporarily releases its internal lock (cs) to perform expensive signature verification and block index lookups, creating a window where two threads—sigshares (local chainlock creation) and msghand (network-received chainlocks)—can race. When blocks arrive rapidly at heights h+1 and h+2, both chainlocks can pass the initial height validation check before either updates bestChainLock. If the newer chainlock (h+2) completes first but the older one (h+1) follows through, bestChainLock gets incorrectly overwritten with stale data. The fix adds a second height check after re-acquiring the lock (lines 158-161) to ensure atomicity, preventing the node from advertising an incorrect best chainlock to peers.
This change integrates with Dash's LLMQ subsystem (src/llmq/) which orchestrates distributed quorum-based services including ChainLocks—a 51% attack prevention mechanism that provides block finality. The chainlock state is critical for peer synchronization and must remain consistent across concurrent access paths.
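The interleaving described above can be replayed deterministically. The following is a minimal sketch, not the code from src/chainlock/chainlock.cpp: `Node`, `PassesInitialCheck`, `Commit`, and `ReplayRace` are hypothetical names, and the two phases stand in for the work done before and after cs is released.

```cpp
#include <cassert>

// Hypothetical model of a node's chainlock state; not the real Dash class.
struct Node {
    int bestHeight;

    // Phase 1 (performed while holding cs): the initial height check.
    bool PassesInitialCheck(int height) const { return height > bestHeight; }

    // Phase 2 (performed after cs is re-acquired): store the chainlock.
    // `recheck` toggles the second height check that the PR adds.
    bool Commit(int height, bool recheck) {
        if (recheck && height <= bestHeight) return false;  // stale, skip
        bestHeight = height;
        return true;
    }
};

// Replays the problematic order: both chainlocks pass phase 1 while the best
// height is still h, then h+2 commits first and h+1 commits second.
int ReplayRace(int h, bool recheck) {
    Node node{h};
    const bool newerPassed = node.PassesInitialCheck(h + 2);
    const bool olderPassed = node.PassesInitialCheck(h + 1);
    if (newerPassed) node.Commit(h + 2, recheck);
    if (olderPassed) node.Commit(h + 1, recheck);
    return node.bestHeight;
}
```

Under these assumptions, `ReplayRace(h, false)` ends with the stale height h+1 stored, while `ReplayRace(h, true)` keeps h+2.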
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| src/chainlock/chainlock.cpp | 5/5 | Added duplicate height validation after re-acquiring lock to prevent TOCTOU race where older chainlock overwrites newer one |
Confidence score: 5/5
- This PR is safe to merge with minimal risk
- The fix follows a well-established double-checked locking pattern for TOCTOU mitigation; the change is minimal (4 lines), surgically targeted, and addresses a documented race condition without introducing new logic paths or external dependencies
- No files require special attention; the change is self-contained and the developer confirmed resolution of regtest flakiness in feature_governance_cl.py
Sequence Diagram
sequenceDiagram
participant User
participant Thread1 as "sigshares Thread"
participant Thread2 as "msghand Thread"
participant ProcessNewChainLock
participant cs as "Lock (cs)"
participant cs_main as "Lock (cs_main)"
participant VerifyChainLock
participant Scheduler
User->>Thread1: "Create local CLSIG from recovered sigs"
User->>Thread2: "Receive CLSIG from peer"
Thread1->>ProcessNewChainLock: "Call with clsig(h+2)"
Thread2->>ProcessNewChainLock: "Call with clsig(h+1)"
ProcessNewChainLock->>cs: "Lock cs (Thread1)"
ProcessNewChainLock->>cs: "Check seenChainLocks"
ProcessNewChainLock->>cs: "Check height h < h+2"
ProcessNewChainLock->>cs: "Unlock cs"
ProcessNewChainLock->>cs: "Lock cs (Thread2)"
ProcessNewChainLock->>cs: "Check seenChainLocks"
ProcessNewChainLock->>cs: "Check height h < h+1"
ProcessNewChainLock->>cs: "Unlock cs"
ProcessNewChainLock->>VerifyChainLock: "Verify clsig(h+2)"
VerifyChainLock-->>ProcessNewChainLock: "Valid"
ProcessNewChainLock->>VerifyChainLock: "Verify clsig(h+1)"
VerifyChainLock-->>ProcessNewChainLock: "Valid"
ProcessNewChainLock->>cs_main: "Lock cs_main (Thread1)"
ProcessNewChainLock->>cs: "Lock cs (Thread1)"
ProcessNewChainLock->>cs: "Re-verify height (NEW CHECK)"
ProcessNewChainLock->>cs: "Set bestChainLock = clsig(h+2)"
ProcessNewChainLock->>cs: "Unlock cs"
ProcessNewChainLock->>cs_main: "Unlock cs_main"
ProcessNewChainLock->>cs_main: "Lock cs_main (Thread2)"
ProcessNewChainLock->>cs: "Lock cs (Thread2)"
ProcessNewChainLock->>cs: "Re-verify height (NEW CHECK)"
ProcessNewChainLock->>cs: "Height check fails: h+1 <= h+2"
ProcessNewChainLock->>cs: "Return without updating"
ProcessNewChainLock->>cs: "Unlock cs"
ProcessNewChainLock->>cs_main: "Unlock cs_main"
ProcessNewChainLock->>Scheduler: "Schedule EnforceBestChainLock"
Scheduler->>User: "Enforce correct chain lock"
Context used:
- Context from dashboard - CLAUDE.md (source)
1 file reviewed, no comments
Walkthrough
This change modifies chainlock signature (CLSIG) processing in
Sequence Diagram(s)
sequenceDiagram
participant Thread A as Thread A<br/>(Processing CLSIG X)
participant Lock
participant bestChainLock
participant Thread B as Thread B<br/>(New Block)
rect rgb(200, 220, 255)
Note over Thread A: Initial Check
Thread A->>Lock: Acquire lock
Thread A->>bestChainLock: Check if X is newer
end
par Thread B Processing
Thread B->>bestChainLock: Update to newer chainlock
Thread B->>Lock: Release lock
and Thread A Processing
Note over Thread A: Secondary Check (NEW)
Thread A->>bestChainLock: Re-verify X not superseded
alt X is now outdated
Thread A->>Thread A: Early exit
else X is still current
Thread A->>Thread A: Continue processing
end
Thread A->>Lock: Release lock
end
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
utACK 696b926
utACK 696b926
Issue being fixed or feature implemented
ProcessNewChainLock can be called from 2 threads: sigshares (when clsig is created locally from recovered sigs) and msghand (when clsig is received from another peer). Suppose a node is at some height h and 2 blocks arrive in quick succession. The node would create clsig for the best block (h+2), but it could also receive clsig for the previous one (h+1) from other peers. We release cs temporarily in ProcessNewChainLock, so both clsig(h+1) and clsig(h+2) could enter ProcessNewChainLock and pass the height checks h < h+1 and h < h+2 until they are blocked by cs_main. If clsig(h+2) is first in the queue, it moves on to the next part and assigns bestChainLock = clsig(h+2), but once the lock is released, clsig(h+1) moves forward and assigns bestChainLock = clsig(h+1).
This is highly unlikely to happen on live networks, but it does sometimes cause issues on regtest: we wait for a specific chainlocked block (h+2), but the peer won't relay it because it's not the one the peer considers best (h+1), so all we get is notfound.
What was done?
Re-verify clsig height once we lock it again
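The shape of the fix can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual diff: `ChainLockSig`, `ChainLockHandler`, `Verify`, and `BestHeight` are hypothetical stand-ins, cs_main is omitted, and only the check / unlock / re-lock / re-check structure follows the description above.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical, simplified stand-in for the real chainlock signature type.
struct ChainLockSig {
    int height;
};

// Hypothetical handler; only the locking structure mirrors the PR description.
class ChainLockHandler {
public:
    // Returns true if clsig was stored as the new best chainlock.
    bool ProcessNewChainLock(const ChainLockSig& clsig) {
        {
            std::lock_guard<std::mutex> lock(cs);
            // First check: cheap early exit for chainlocks that are already stale.
            if (clsig.height <= best.height) return false;
        }

        // cs is not held here; stand-in for the expensive signature verification.
        if (!Verify(clsig)) return false;

        std::lock_guard<std::mutex> lock(cs);
        // Second check (the fix): another thread may have stored a newer
        // chainlock while cs was released, so re-verify the height.
        if (clsig.height <= best.height) return false;
        best = clsig;
        return true;
    }

    int BestHeight() const {
        std::lock_guard<std::mutex> lock(cs);
        return best.height;
    }

private:
    // Placeholder verification: accepts any non-negative height.
    static bool Verify(const ChainLockSig& clsig) { return clsig.height >= 0; }

    mutable std::mutex cs;
    ChainLockSig best{-1};
};
```

The first check is kept because it avoids paying for verification on chainlocks that are obviously old; the second check is what makes the final compare-and-assign atomic with respect to other callers.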
How Has This Been Tested?
I had this issue in feature_governance_cl.py and now it's gone.
Breaking Changes
n/a
Checklist: