fix: potential race condition in `ProcessNewChainLock` #6924

UdjinM6 · 2025-10-27T10:17:35Z

Issue being fixed or feature implemented

ProcessNewChainLock can be called from 2 threads: sigshares (when clsig is created locally from recovered sigs) and msghand (when clsig is received from another peer). Suppose there is a node is at some height h and 2 blocks come very fast. The node would create clsig for the best block (h+2) but it could also receive clsig for the previous one (h+1) from other peers. We release cs temporary in ProcessNewChainLock so both clsig(h+1) and clsig(h+2) could enter ProcessNewChainLock and pass height check h < h+1 and h < h+2 until they are blocked by cs_main. If clsig(h+2) was the first in the queue it will move to the next part and assign bestChainLock = clsig(h+2) but once the lock is released clsig(h+1) will move forward and assign bestChainLock = clsig(h+1).

This is highly unlikely to happen on live networks but it does cause issues on regtest sometimes when we wait for a specific chainlocked block (h+2) but the peer won't relay it cause it's not the one the peer thinks is the best (h+1), all we get is notfound.

What was done?

Re-verify clsig height once we lock it again

How Has This Been Tested?

I had this issue in feature_governance_cl.py and now it's gone

Breaking Changes

n/a

Checklist:

I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have added or updated relevant unit/integration/functional/e2e tests
I have made corresponding changes to the documentation
I have assigned this pull request to a milestone (for repository code-owners and collaborators only)

github-actions · 2025-10-27T10:18:12Z

✅ No Merge Conflicts Detected

This PR currently has no conflicts with other open PRs.

greptile-apps

Greptile Overview

Greptile Summary

This PR fixes a classic time-of-check-to-time-of-use (TOCTOU) race condition in the chainlock processing logic. The ProcessNewChainLock method temporarily releases its internal lock (cs) to perform expensive signature verification and block index lookups, creating a window where two threads—sigshares (local chainlock creation) and msghand (network-received chainlocks)—can race. When blocks arrive rapidly at heights h+1 and h+2, both chainlocks can pass the initial height validation check before either updates bestChainLock. If the newer chainlock (h+2) completes first but the older one (h+1) follows through, bestChainLock gets incorrectly overwritten with stale data. The fix adds a second height check after re-acquiring the lock (lines 158-161) to ensure atomicity, preventing the node from advertising an incorrect best chainlock to peers.

This change integrates with Dash's LLMQ subsystem (src/llmq/) which orchestrates distributed quorum-based services including ChainLocks—a 51% attack prevention mechanism that provides block finality. The chainlock state is critical for peer synchronization and must remain consistent across concurrent access paths.

Important Files Changed

Filename	Score	Overview
src/chainlock/chainlock.cpp	5/5	Added duplicate height validation after re-acquiring lock to prevent TOCTOU race where older chainlock overwrites newer one

Confidence score: 5/5

This PR is safe to merge with minimal risk
The fix follows a well-established double-checked locking pattern for TOCTOU mitigation; the change is minimal (4 lines), surgically targeted, and addresses a documented race condition without introducing new logic paths or external dependencies
No files require special attention—the change is self-contained and the developer confirmed resolution of regtest flakiness in feature_governance_cl.py

Sequence Diagram

sequenceDiagram
    participant User
    participant Thread1 as "sigshares Thread"
    participant Thread2 as "msghand Thread"
    participant ProcessNewChainLock
    participant cs as "Lock (cs)"
    participant cs_main as "Lock (cs_main)"
    participant VerifyChainLock
    participant Scheduler

    User->>Thread1: "Create local CLSIG from recovered sigs"
    User->>Thread2: "Receive CLSIG from peer"
    
    Thread1->>ProcessNewChainLock: "Call with clsig(h+2)"
    Thread2->>ProcessNewChainLock: "Call with clsig(h+1)"
    
    ProcessNewChainLock->>cs: "Lock cs (Thread1)"
    ProcessNewChainLock->>cs: "Check seenChainLocks"
    ProcessNewChainLock->>cs: "Check height h < h+2"
    ProcessNewChainLock->>cs: "Unlock cs"
    
    ProcessNewChainLock->>cs: "Lock cs (Thread2)"
    ProcessNewChainLock->>cs: "Check seenChainLocks"
    ProcessNewChainLock->>cs: "Check height h < h+1"
    ProcessNewChainLock->>cs: "Unlock cs"
    
    ProcessNewChainLock->>VerifyChainLock: "Verify clsig(h+2)"
    VerifyChainLock-->>ProcessNewChainLock: "Valid"
    
    ProcessNewChainLock->>VerifyChainLock: "Verify clsig(h+1)"
    VerifyChainLock-->>ProcessNewChainLock: "Valid"
    
    ProcessNewChainLock->>cs_main: "Lock cs_main (Thread1)"
    ProcessNewChainLock->>cs: "Lock cs (Thread1)"
    ProcessNewChainLock->>cs: "Re-verify height (NEW CHECK)"
    ProcessNewChainLock->>cs: "Set bestChainLock = clsig(h+2)"
    ProcessNewChainLock->>cs: "Unlock cs"
    ProcessNewChainLock->>cs_main: "Unlock cs_main"
    
    ProcessNewChainLock->>cs_main: "Lock cs_main (Thread2)"
    ProcessNewChainLock->>cs: "Lock cs (Thread2)"
    ProcessNewChainLock->>cs: "Re-verify height (NEW CHECK)"
    ProcessNewChainLock->>cs: "Height check fails: h+1 <= h+2"
    ProcessNewChainLock->>cs: "Return without updating"
    ProcessNewChainLock->>cs: "Unlock cs"
    ProcessNewChainLock->>cs_main: "Unlock cs_main"
    
    ProcessNewChainLock->>Scheduler: "Schedule EnforceBestChainLock"
    Scheduler->>User: "Enforce correct chain lock"

Context used:

Context from dashboard - CLAUDE.md (source)

_{1 file reviewed, no comments}

_{Edit Code Review Agent Settings | Greptile}

coderabbitai · 2025-10-27T10:21:19Z

Walkthrough

This change modifies chainlock signature (CLSIG) processing in src/chainlock/chainlock.cpp to improve thread-safety. A secondary early-exit verification was added inside a locked region to detect when an in-flight CLSIG has been superseded by a newer bestChainLock, preventing unnecessary processing of outdated signatures. Additionally, the existing height comparison logic was updated to use <= instead of <, treating CLSIGs at the same height as the current chain lock as non-processable, with the comparison condition and comment updated accordingly.

Sequence Diagram(s)

sequenceDiagram
    participant Thread A as Thread A<br/>(Processing CLSIG X)
    participant Lock
    participant bestChainLock
    participant Thread B as Thread B<br/>(New Block)
    
    rect rgb(200, 220, 255)
    Note over Thread A: Initial Check
    Thread A->>Lock: Acquire lock
    Thread A->>bestChainLock: Check if X is newer
    end
    
    par Thread B Processing
        Thread B->>bestChainLock: Update to newer chainlock
        Thread B->>Lock: Release lock
    and Thread A Processing
        Note over Thread A: Secondary Check (NEW)
        Thread A->>bestChainLock: Re-verify X not superseded
        alt X is now outdated
            Thread A->>Thread A: Early exit
        else X is still current
            Thread A->>Thread A: Continue processing
        end
        Thread A->>Lock: Release lock
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Thread-safety logic: Requires careful verification that the secondary check correctly handles the race condition between CLSIG processing and chainlock updates.
Comparison operator change: Verify that changing < to <= for height comparison does not introduce unintended side effects or regressions in chainlock validation logic.
Lock semantics: Confirm that the re-verification inside the locked region is placed at the appropriate point in the control flow to catch all potential superseding scenarios.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The PR title "fix: potential race condition in `ProcessNewChainLock`" directly and clearly aligns with the main change in the changeset. According to the raw summary, the changes add a secondary early-exit check inside the locked region to re-verify that an in-flight CLSIG isn't superseded by a newer bestChainLock, and expand the early-exit condition to treat older or same height CLSIGs as non-processable. This precisely matches the title's description of fixing a race condition in ProcessNewChainLock. The title is concise, specific, and clearly communicates the primary change without unnecessary noise.
Description Check	✅ Passed	The PR description is well-detailed and directly related to the changeset. It clearly explains the race condition issue (ProcessNewChainLock called from two threads with temporary lock release allowing height check bypass), describes the problematic scenario (clsig(h+1) overwriting bestChainLock set to clsig(h+2)), states the fix (re-verify clsig height once we lock it again), and provides testing information (issue resolved in feature_governance_cl.py). The description meaningfully conveys what the PR addresses and the solution implemented.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b82450a and 696b926.

📒 Files selected for processing (1)

src/chainlock/chainlock.cpp (2 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

src/**/*.{cpp,h,cc,cxx,hpp}

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.{cpp,h,cc,cxx,hpp}: Dash Core C++ codebase must be written in C++20 and require at least Clang 16 or GCC 11.1
Dash uses unordered_lru_cache for efficient caching with LRU eviction

Files:

src/chainlock/chainlock.cpp

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)

GitHub Check: win64-build / Build source
GitHub Check: mac-build / Build source
GitHub Check: linux64_fuzz-build / Build source
GitHub Check: linux64_ubsan-build / Build source
GitHub Check: linux64_sqlite-build / Build source
GitHub Check: linux64-build / Build source
GitHub Check: linux64_tsan-build / Build source
GitHub Check: linux64_nowallet-build / Build source
GitHub Check: arm-linux-build / Build source
GitHub Check: Lint / Run linters

🔇 Additional comments (2)

src/chainlock/chainlock.cpp (2)

137-138: LGTM! Correctly treats same-height CLSIGs as non-processable.

The change from < to <= is correct. Once we have a chain lock at height N, we don't need to process additional CLSIGs at the same height. This avoids wasteful reprocessing and aligns with the secondary check added later.

157-161: Excellent fix for the race condition!

This secondary check correctly prevents the race condition where an older CLSIG could overwrite a newer one. The scenario:

Two threads (clsig h+1 and h+2) both pass the first check at line 137

Lock is released for expensive verification (line 143)

Thread with h+2 completes first and sets bestChainLock = h+2

This re-check prevents thread with h+1 from overwriting it

The double-check pattern after lock reacquisition is a standard technique for preventing TOCTOU issues in concurrent code.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

PastaPastaPasta

utACK 696b926

knst

utACK 696b926

fix: potential race condition in ProcessNewChainLock

696b926

UdjinM6 added this to the 23.1 milestone Oct 27, 2025

greptile-apps bot reviewed Oct 27, 2025

View reviewed changes

PastaPastaPasta approved these changes Oct 28, 2025

View reviewed changes

PastaPastaPasta requested a review from knst October 29, 2025 14:50

knst approved these changes Oct 29, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: potential race condition in `ProcessNewChainLock` #6924

fix: potential race condition in `ProcessNewChainLock` #6924

UdjinM6 commented Oct 27, 2025

Uh oh!

github-actions bot commented Oct 27, 2025

Uh oh!

greptile-apps bot left a comment

Uh oh!

coderabbitai bot commented Oct 27, 2025

Uh oh!

PastaPastaPasta left a comment

Uh oh!

knst left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

fix: potential race condition in ProcessNewChainLock #6924

Are you sure you want to change the base?

fix: potential race condition in ProcessNewChainLock #6924

Conversation

UdjinM6 commented Oct 27, 2025

Issue being fixed or feature implemented

What was done?

How Has This Been Tested?

Breaking Changes

Checklist:

Uh oh!

github-actions bot commented Oct 27, 2025

✅ No Merge Conflicts Detected

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Greptile Overview

Greptile Summary

Important Files Changed

Confidence score: 5/5

Sequence Diagram

Uh oh!

coderabbitai bot commented Oct 27, 2025

Walkthrough

Sequence Diagram(s)

Estimated code review effort

Pre-merge checks and finishing touches

Uh oh!

PastaPastaPasta left a comment

Choose a reason for hiding this comment

Uh oh!

knst left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

fix: potential race condition in `ProcessNewChainLock` #6924

fix: potential race condition in `ProcessNewChainLock` #6924