Skip to content

Conversation

@UdjinM6
Copy link

@UdjinM6 UdjinM6 commented Oct 27, 2025

Issue being fixed or feature implemented

ProcessNewChainLock can be called from 2 threads: sigshares (when clsig is created locally from recovered sigs) and msghand (when clsig is received from another peer). Suppose there is a node is at some height h and 2 blocks come very fast. The node would create clsig for the best block (h+2) but it could also receive clsig for the previous one (h+1) from other peers. We release cs temporary in ProcessNewChainLock so both clsig(h+1) and clsig(h+2) could enter ProcessNewChainLock and pass height check h < h+1 and h < h+2 until they are blocked by cs_main. If clsig(h+2) was the first in the queue it will move to the next part and assign bestChainLock = clsig(h+2) but once the lock is released clsig(h+1) will move forward and assign bestChainLock = clsig(h+1).

This is highly unlikely to happen on live networks but it does cause issues on regtest sometimes when we wait for a specific chainlocked block (h+2) but the peer won't relay it cause it's not the one the peer thinks is the best (h+1), all we get is notfound.

What was done?

Re-verify clsig height once we lock it again

How Has This Been Tested?

I had this issue in feature_governance_cl.py and now it's gone

Breaking Changes

n/a

Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added or updated relevant unit/integration/functional/e2e tests
  • I have made corresponding changes to the documentation
  • I have assigned this pull request to a milestone (for repository code-owners and collaborators only)

@UdjinM6 UdjinM6 added this to the 23.1 milestone Oct 27, 2025
@github-actions
Copy link

✅ No Merge Conflicts Detected

This PR currently has no conflicts with other open PRs.

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR fixes a classic time-of-check-to-time-of-use (TOCTOU) race condition in the chainlock processing logic. The ProcessNewChainLock method temporarily releases its internal lock (cs) to perform expensive signature verification and block index lookups, creating a window where two threads—sigshares (local chainlock creation) and msghand (network-received chainlocks)—can race. When blocks arrive rapidly at heights h+1 and h+2, both chainlocks can pass the initial height validation check before either updates bestChainLock. If the newer chainlock (h+2) completes first but the older one (h+1) follows through, bestChainLock gets incorrectly overwritten with stale data. The fix adds a second height check after re-acquiring the lock (lines 158-161) to ensure atomicity, preventing the node from advertising an incorrect best chainlock to peers.

This change integrates with Dash's LLMQ subsystem (src/llmq/) which orchestrates distributed quorum-based services including ChainLocks—a 51% attack prevention mechanism that provides block finality. The chainlock state is critical for peer synchronization and must remain consistent across concurrent access paths.

Important Files Changed

Filename Score Overview
src/chainlock/chainlock.cpp 5/5 Added duplicate height validation after re-acquiring lock to prevent TOCTOU race where older chainlock overwrites newer one

Confidence score: 5/5

  • This PR is safe to merge with minimal risk
  • The fix follows a well-established double-checked locking pattern for TOCTOU mitigation; the change is minimal (4 lines), surgically targeted, and addresses a documented race condition without introducing new logic paths or external dependencies
  • No files require special attention—the change is self-contained and the developer confirmed resolution of regtest flakiness in feature_governance_cl.py

Sequence Diagram

sequenceDiagram
    participant User
    participant Thread1 as "sigshares Thread"
    participant Thread2 as "msghand Thread"
    participant ProcessNewChainLock
    participant cs as "Lock (cs)"
    participant cs_main as "Lock (cs_main)"
    participant VerifyChainLock
    participant Scheduler

    User->>Thread1: "Create local CLSIG from recovered sigs"
    User->>Thread2: "Receive CLSIG from peer"
    
    Thread1->>ProcessNewChainLock: "Call with clsig(h+2)"
    Thread2->>ProcessNewChainLock: "Call with clsig(h+1)"
    
    ProcessNewChainLock->>cs: "Lock cs (Thread1)"
    ProcessNewChainLock->>cs: "Check seenChainLocks"
    ProcessNewChainLock->>cs: "Check height h < h+2"
    ProcessNewChainLock->>cs: "Unlock cs"
    
    ProcessNewChainLock->>cs: "Lock cs (Thread2)"
    ProcessNewChainLock->>cs: "Check seenChainLocks"
    ProcessNewChainLock->>cs: "Check height h < h+1"
    ProcessNewChainLock->>cs: "Unlock cs"
    
    ProcessNewChainLock->>VerifyChainLock: "Verify clsig(h+2)"
    VerifyChainLock-->>ProcessNewChainLock: "Valid"
    
    ProcessNewChainLock->>VerifyChainLock: "Verify clsig(h+1)"
    VerifyChainLock-->>ProcessNewChainLock: "Valid"
    
    ProcessNewChainLock->>cs_main: "Lock cs_main (Thread1)"
    ProcessNewChainLock->>cs: "Lock cs (Thread1)"
    ProcessNewChainLock->>cs: "Re-verify height (NEW CHECK)"
    ProcessNewChainLock->>cs: "Set bestChainLock = clsig(h+2)"
    ProcessNewChainLock->>cs: "Unlock cs"
    ProcessNewChainLock->>cs_main: "Unlock cs_main"
    
    ProcessNewChainLock->>cs_main: "Lock cs_main (Thread2)"
    ProcessNewChainLock->>cs: "Lock cs (Thread2)"
    ProcessNewChainLock->>cs: "Re-verify height (NEW CHECK)"
    ProcessNewChainLock->>cs: "Height check fails: h+1 <= h+2"
    ProcessNewChainLock->>cs: "Return without updating"
    ProcessNewChainLock->>cs: "Unlock cs"
    ProcessNewChainLock->>cs_main: "Unlock cs_main"
    
    ProcessNewChainLock->>Scheduler: "Schedule EnforceBestChainLock"
    Scheduler->>User: "Enforce correct chain lock"
Loading

Context used:

  • Context from dashboard - CLAUDE.md (source)

1 file reviewed, no comments

Edit Code Review Agent Settings | Greptile

@coderabbitai
Copy link

coderabbitai bot commented Oct 27, 2025

Walkthrough

This change modifies chainlock signature (CLSIG) processing in src/chainlock/chainlock.cpp to improve thread-safety. A secondary early-exit verification was added inside a locked region to detect when an in-flight CLSIG has been superseded by a newer bestChainLock, preventing unnecessary processing of outdated signatures. Additionally, the existing height comparison logic was updated to use <= instead of <, treating CLSIGs at the same height as the current chain lock as non-processable, with the comparison condition and comment updated accordingly.

Sequence Diagram(s)

sequenceDiagram
    participant Thread A as Thread A<br/>(Processing CLSIG X)
    participant Lock
    participant bestChainLock
    participant Thread B as Thread B<br/>(New Block)
    
    rect rgb(200, 220, 255)
    Note over Thread A: Initial Check
    Thread A->>Lock: Acquire lock
    Thread A->>bestChainLock: Check if X is newer
    end
    
    par Thread B Processing
        Thread B->>bestChainLock: Update to newer chainlock
        Thread B->>Lock: Release lock
    and Thread A Processing
        Note over Thread A: Secondary Check (NEW)
        Thread A->>bestChainLock: Re-verify X not superseded
        alt X is now outdated
            Thread A->>Thread A: Early exit
        else X is still current
            Thread A->>Thread A: Continue processing
        end
        Thread A->>Lock: Release lock
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Thread-safety logic: Requires careful verification that the secondary check correctly handles the race condition between CLSIG processing and chainlock updates.
  • Comparison operator change: Verify that changing < to <= for height comparison does not introduce unintended side effects or regressions in chainlock validation logic.
  • Lock semantics: Confirm that the re-verification inside the locked region is placed at the appropriate point in the control flow to catch all potential superseding scenarios.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The PR title "fix: potential race condition in ProcessNewChainLock" directly and clearly aligns with the main change in the changeset. According to the raw summary, the changes add a secondary early-exit check inside the locked region to re-verify that an in-flight CLSIG isn't superseded by a newer bestChainLock, and expand the early-exit condition to treat older or same height CLSIGs as non-processable. This precisely matches the title's description of fixing a race condition in ProcessNewChainLock. The title is concise, specific, and clearly communicates the primary change without unnecessary noise.
Description Check ✅ Passed The PR description is well-detailed and directly related to the changeset. It clearly explains the race condition issue (ProcessNewChainLock called from two threads with temporary lock release allowing height check bypass), describes the problematic scenario (clsig(h+1) overwriting bestChainLock set to clsig(h+2)), states the fix (re-verify clsig height once we lock it again), and provides testing information (issue resolved in feature_governance_cl.py). The description meaningfully conveys what the PR addresses and the solution implemented.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b82450a and 696b926.

📒 Files selected for processing (1)
  • src/chainlock/chainlock.cpp (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/**/*.{cpp,h,cc,cxx,hpp}

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.{cpp,h,cc,cxx,hpp}: Dash Core C++ codebase must be written in C++20 and require at least Clang 16 or GCC 11.1
Dash uses unordered_lru_cache for efficient caching with LRU eviction

Files:

  • src/chainlock/chainlock.cpp
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: win64-build / Build source
  • GitHub Check: mac-build / Build source
  • GitHub Check: linux64_fuzz-build / Build source
  • GitHub Check: linux64_ubsan-build / Build source
  • GitHub Check: linux64_sqlite-build / Build source
  • GitHub Check: linux64-build / Build source
  • GitHub Check: linux64_tsan-build / Build source
  • GitHub Check: linux64_nowallet-build / Build source
  • GitHub Check: arm-linux-build / Build source
  • GitHub Check: Lint / Run linters
🔇 Additional comments (2)
src/chainlock/chainlock.cpp (2)

137-138: LGTM! Correctly treats same-height CLSIGs as non-processable.

The change from < to <= is correct. Once we have a chain lock at height N, we don't need to process additional CLSIGs at the same height. This avoids wasteful reprocessing and aligns with the secondary check added later.


157-161: Excellent fix for the race condition!

This secondary check correctly prevents the race condition where an older CLSIG could overwrite a newer one. The scenario:

  1. Two threads (clsig h+1 and h+2) both pass the first check at line 137
  2. Lock is released for expensive verification (line 143)
  3. Thread with h+2 completes first and sets bestChainLock = h+2
  4. This re-check prevents thread with h+1 from overwriting it

The double-check pattern after lock reacquisition is a standard technique for preventing TOCTOU issues in concurrent code.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Member

@PastaPastaPasta PastaPastaPasta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK 696b926

@PastaPastaPasta PastaPastaPasta requested a review from knst October 29, 2025 14:50
Copy link
Collaborator

@knst knst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK 696b926

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants