
Conversation

@PastaPastaPasta
Member

@PastaPastaPasta PastaPastaPasta commented Oct 21, 2025

Issue being fixed or feature implemented

m_nodes_mutex (mostly) just guards m_nodes, which is read-only roughly 90% of the time; I only see us mutating this data structure when we add or remove connections / peers. The fields inside each CNode* held in m_nodes are themselves thread safe, typically atomics. As such, it appears we only really care about protecting the structure itself and the lifetime of the pointers in it, and both of those remain safe under a shared mutex model.
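The model described above can be sketched with std::shared_mutex: mutations take the exclusive lock, while the far more common read-only lookups take the shared lock and can run concurrently. This is an illustrative stand-in, not the PR's code; NodeRegistry, Node, and the method names here are hypothetical.

```cpp
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for CConnman's m_nodes: the container itself is
// guarded, while per-node fields would be atomics in the real code.
struct Node {
    int id;
    std::string addr;
};

class NodeRegistry {
    mutable std::shared_mutex m_mutex;          // plays the role of m_nodes_mutex
    std::vector<std::unique_ptr<Node>> m_nodes; // plays the role of m_nodes

public:
    // Mutations (adding/removing connections) take the exclusive lock.
    void Add(int id, std::string addr)
    {
        std::unique_lock lock{m_mutex};
        m_nodes.push_back(std::make_unique<Node>(Node{id, std::move(addr)}));
    }
    // Read-only lookups take the shared lock; many can hold it at once.
    bool Exists(int id) const
    {
        std::shared_lock lock{m_mutex};
        for (const auto& n : m_nodes) {
            if (n->id == id) return true;
        }
        return false;
    }
    size_t Count() const
    {
        std::shared_lock lock{m_mutex};
        return m_nodes.size();
    }
};
```

With a read-mostly workload, the shared lock lets reader threads proceed in parallel instead of serializing on every lookup.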

What was done?

Converted m_nodes_mutex to a shared mutex. We see lock contention drop significantly:

•	Total contentions: develop 11,551 → this PR 5,160 (↓ 55%)
•	Total contention time: develop 3,310.7 ms → this PR 2,067.4 ms (↓ 38%)

Contention on m_nodes_mutex itself went from:

  • count: 6,091 → 820 (↓ 86.5%)
  • time waiting: 1,698,355 → 1,134,811 (↓ 33%)

It is interesting that the average wait time for this mutex actually increases, though I guess that is not super surprising: presumably the contention events that remain are the rarer exclusive acquisitions, which must wait out all concurrent readers. Even so, this is still a huge win.
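The "average wait goes up" observation follows directly from the per-mutex figures above; a quick back-of-the-envelope check (units as reported by the profiler) shows the average wait per contention roughly quintuples even as both totals fall:

```cpp
// Contention figures quoted above for m_nodes_mutex (units as reported).
constexpr double dev_wait  = 1'698'355.0; // develop: total time waiting
constexpr double dev_count = 6091.0;      // develop: contention count
constexpr double pr_wait   = 1'134'811.0; // this PR: total time waiting
constexpr double pr_count  = 820.0;       // this PR: contention count

// Average wait per contention event.
double AvgWait(double total_wait, double count) { return total_wait / count; }
```

Develop averages about 279 per event versus about 1,384 for this PR, so each remaining contention is more expensive, but there are far fewer of them.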

How Has This Been Tested?

Tested on one testnet node.

Breaking Changes

None

Checklist:

Go over all the following points, and put an x in all the boxes that apply.

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added or updated relevant unit/integration/functional/e2e tests
  • I have made corresponding changes to the documentation
  • I have assigned this pull request to a milestone (for repository code-owners and collaborators only)

@github-actions

github-actions bot commented Oct 21, 2025

⚠️ Potential Merge Conflicts Detected

This PR has potential conflicts with the following open PRs:

Please coordinate with the authors of these PRs to avoid merge conflicts.

@DashCoreAutoGuix

Guix Automation has begun to build this PR tagged as v23.0.0-devpr6912.37c097de. A new comment will be made when the image is pushed.

@DashCoreAutoGuix

Guix Automation has completed; a release should be present here: https://github.com/dashpay/dash-dev-branches/releases/tag/v23.0.0-devpr6912.37c097de. The image should be on dockerhub soon.

@PastaPastaPasta PastaPastaPasta marked this pull request as ready for review October 22, 2025 03:11
@PastaPastaPasta
Member Author

(marked ready for review for code rabbit)

@DashCoreAutoGuix

Guix Automation has begun to build this PR tagged as v23.0.0-devpr6912.63cf92d8. A new comment will be made when the image is pushed.

@coderabbitai

coderabbitai bot commented Oct 22, 2025

Walkthrough

Refactors network code for const-correctness and a shared-lock model. m_nodes_mutex changed from RecursiveMutex to SharedMutex; several mutexes made mutable. Added AssertSharedLockHeld and shared-lock assertions. Introduced const-safe node lookup helpers (FindNodeLocked variants, ExistsNodeShared, WithNodeShared, WithNodeExclusive), new ForNode const overloads, and CNode::CopyStats made const. Eviction logic centralized via GetEvictionCandidates and AttemptToEvictConnection. Threaded connection paths and masternode connection code updated to use the new lock-aware APIs. Enforces a hard cap of three high-bandwidth peers announcing headers.
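The WithNodeShared pattern named in the walkthrough can be sketched as a templated helper that finds an entry under a shared lock, runs a read-only callable on it, and returns nullopt when the entry is absent. This is a hedged sketch of the general shape only; Registry, FakeNode, and the member names are illustrative, not the PR's actual declarations.

```cpp
#include <map>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <string>
#include <type_traits>

// Hypothetical node record standing in for CNode.
struct FakeNode {
    int id;
    std::string addr;
};

class Registry {
    mutable std::shared_mutex m_mutex;
    std::map<int, FakeNode> m_nodes;

public:
    void Add(FakeNode n)
    {
        std::unique_lock lock{m_mutex};
        m_nodes.emplace(n.id, std::move(n));
    }

    // Run a read-only callable on the node under a shared lock; nullopt
    // signals "not found", otherwise the callable's result is wrapped.
    template <typename Callable>
    auto WithNodeShared(int id, Callable&& fn) const
        -> std::optional<std::invoke_result_t<Callable, const FakeNode&>>
    {
        std::shared_lock lock{m_mutex};
        auto it = m_nodes.find(id);
        if (it == m_nodes.end()) return std::nullopt;
        return fn(it->second);
    }
};
```

The optional return lets callers distinguish "node missing" from any value the callable produces, without the callable ever seeing a dangling pointer: the lock is held for the whole invocation.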

Sequence Diagram(s)

```mermaid
sequenceDiagram
autonumber
participant Caller
participant CConnman
participant NodeStore as "m_nodes (SharedMutex)"
participant CNode
Note over CConnman,NodeStore: New const-safe lookup helpers used

Caller->>CConnman: WithNodeShared(key, cond, func)
activate CConnman
CConnman->>NodeStore: Acquire shared lock
activate NodeStore
NodeStore-->>CConnman: FindNodeLocked(key) (const)
CConnman->>CNode: cond(const CNode*)
alt cond true
CConnman->>CNode: func(const CNode*)
end
NodeStore-->>CConnman: Release shared lock
deactivate NodeStore
CConnman-->>Caller: result
deactivate CConnman
```

```mermaid
sequenceDiagram
autonumber
participant EvictCaller
participant CConnman
participant NodeStore as "m_nodes (SharedMutex)"
participant EvictionList as "GetEvictionCandidates()"

EvictCaller->>CConnman: AttemptToEvictConnection()
activate CConnman
CConnman->>NodeStore: Acquire shared lock
activate NodeStore
CConnman->>EvictionList: GetEvictionCandidates() const
EvictionList-->>CConnman: candidates (const)
CConnman->>CConnman: WithNodeMutable(candidate.id, decide eviction)
alt eviction chosen
CConnman->>NodeStore: Acquire exclusive lock (mutable ops)
NodeStore-->>CConnman: DisconnectNode(...)
end
NodeStore-->>CConnman: Release locks
deactivate NodeStore
CConnman-->>EvictCaller: done
deactivate CConnman
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Potential areas needing extra attention:

  • Correctness of shared vs. exclusive lock annotations and call sites.
  • Template helpers (WithNodeShared/WithNodeExclusive/FindNodeLockedBy) and instantiations.
  • Eviction candidate selection and AttemptToEvictConnection mutation path.
  • Threaded connection flows (ThreadOpenConnections / ThreadOpenMasternodeConnections) and masternode-specific lookup changes.
  • New AssertSharedLockHeld implementation and its debug/non-debug behavior.

Suggested reviewers

  • knst
  • UdjinM6
  • kwvg

Pre-merge checks and finishing touches

✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'perf: convert m_nodes_mutex from recursive mutex to shared mutex' directly and specifically describes the main change in the PR, which is converting a mutex type for performance optimization.
Description check ✅ Passed The description is directly related to the changeset, explaining the rationale for converting m_nodes_mutex to a shared mutex, providing performance metrics, and documenting testing and breaking changes.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (2)
src/net.cpp (1)

3464-3469: Misleading comment about lock access level.

The comment states "Use shared access since we already hold m_nodes_mutex", but at this point the code holds an exclusive lock (via LOCK2 at line 3526, called before getPendingQuorumNodes at line 3547). While the code is correct—FindNodeLocked (const version) can be called with either shared or exclusive lock—the comment is confusing.

Clarify or remove the comment:

-                    // Use shared access since we already hold m_nodes_mutex
                     bool slowHandshake = false;
                     if (const CNode* pnode = FindNodeLocked(addr2)) {
src/net.h (1)

1658-1662: optional<invoke_result_t<...>> can fail for reference results; decay the type

If a callable returns a reference (or cv/ref-qualified type), std::optional<invoke_result_t<...>> is ill-formed. Decay the result type before wrapping it in optional.

Apply this diff to both helpers:

 template<typename Key, typename Callable>
-std::optional<std::invoke_result_t<Callable, const CNode*>> WithNodeShared(const Key& key, Callable&& fn) EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_mutex)
+std::optional<std::decay_t<std::invoke_result_t<Callable, const CNode*>>> WithNodeShared(const Key& key, Callable&& fn) EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_mutex)
 {
     READ_LOCK(m_nodes_mutex);
-    if (const CNode* p = FindNodeLockedBy(key)) return std::optional<std::invoke_result_t<Callable, const CNode*>>{fn(p)};
+    if (const CNode* p = FindNodeLockedBy(key)) return std::optional<std::decay_t<std::invoke_result_t<Callable, const CNode*>>>{fn(p)};
     return std::nullopt;
 }
 
 template<typename Key, typename Callable>
-std::optional<std::invoke_result_t<Callable, CNode*>> WithNodeExclusive(const Key& key, Callable&& fn) EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_mutex)
+std::optional<std::decay_t<std::invoke_result_t<Callable, CNode*>>> WithNodeExclusive(const Key& key, Callable&& fn) EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_mutex)
 {
     LOCK(m_nodes_mutex);
-    if (const CNode* cp = FindNodeLockedBy(key)) return std::optional<std::invoke_result_t<Callable, CNode*>>{fn(const_cast<CNode*>(cp))};
+    if (const CNode* cp = FindNodeLockedBy(key)) return std::optional<std::decay_t<std::invoke_result_t<Callable, CNode*>>>{fn(const_cast<CNode*>(cp))};
     return std::nullopt;
 }

No additional includes needed; <type_traits> is already added.

Also applies to: 1663-1668, 1669-1686, 1687-1695, 1697-1705, 1708-1713, 1715-1721
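The failure mode behind this suggestion is that std::optional cannot hold a reference, so wrapping an un-decayed invoke_result_t is ill-formed when the callable returns T&. A minimal sketch (WrapResult and Peer are hypothetical names, not the PR's code) shows how std::decay_t turns the reference result into a storable value:

```cpp
#include <optional>
#include <string>
#include <type_traits>

// Hypothetical record standing in for CNode.
struct Peer {
    std::string name;
};

// Without decay_t, a callable returning std::string& would make the
// return type std::optional<std::string&>, which is ill-formed.
// decay_t strips the reference (and cv-qualifiers) so the optional
// stores a copy instead.
template <typename Callable>
auto WrapResult(Peer& p, Callable&& fn)
    -> std::optional<std::decay_t<std::invoke_result_t<Callable, Peer&>>>
{
    return fn(p); // a reference result is copied into the optional here
}
```

Because the optional holds a copy, mutating the wrapped value leaves the original node untouched, which is exactly the safe behavior wanted for the read-only helper.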

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e5ce1f0 and 63cf92d.

📒 Files selected for processing (4)
  • src/net.cpp (36 hunks)
  • src/net.h (26 hunks)
  • src/net_processing.cpp (1 hunks)
  • src/sync.h (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/**/*.{cpp,h,cc,cxx,hpp}

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.{cpp,h,cc,cxx,hpp}: Dash Core C++ codebase must be written in C++20 and require at least Clang 16 or GCC 11.1
Dash uses unordered_lru_cache for efficient caching with LRU eviction

Files:

  • src/sync.h
  • src/net_processing.cpp
  • src/net.cpp
  • src/net.h
🧠 Learnings (6)
📚 Learning: 2025-09-09T21:36:58.969Z
Learnt from: kwvg
PR: dashpay/dash#6837
File: src/stats/rawsender.cpp:223-232
Timestamp: 2025-09-09T21:36:58.969Z
Learning: In RawSender class (src/stats/rawsender.cpp), socket operations are properly synchronized using the cs_net mutex. The m_sock and m_server members are GUARDED_BY(cs_net), and methods like Connect(), SendDirectly(), and ReconnectThread() use appropriate locking with cs_net to prevent race conditions on socket access.

Applied to files:

  • src/net.h
📚 Learning: 2025-09-09T21:36:11.833Z
Learnt from: kwvg
PR: dashpay/dash#6837
File: src/stats/rawsender.cpp:0-0
Timestamp: 2025-09-09T21:36:11.833Z
Learning: In RawSender class (src/stats/rawsender.cpp), cs_net is the appropriate mutex for protecting socket access (m_sock) and network operations, not additional custom locks. The implementation correctly uses cs_net with GUARDED_BY annotations and EXCLUSIVE_LOCKS_REQUIRED to synchronize socket access between SendDirectly() and ReconnectThread().

Applied to files:

  • src/net.h
📚 Learning: 2025-09-09T21:36:31.504Z
Learnt from: kwvg
PR: dashpay/dash#6837
File: src/stats/rawsender.cpp:0-0
Timestamp: 2025-09-09T21:36:31.504Z
Learning: In RawSender class (src/stats/rawsender.cpp), m_sock access is protected by the cs_net mutex. This mutex coordinates between the TCP send path in SendDirectly() and the reconnection operations in Connect()/Reconnect() methods, ensuring proper synchronization of socket state.

Applied to files:

  • src/net.h
📚 Learning: 2025-09-09T21:36:31.504Z
Learnt from: kwvg
PR: dashpay/dash#6837
File: src/stats/rawsender.cpp:0-0
Timestamp: 2025-09-09T21:36:31.504Z
Learning: In RawSender class (src/stats/rawsender.cpp), m_sock access requires synchronization using the cs_net mutex. The race condition exists between SendDirectly() (callable from any thread) and Connect() (called by ReconnectThread), where both read and write m_sock concurrently. The cs_net mutex properly coordinates these network operations.

Applied to files:

  • src/net.h
📚 Learning: 2025-09-09T21:36:11.833Z
Learnt from: kwvg
PR: dashpay/dash#6837
File: src/stats/rawsender.cpp:0-0
Timestamp: 2025-09-09T21:36:11.833Z
Learning: In RawSender class (src/stats/rawsender.cpp), cs_net is the appropriate mutex for protecting socket access (m_sock) and network operations, not additional custom locks.

Applied to files:

  • src/net.h
📚 Learning: 2025-09-09T21:36:58.969Z
Learnt from: kwvg
PR: dashpay/dash#6837
File: src/stats/rawsender.cpp:223-232
Timestamp: 2025-09-09T21:36:58.969Z
Learning: In RawSender class (src/stats/rawsender.cpp), socket access operations (m_sock and m_server) should be protected by the cs_net mutex, not avoiding synchronization entirely. While lock overhead concerns are valid in general, socket operations require proper synchronization via cs_net.

Applied to files:

  • src/net.h
🧬 Code graph analysis (3)
src/sync.h (1)
src/sync.cpp (5)
  • AssertLockHeldInternal (280-285)
  • AssertLockHeldInternal (280-280)
  • AssertLockHeldInternal (286-286)
  • AssertLockHeldInternal (287-287)
  • AssertLockHeldInternal (288-288)
src/net.cpp (2)
src/node/interfaces.cpp (12)
  • ip (496-503)
  • ip (496-496)
  • id (511-517)
  • id (511-511)
  • stats (453-479)
  • stats (453-453)
  • LOCK (541-545)
  • LOCK (551-558)
  • LOCK (559-566)
  • LOCK (828-837)
  • LOCK (871-875)
  • LOCK (1055-1059)
src/net.h (3)
  • m_nodes_mutex (1688-1693)
  • CopyStats (1032-1040)
  • string (107-282)
src/net.h (1)
src/net.cpp (54)
  • CopyStats (674-724)
  • CopyStats (674-674)
  • StopNodes (4177-4228)
  • StopNodes (4177-4177)
  • use_v2transport (2025-2025)
  • CheckIncomingNonce (425-433)
  • CheckIncomingNonce (425-425)
  • ForNode (4822-4833)
  • ForNode (4822-4822)
  • ForNode (4835-4846)
  • ForNode (4835-4835)
  • ForNode (4848-4859)
  • ForNode (4848-4848)
  • ForNode (4861-4872)
  • ForNode (4861-4861)
  • CNode (4693-4731)
  • addr (2941-2941)
  • addr (3393-3393)
  • AddNode (4298-4310)
  • AddNode (4298-4298)
  • GetAddedNodeInfo (3317-3375)
  • GetAddedNodeInfo (3317-3317)
  • GetNodeStats (4519-4532)
  • GetNodeStats (4519-4519)
  • FindNodeLocked (342-353)
  • FindNodeLocked (342-342)
  • FindNodeLocked (355-366)
  • FindNodeLocked (355-355)
  • FindNodeLocked (368-379)
  • FindNodeLocked (368-368)
  • FindNodeLocked (381-392)
  • FindNodeLocked (381-381)
  • FindNodeLockedMutable (394-398)
  • FindNodeLockedMutable (394-394)
  • FindNodeLockedMutable (400-404)
  • FindNodeLockedMutable (400-400)
  • FindNodeLockedMutable (406-410)
  • FindNodeLockedMutable (406-406)
  • FindNodeLockedMutable (412-416)
  • FindNodeLockedMutable (412-412)
  • AlreadyConnectedToAddress (419-423)
  • AlreadyConnectedToAddress (419-419)
  • GetEvictionCandidates (1820-1868)
  • GetEvictionCandidates (1820-1820)
  • AttemptToEvictConnection (1878-1890)
  • AttemptToEvictConnection (1878-1878)
  • ConnectNode (449-595)
  • ConnectNode (449-449)
  • SocketRecvData (2601-2656)
  • SocketRecvData (2601-2601)
  • GetCurrentBlockRelayOnlyConns (3304-3315)
  • GetCurrentBlockRelayOnlyConns (3304-3304)
  • MaybePickPreferredNetwork (2910-2924)
  • MaybePickPreferredNetwork (2910-2910)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: arm-linux-build / Build source
  • GitHub Check: mac-build / Build source
  • GitHub Check: linux64_tsan-build / Build source
  • GitHub Check: linux64_sqlite-build / Build source
  • GitHub Check: linux64_fuzz-build / Build source
  • GitHub Check: linux64_ubsan-build / Build source
  • GitHub Check: linux64-build / Build source
  • GitHub Check: win64-build / Build source
  • GitHub Check: Lint / Run linters
🔇 Additional comments (13)
src/net_processing.cpp (1)

1308-1318: Good call keeping SENDCMPCT fan-out bounded.

Capping lNodesAnnouncingHeaderAndIDs at three entries—and explicitly telling the oldest peer to drop out of high-bandwidth mode—brings us back in line with the BIP152 limit without upsetting the existing outbound-protection logic above. Looks solid to me.
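The bounded fan-out described above can be sketched as a small "keep the newest three" list: adding a high-bandwidth announcer evicts the oldest entry once the cap is exceeded (in the real code the evicted peer is also told to leave high-bandwidth mode). MAX_HB_PEERS and AddHighBandwidthPeer are illustrative names, not the actual identifiers in net_processing.cpp.

```cpp
#include <algorithm>
#include <deque>

constexpr size_t MAX_HB_PEERS = 3; // illustrative cap, per the BIP152 limit

std::deque<int>& AddHighBandwidthPeer(std::deque<int>& hb_peers, int node_id)
{
    // If the peer is already in the list, move it to the back rather
    // than duplicating it.
    hb_peers.erase(std::remove(hb_peers.begin(), hb_peers.end(), node_id),
                   hb_peers.end());
    hb_peers.push_back(node_id);
    // Evict the oldest announcer(s) once the cap is exceeded.
    while (hb_peers.size() > MAX_HB_PEERS) hb_peers.pop_front();
    return hb_peers;
}
```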

src/sync.h (1)

164-167: LGTM: Shared lock assertion helper implemented correctly.

The new AssertSharedLockHeldInline function and AssertSharedLockHeld macro provide the necessary infrastructure for asserting shared lock ownership on SharedMutex. The NO_THREAD_SAFETY_ANALYSIS annotation is correctly applied since the underlying AssertLockHeldInternal requires exclusive lock annotation but we're checking shared lock at runtime.

src/net.cpp (6)

341-392: LGTM: Const-correct FindNodeLocked methods.

The const-qualified FindNodeLocked overloads correctly use const CNode* and enable read-only node lookups under either shared or exclusive locks. The new NodeId overload at lines 381-392 is a useful addition.


419-433: LGTM: Proper const-correctness and read-lock usage.

Both AlreadyConnectedToAddress and CheckIncomingNonce correctly use READ_LOCK for read-only node lookups and are appropriately const-qualified.


1820-1868: LGTM: GetEvictionCandidates properly encapsulated.

The method correctly uses READ_LOCK for read-only node access and returns a value (not references), eliminating lifetime concerns. Good encapsulation of the eviction candidate collection logic.


4835-4859: LGTM: Const-correct ForNode overloads.

The new const ForNode overloads for CService and NodeId correctly use READ_LOCK and accept const functors, enabling read-only node operations without exclusive lock contention. The pattern complements the existing mutable overloads well.


674-674: LGTM: CopyStats correctly made const.

Making CopyStats const is appropriate since it only reads node state without modification, enabling use in read-only contexts.


421-421: Good read/write lock separation throughout.

The systematic conversion to READ_LOCK for read-only node access while retaining exclusive locks for modifications is well-executed. This enables concurrent reads and reduces contention, which is the primary benefit of the RecursiveMutex to SharedMutex conversion.

src/net.h (5)

51-51: Includes and forward-declare: LGTM

type_traits is needed for invoke_result_t below; forward-declaring NodeEvictionCandidate here is appropriate for the declarations that return vectors.

Also applies to: 65-65


735-735: Const-correctness via mutable locks and CopyStats: LGTM

Making cs_vSend, cs_vRecv, and m_subver_mutex mutable enables const methods like CopyStats() to lock internals safely. CopyStats() const with negative lock requirements matches the net.cpp implementation.

Also applies to: 737-737, 761-761, 1032-1032


1301-1304: SharedMutex migration and API annotations: LGTM

ForNode/ForEach*/ForEach*Then now explicitly require the nodes lock not to be held and take shared/exclusive locks internally. Const variants use READ_LOCK; non-const variants use LOCK. This is the right direction for removing recursive locking.

Also applies to: 1307-1317, 1333-1342, 1350-1359, 1366-1375, 1381-1395, 1397-1406, 1413-1423, 1425-1429


1811-1811: m_nodes_mutex as SharedMutex and SHARED_LOCKS_REQUIRED usage: LGTM

Declaring m_nodes_mutex as SharedMutex and introducing SHARED_LOCKS_REQUIRED for read-only paths (e.g., MultipleManualOrFullOutboundConns) aligns with the non-recursive, shared/exclusive model.

Also applies to: 1543-1543


1729-1732: Public APIs with explicit node-lock contracts: LGTM

GetEvictionCandidates()/AttemptToEvictConnection(), GetCurrentBlockRelayOnlyConns(), MaybePickPreferredNetwork(), GetExtra* counts, GetNodeStats(), and DisconnectNode* are all updated to declare !m_nodes_mutex. Matches the corresponding READ/LOCK usage in net.cpp.

Also applies to: 1758-1758, 1770-1770, 1462-1464, 1504-1508


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 63cf92d and 7121c3c.

📒 Files selected for processing (2)
  • src/net.cpp (36 hunks)
  • src/net.h (26 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/**/*.{cpp,h,cc,cxx,hpp}

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.{cpp,h,cc,cxx,hpp}: Dash Core C++ codebase must be written in C++20 and require at least Clang 16 or GCC 11.1
Dash uses unordered_lru_cache for efficient caching with LRU eviction

Files:

  • src/net.h
  • src/net.cpp
🧠 Learnings (6)
📚 Learning: 2025-09-09T21:36:11.833Z
Learnt from: kwvg
PR: dashpay/dash#6837
File: src/stats/rawsender.cpp:0-0
Timestamp: 2025-09-09T21:36:11.833Z
Learning: In RawSender class (src/stats/rawsender.cpp), cs_net is the appropriate mutex for protecting socket access (m_sock) and network operations, not additional custom locks. The implementation correctly uses cs_net with GUARDED_BY annotations and EXCLUSIVE_LOCKS_REQUIRED to synchronize socket access between SendDirectly() and ReconnectThread().

Applied to files:

  • src/net.h
📚 Learning: 2025-09-09T21:36:58.969Z
Learnt from: kwvg
PR: dashpay/dash#6837
File: src/stats/rawsender.cpp:223-232
Timestamp: 2025-09-09T21:36:58.969Z
Learning: In RawSender class (src/stats/rawsender.cpp), socket operations are properly synchronized using the cs_net mutex. The m_sock and m_server members are GUARDED_BY(cs_net), and methods like Connect(), SendDirectly(), and ReconnectThread() use appropriate locking with cs_net to prevent race conditions on socket access.

Applied to files:

  • src/net.h
📚 Learning: 2025-09-09T21:36:31.504Z
Learnt from: kwvg
PR: dashpay/dash#6837
File: src/stats/rawsender.cpp:0-0
Timestamp: 2025-09-09T21:36:31.504Z
Learning: In RawSender class (src/stats/rawsender.cpp), m_sock access is protected by the cs_net mutex. This mutex coordinates between the TCP send path in SendDirectly() and the reconnection operations in Connect()/Reconnect() methods, ensuring proper synchronization of socket state.

Applied to files:

  • src/net.h
📚 Learning: 2025-09-09T21:36:31.504Z
Learnt from: kwvg
PR: dashpay/dash#6837
File: src/stats/rawsender.cpp:0-0
Timestamp: 2025-09-09T21:36:31.504Z
Learning: In RawSender class (src/stats/rawsender.cpp), m_sock access requires synchronization using the cs_net mutex. The race condition exists between SendDirectly() (callable from any thread) and Connect() (called by ReconnectThread), where both read and write m_sock concurrently. The cs_net mutex properly coordinates these network operations.

Applied to files:

  • src/net.h
📚 Learning: 2025-09-09T21:36:11.833Z
Learnt from: kwvg
PR: dashpay/dash#6837
File: src/stats/rawsender.cpp:0-0
Timestamp: 2025-09-09T21:36:11.833Z
Learning: In RawSender class (src/stats/rawsender.cpp), cs_net is the appropriate mutex for protecting socket access (m_sock) and network operations, not additional custom locks.

Applied to files:

  • src/net.h
📚 Learning: 2025-09-09T21:36:58.969Z
Learnt from: kwvg
PR: dashpay/dash#6837
File: src/stats/rawsender.cpp:223-232
Timestamp: 2025-09-09T21:36:58.969Z
Learning: In RawSender class (src/stats/rawsender.cpp), socket access operations (m_sock and m_server) should be protected by the cs_net mutex, not avoiding synchronization entirely. While lock overhead concerns are valid in general, socket operations require proper synchronization via cs_net.

Applied to files:

  • src/net.h
🧬 Code graph analysis (2)
src/net.h (1)
src/net.cpp (62)
  • CopyStats (674-724)
  • CopyStats (674-674)
  • StopNodes (4177-4228)
  • StopNodes (4177-4177)
  • use_v2transport (2025-2025)
  • ForNode (4822-4833)
  • ForNode (4822-4822)
  • ForNode (4835-4846)
  • ForNode (4835-4835)
  • ForNode (4848-4859)
  • ForNode (4848-4848)
  • ForNode (4861-4872)
  • ForNode (4861-4861)
  • CNode (4693-4731)
  • addr (2941-2941)
  • addr (3393-3393)
  • GetExtraFullOutboundCount (2859-2875)
  • GetExtraFullOutboundCount (2859-2859)
  • GetExtraBlockRelayCount (2877-2889)
  • GetExtraBlockRelayCount (2877-2877)
  • GetNodeStats (4519-4532)
  • GetNodeStats (4519-4519)
  • DisconnectNode (4534-4541)
  • DisconnectNode (4534-4534)
  • DisconnectNode (4543-4555)
  • DisconnectNode (4543-4543)
  • DisconnectNode (4557-4560)
  • DisconnectNode (4557-4557)
  • DisconnectNode (4562-4573)
  • DisconnectNode (4562-4562)
  • MultipleManualOrFullOutboundConns (2904-2908)
  • MultipleManualOrFullOutboundConns (2904-2904)
  • FindNodeLocked (342-353)
  • FindNodeLocked (342-342)
  • FindNodeLocked (355-366)
  • FindNodeLocked (355-355)
  • FindNodeLocked (368-379)
  • FindNodeLocked (368-368)
  • FindNodeLocked (381-392)
  • FindNodeLocked (381-381)
  • FindNodeLockedMutable (394-398)
  • FindNodeLockedMutable (394-394)
  • FindNodeLockedMutable (400-404)
  • FindNodeLockedMutable (400-400)
  • FindNodeLockedMutable (406-410)
  • FindNodeLockedMutable (406-406)
  • FindNodeLockedMutable (412-416)
  • FindNodeLockedMutable (412-412)
  • GetEvictionCandidates (1820-1868)
  • GetEvictionCandidates (1820-1820)
  • AttemptToEvictConnection (1878-1890)
  • AttemptToEvictConnection (1878-1878)
  • ConnectNode (449-595)
  • ConnectNode (449-449)
  • SocketRecvData (2601-2656)
  • SocketRecvData (2601-2601)
  • GetCurrentBlockRelayOnlyConns (3304-3315)
  • GetCurrentBlockRelayOnlyConns (3304-3304)
  • MaybePickPreferredNetwork (2910-2924)
  • MaybePickPreferredNetwork (2910-2910)
  • PerformReconnections (4917-4944)
  • PerformReconnections (4917-4917)
src/net.cpp (2)
src/node/interfaces.cpp (12)
  • ip (496-503)
  • ip (496-496)
  • id (511-517)
  • id (511-511)
  • stats (453-479)
  • stats (453-453)
  • LOCK (541-545)
  • LOCK (551-558)
  • LOCK (559-566)
  • LOCK (828-837)
  • LOCK (871-875)
  • LOCK (1055-1059)
src/net.h (4)
  • m_nodes_mutex (1688-1693)
  • CopyStats (1032-1040)
  • AllNodes (1299-1299)
  • string (107-282)

src/net.h Outdated
Comment on lines 1658 to 1721
const CNode* FindNodeLocked(const CNetAddr& ip, bool fExcludeDisconnecting = true) const SHARED_LOCKS_REQUIRED(m_nodes_mutex);
const CNode* FindNodeLocked(const std::string& addrName, bool fExcludeDisconnecting = true) const SHARED_LOCKS_REQUIRED(m_nodes_mutex);
const CNode* FindNodeLocked(const CService& addr, bool fExcludeDisconnecting = true) const SHARED_LOCKS_REQUIRED(m_nodes_mutex);
const CNode* FindNodeLocked(NodeId id, bool fExcludeDisconnecting = true) const SHARED_LOCKS_REQUIRED(m_nodes_mutex);

// Mutable find helpers for callers that already hold the exclusive lock
CNode* FindNodeLockedMutable(const CNetAddr& ip, bool fExcludeDisconnecting = true) EXCLUSIVE_LOCKS_REQUIRED(m_nodes_mutex);
CNode* FindNodeLockedMutable(const std::string& addrName, bool fExcludeDisconnecting = true) EXCLUSIVE_LOCKS_REQUIRED(m_nodes_mutex);
CNode* FindNodeLockedMutable(const CService& addr, bool fExcludeDisconnecting = true) EXCLUSIVE_LOCKS_REQUIRED(m_nodes_mutex);
CNode* FindNodeLockedMutable(NodeId id, bool fExcludeDisconnecting = true) EXCLUSIVE_LOCKS_REQUIRED(m_nodes_mutex);

// Type-agnostic node matching helpers
static inline bool NodeMatches(const CNode* p, const CService& addr)
{
return static_cast<CService>(p->addr) == addr;
}
static inline bool NodeMatches(const CNode* p, const CNetAddr& ip)
{
return static_cast<CNetAddr>(p->addr) == ip;
}
static inline bool NodeMatches(const CNode* p, const std::string& addrName)
{
return p->m_addr_name == addrName;
}
static inline bool NodeMatches(const CNode* p, const NodeId id)
{
return p->GetId() == id;
}

template<typename Key>
const CNode* FindNodeLockedBy(const Key& key, bool fExcludeDisconnecting = true) const SHARED_LOCKS_REQUIRED(m_nodes_mutex)
{
for (const CNode* pnode : m_nodes) {
if (fExcludeDisconnecting && pnode->fDisconnect) continue;
if (NodeMatches(pnode, key)) return pnode;
}
return nullptr;
}

// Callback helpers with explicit lock semantics (templated on key type)
// Lambda-based shared accessor returning optional result (nullopt = not found)
template<typename Key, typename Callable>
std::optional<std::invoke_result_t<Callable, const CNode*>> WithNodeShared(const Key& key, Callable&& fn) EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_mutex)
{
READ_LOCK(m_nodes_mutex);
if (const CNode* p = FindNodeLockedBy(key)) return std::optional<std::invoke_result_t<Callable, const CNode*>>{fn(p)};
return std::nullopt;
}

// Fast existence check under shared lock
template<typename Key>
bool ExistsNodeShared(const Key& key) EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_mutex)
{
READ_LOCK(m_nodes_mutex);
return FindNodeLockedBy(key) != nullptr;
}

template<typename Key, typename Callable>
std::optional<std::invoke_result_t<Callable, CNode*>> WithNodeExclusive(const Key& key, Callable&& fn) EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_mutex)
{
LOCK(m_nodes_mutex);
if (const CNode* cp = FindNodeLockedBy(key)) return std::optional<std::invoke_result_t<Callable, CNode*>>{fn(const_cast<CNode*>(cp))};
return std::nullopt;
}


⚠️ Potential issue | 🟠 Major

Keep find helpers visible to disconnect-node callers

With the new helpers we now default fExcludeDisconnecting = true (and WithNodeExclusive forwards that default). As soon as a peer has been marked fDisconnect = true, FindNodeLockedBy(...) stops finding it, so DisconnectNode(...) returns false while the socket is still being torn down. Previously the RPC would keep reporting success until the peer was actually removed. This makes the disconnectnode RPC racy/regressive. Please include disconnecting peers when servicing these helpers (e.g. restore the old default or explicitly pass false from WithNodeExclusive) so callers can still find the node until it is gone.

🤖 Prompt for AI Agents
In src/net.h around lines 1658-1721, the new find helpers default
fExcludeDisconnecting = true which causes disconnecting peers to be ignored and
breaks disconnectnode; restore behavior to include disconnecting peers by
changing the default to fExcludeDisconnecting = false in the FindNodeLockedBy
template signature (and any other FindNodeLocked* helpers if applicable), or
alternately ensure WithNodeShared, ExistsNodeShared and WithNodeExclusive call
FindNodeLockedBy(key, /*fExcludeDisconnecting=*/false) so disconnecting peers
are still found until removed.
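The fix being requested can be sketched as a find helper whose default includes peers already flagged for disconnection, so callers such as the disconnectnode RPC keep seeing the node until it is actually removed. PeerEntry and FindPeer are illustrative names for this sketch, not the PR's declarations:

```cpp
#include <vector>

// Hypothetical stand-in for a CNode entry.
struct PeerEntry {
    int id;
    bool fDisconnect; // set when the peer is being torn down
};

// Default fExcludeDisconnecting = false: a peer flagged for disconnect is
// still findable until it is removed from the list, preserving the old
// behavior the review comment asks to restore.
const PeerEntry* FindPeer(const std::vector<PeerEntry>& peers, int id,
                          bool fExcludeDisconnecting = false)
{
    for (const auto& p : peers) {
        if (fExcludeDisconnecting && p.fDisconnect) continue;
        if (p.id == id) return &p;
    }
    return nullptr;
}
```

Callers that genuinely want to skip disconnecting peers (e.g. "is there a live connection to this address?") then opt in explicitly, rather than every caller silently inheriting the exclusion.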


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
src/net.cpp (2)

494-500: Consider using WithNodeShared here for consistency and better concurrency.

This check for an existing connection uses WithNodeExclusive, but the lambda doesn't modify the node. For consistency with the similar check at line 460 (which uses WithNodeShared) and to allow better concurrency, consider changing to:

-if (WithNodeExclusive(static_cast<CService>(addrConnect), [&](CNode* /*pnode*/){ return true; })) {
+if (WithNodeShared(static_cast<CService>(addrConnect), [](const CNode*){ return true; })) {

Note: Also removes unnecessary [&] capture.


3464-3464: Clarify comment to avoid confusion with shared vs. exclusive locks.

The comment "Use shared access since we already hold m_nodes_mutex" could be misinterpreted in the context of this PR's shared/exclusive locking model. Consider clarifying:

-// Use shared access since we already hold m_nodes_mutex
+// Use const access (FindNodeLocked) since we already hold m_nodes_mutex

This makes it clear that "shared access" refers to const/read-only access, not a shared lock.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7121c3c and 420f614.

📒 Files selected for processing (1)
  • src/net.cpp (35 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/**/*.{cpp,h,cc,cxx,hpp}

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.{cpp,h,cc,cxx,hpp}: Dash Core C++ codebase must be written in C++20 and require at least Clang 16 or GCC 11.1
Dash uses unordered_lru_cache for efficient caching with LRU eviction

Files:

  • src/net.cpp
🧬 Code graph analysis (1)
src/net.cpp (1)
src/net.h (4)
  • m_nodes_mutex (1688-1693)
  • CopyStats (1032-1040)
  • AllNodes (1299-1299)
  • string (107-282)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: linux64_nowallet-build / Build source
  • GitHub Check: linux64_ubsan-build / Build source
  • GitHub Check: linux64_sqlite-build / Build source
  • GitHub Check: linux64_fuzz-build / Build source
  • GitHub Check: linux64-build / Build source
  • GitHub Check: arm-linux-build / Build source
  • GitHub Check: win64-build / Build source
  • GitHub Check: Lint / Run linters
🔇 Additional comments (4)
src/net.cpp (4)

341-392: LGTM! Const-correct FindNodeLocked variants enable shared-lock access.

The addition of const overloads for FindNodeLocked is the correct approach to support both shared and exclusive lock contexts. These methods properly return const CNode* for read-only access.


1820-1868: LGTM! GetEvictionCandidates correctly uses shared locking.

The method correctly uses READ_LOCK for building the eviction candidate list from read-only node data. The const qualifier appropriately reflects that this method doesn't modify CConnman state.


4833-4857: LGTM! Const ForNode overloads properly use shared locking.

The new const overloads correctly use READ_LOCK for read-only node iteration and accept const lambdas, enabling concurrent read access to nodes.


4878-4894: LGTM! NodesSnapshot correctly uses shared locking.

The constructor appropriately uses READ_LOCK for creating a read-only snapshot of the nodes list, and properly calls AddRef() to ensure node lifetime.

@DashCoreAutoGuix

Guix Automation has completed; a release should be present here: https://github.com/dashpay/dash-dev-branches/releases/tag/v23.0.0-devpr6912.63cf92d8. The image should be on dockerhub soon.

@DashCoreAutoGuix

Guix Automation has begun to build this PR tagged as v23.0.0-devpr6912.420f6148. A new comment will be made when the image is pushed.

@DashCoreAutoGuix

Guix Automation has completed; a release should be present here: https://github.com/dashpay/dash-dev-branches/releases/tag/v23.0.0-devpr6912.420f6148. The image should be on dockerhub soon.

@DashCoreAutoGuix

Guix Automation has begun to build this PR tagged as v23.0.0-devpr6912.c1b3047d. A new comment will be made when the image is pushed.

@DashCoreAutoGuix

Guix Automation has completed; a release should be present here: https://github.com/dashpay/dash-dev-branches/releases/tag/v23.0.0-devpr6912.c1b3047d. The image should be on dockerhub soon.

@PastaPastaPasta PastaPastaPasta changed the title from "perf: convert m_nodes_mutex from recursive mutex to mutex" to "perf: convert m_nodes_mutex from recursive mutex to shared mutex" Oct 23, 2025
@UdjinM6

UdjinM6 commented Oct 29, 2025

pls see 20aedd4

Major Improvements ✅

1. Simplified FindNode API

// REMOVED the complex FindNodeLocked/FindNodeLockedMutable distinction
- const CNode* FindNodeLocked(...) const SHARED_LOCKS_REQUIRED(m_nodes_mutex);
- CNode* FindNodeLockedMutable(...) EXCLUSIVE_LOCKS_REQUIRED(m_nodes_mutex);

// REPLACED with simple template-based approach
template<typename Key>
const CNode* FindNode(const Key& key, ...) const SHARED_LOCKS_REQUIRED(m_nodes_mutex);

template<typename Key>
CNode* FindNodeMutable(const Key& key, ...) SHARED_LOCKS_REQUIRED(m_nodes_mutex);

Analysis: ✅ Much better!
- Generic template handles all key types (CNetAddr, CService, NodeId, string)
- Both require only SHARED_LOCKS_REQUIRED - correctly reflects that finding in the vector only needs shared lock
- Mutable vs const is about pointer type, not lock type
- Eliminates ~80 lines of duplicate code
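
The template-dispatch pattern described above can be sketched in isolation. This is an illustrative model, not the real CConnman code: `Node` is a simplified stand-in for `CNode`, and the `NodeMatches` overloads mirror the matcher the diff relies on, one overload per key type.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-in for CNode; real fields are atomics or guarded members.
struct Node {
    int id;
    std::string addr_name;
    bool fDisconnect{false};
};

// Overload resolution picks the right matcher for each key type, so one
// template replaces the per-key-type FindNode duplicates.
inline bool NodeMatches(const Node* node, int id) { return node->id == id; }
inline bool NodeMatches(const Node* node, const std::string& name) { return node->addr_name == name; }

template <typename Key>
const Node* FindNode(const std::vector<Node*>& nodes, const Key& key,
                     bool exclude_disconnecting = true)
{
    for (const Node* node : nodes) {
        // Same default as the diff: skip nodes already marked for disconnect.
        if (exclude_disconnecting && node->fDisconnect) continue;
        if (NodeMatches(node, key)) return node;
    }
    return nullptr;
}
```

Adding a new key type then costs one `NodeMatches` overload instead of a whole duplicated find function.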

2. Fixed getPendingProbes Lock Annotation

// BEFORE: Incorrectly required m_nodes_mutex
- const auto getPendingProbes = [&]() EXCLUSIVE_LOCKS_REQUIRED(m_nodes_mutex, cs_vPendingMasternodes) {
-     AssertLockHeld(m_nodes_mutex);

// AFTER: Only requires cs_vPendingMasternodes (correct!)
+ const auto getPendingProbes = [&]() EXCLUSIVE_LOCKS_REQUIRED(cs_vPendingMasternodes) {

Analysis: ✅ Correct! This fixes the issue we identified - getPendingProbes doesn't access m_nodes.

3. Fixed getPendingQuorumNodes Lock Annotation

// BEFORE: Required exclusive lock
- const auto getPendingQuorumNodes = [&]() EXCLUSIVE_LOCKS_REQUIRED(m_nodes_mutex, cs_vPendingMasternodes) {
-     AssertLockHeld(m_nodes_mutex);

// AFTER: Only requires shared lock (correct!)
+ const auto getPendingQuorumNodes = [&]() SHARED_LOCKS_REQUIRED(m_nodes_mutex)
+                                          EXCLUSIVE_LOCKS_REQUIRED(cs_vPendingMasternodes) {
+     AssertSharedLockHeld(m_nodes_mutex);

Analysis: ✅ Correct!
- Now properly uses SHARED_LOCKS_REQUIRED since it only reads from m_nodes
- Uses AssertSharedLockHeld instead of AssertLockHeld
- Writes to pnode->fDisconnect are safe (atomic)

4. Simplified getConnectToDmn Locking

// BEFORE: Held exclusive lock
- LOCK2(m_nodes_mutex, cs_vPendingMasternodes);

// AFTER: Only holds shared lock for m_nodes
+ READ_LOCK(m_nodes_mutex);
+ LOCK(cs_vPendingMasternodes);

Analysis: ✅ Excellent optimization!
- Only needs shared lock since it's just reading from m_nodes
- Reduces contention significantly

5. Improved Logic in getPendingQuorumNodes

// BEFORE: Complex TOCTOU pattern
- if (const CNode* pnode = FindNodeLocked(addr2)) {
-     slowHandshake = pnode->nTimeFirstMessageReceived.load() != 0s && ...;
- }
- if (slowHandshake) {
-     if (CNode* p2 = FindNodeLockedMutable(addr2)) { p2->fDisconnect = true; }
- }

// AFTER: Simplified single lookup
+ CNode* pnode = FindNodeMutable(addr2);
+ if (pnode && pnode->m_masternode_connection) {
+     continue;
+ }
+ if (connectedNodes.count(addr2)) {
+     bool slow_handshake = pnode && pnode->nTimeFirstMessageReceived.load() != 0s && ...;
+     if (slow_handshake) {
+         pnode->fDisconnect = true;  // Direct access, no second lookup
+     }

Analysis: ✅ Much cleaner!
- Eliminates TOCTOU issue by using single pointer
- More readable logic flow
- Early connectedProRegTxHashes check avoids unnecessary work

6. Renamed Helper Methods for Clarity

// BEFORE: Confusing names
- WithNodeShared()
- WithNodeExclusive()
- ExistsNodeShared()

// AFTER: Clear intent
+ WithNodeMutable()  // Callback gets mutable CNode*
+ ExistsNode()       // Simple existence check

Analysis: ✅ Better naming!
- WithNodeMutable clearly indicates you get a mutable pointer
- No need for "Shared" suffix - the shared lock is an implementation detail
- The lock type is in the function signature for type safety

7. Proper AssertSharedLockHeld Implementation

// BEFORE: Hacky workaround with NO_THREAD_SAFETY_ANALYSIS
- inline void AssertSharedLockHeldInline(...) NO_THREAD_SAFETY_ANALYSIS {
-     AssertLockHeldInternal(...);
- }

// AFTER: Proper template specialization
+ template <typename MutexType>
+ void AssertSharedLockHeldInternal(...) SHARED_LOCKS_REQUIRED(cs) { ... }
+ template void AssertSharedLockHeldInternal(..., SharedMutex*);

Analysis: ✅ Much better!
- Proper template implementation
- Correct thread safety annotations
- No need for NO_THREAD_SAFETY_ANALYSIS hack

Remaining Observations

⚠️ FindNodeMutable with Shared Lock

template<typename Key>
CNode* FindNodeMutable(const Key& key, ...) SHARED_LOCKS_REQUIRED(m_nodes_mutex)

Question: This returns a mutable CNode* under shared lock. Is this intentional?

Answer: ✅ Yes, this is correct!
- The shared lock protects the m_nodes container, not the CNode objects
- CNode has internal thread safety
- Returning mutable pointer allows modifying atomic members like fDisconnect
- The "Mutable" in the name documents this is intentional
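
That invariant can be modeled in a few lines. In this sketch (names and shapes are illustrative, not the real CConnman API) the shared mutex guards only the container and pointer lifetimes; the field mutated under a shared lock is an atomic, so no data race occurs.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <vector>

struct Node {
    explicit Node(int id_in) : id(id_in) {}
    int id;
    std::atomic<bool> fDisconnect{false}; // internally synchronized field
};

class ConnMan
{
public:
    void AddNode(int id)
    {
        std::unique_lock lock(m_nodes_mutex); // structural change: exclusive lock
        m_nodes.push_back(std::make_unique<Node>(id));
    }

    // Lookup only reads the container, so a shared lock suffices even though
    // the caller ends up writing through a mutable pointer.
    bool MarkDisconnect(int id)
    {
        std::shared_lock lock(m_nodes_mutex);
        for (auto& node : m_nodes) {
            if (node->id == id) {
                node->fDisconnect = true; // atomic write, safe under shared lock
                return true;
            }
        }
        return false;
    }

    bool IsDisconnecting(int id) const
    {
        std::shared_lock lock(m_nodes_mutex);
        for (const auto& node : m_nodes) {
            if (node->id == id) return node->fDisconnect.load();
        }
        return false;
    }

private:
    mutable std::shared_mutex m_nodes_mutex;
    std::vector<std::unique_ptr<Node>> m_nodes; // guarded by m_nodes_mutex
};
```

Only `AddNode` (and, in the real code, node removal) takes the exclusive lock; every lookup and flag flip proceeds under shared access.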

✅ IsMasternodeOrDisconnectRequested Now Const

- bool IsMasternodeOrDisconnectRequested(const CService& addr);
+ bool IsMasternodeOrDisconnectRequested(const CService& addr) const;

Analysis: ✅ Correct - it's a read-only operation.

Summary of Improvements

| Issue                                 | Status  | Notes                             |
|---------------------------------------|---------|-----------------------------------|
| Unnecessary FindNodeLocked variants   | ✅ Fixed | Replaced with clean templates     |
| getPendingProbes incorrect lock       | ✅ Fixed | Removed m_nodes_mutex requirement |
| getPendingQuorumNodes wrong lock type | ✅ Fixed | Changed to SHARED_LOCKS_REQUIRED  |
| TOCTOU in masternode connection       | ✅ Fixed | Single pointer lookup             |
| AssertSharedLockHeld hack             | ✅ Fixed | Proper template implementation    |
| Confusing method names                | ✅ Fixed | Better naming convention          |
| Over-locking in getConnectToDmn       | ✅ Fixed | Changed to READ_LOCK              |

Reviewed-by: Claude (Anthropic AI Assistant)
@PastaPastaPasta
Member Author

Applied

@DashCoreAutoGuix

Guix Automation has begun to build this PR tagged as v23.0.0-devpr6912.9b22095c. A new comment will be made when the image is pushed.

@coderabbitai coderabbitai bot requested a review from kwvg November 4, 2025 03:21
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/net.h (1)

1787-1787: Consider addressing the TODO comment during this locking refactor.

Since this PR is already doing a major locking refactor, consider extracting m_network_conn_counts to a separate mutex as suggested in the TODO comment. This would reduce contention on m_nodes_mutex for operations that only need to check connection counts per network.

If addressing this TODO, you could:

  1. Add a new lightweight mutex specifically for m_network_conn_counts
  2. Update the GUARDED_BY annotation
  3. Update all access points to use the new mutex

This would be a natural extension of the performance improvements in this PR, though it can also be deferred to a separate change.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 420f614 and 9b22095.

📒 Files selected for processing (5)
  • src/net.cpp (33 hunks)
  • src/net.h (24 hunks)
  • src/sync.cpp (1 hunks)
  • src/sync.h (3 hunks)
  • src/test/util/net.h (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/sync.h
🧰 Additional context used
📓 Path-based instructions (2)
src/**/*.{cpp,h,cc,cxx,hpp}

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.{cpp,h,cc,cxx,hpp}: Dash Core C++ codebase must be written in C++20 and require at least Clang 16 or GCC 11.1
Dash uses unordered_lru_cache for efficient caching with LRU eviction

Files:

  • src/sync.cpp
  • src/test/util/net.h
  • src/net.cpp
  • src/net.h
src/{test,wallet/test,qt/test}/**/*.{cpp,h,cc,cxx,hpp}

📄 CodeRabbit inference engine (CLAUDE.md)

Unit tests for C++ code should be placed in src/test/, src/wallet/test/, or src/qt/test/ and use Boost::Test or Qt 5 for GUI tests

Files:

  • src/test/util/net.h
🧠 Learnings (11)
📓 Common learnings
Learnt from: CR
Repo: dashpay/dash PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-07-20T18:42:49.794Z
Learning: Applies to src/{masternode,evo}/**/*.{cpp,h,cc,cxx,hpp} : Masternode lists must use immutable data structures (Immer library) for thread safety
Learnt from: kwvg
Repo: dashpay/dash PR: 6837
File: src/stats/rawsender.cpp:0-0
Timestamp: 2025-09-09T21:36:11.833Z
Learning: In RawSender class (src/stats/rawsender.cpp), cs_net is the appropriate mutex for protecting socket access (m_sock) and network operations, not additional custom locks. The implementation correctly uses cs_net with GUARDED_BY annotations and EXCLUSIVE_LOCKS_REQUIRED to synchronize socket access between SendDirectly() and ReconnectThread().
📚 Learning: 2025-07-20T18:42:49.794Z
Learnt from: CR
Repo: dashpay/dash PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-07-20T18:42:49.794Z
Learning: Applies to src/{masternode,evo}/**/*.{cpp,h,cc,cxx,hpp} : Masternode lists must use immutable data structures (Immer library) for thread safety

Applied to files:

  • src/test/util/net.h
  • src/net.cpp
  • src/net.h
📚 Learning: 2025-09-09T21:36:11.833Z
Learnt from: kwvg
Repo: dashpay/dash PR: 6837
File: src/stats/rawsender.cpp:0-0
Timestamp: 2025-09-09T21:36:11.833Z
Learning: In RawSender class (src/stats/rawsender.cpp), cs_net is the appropriate mutex for protecting socket access (m_sock) and network operations, not additional custom locks. The implementation correctly uses cs_net with GUARDED_BY annotations and EXCLUSIVE_LOCKS_REQUIRED to synchronize socket access between SendDirectly() and ReconnectThread().

Applied to files:

  • src/net.cpp
  • src/net.h
📚 Learning: 2025-09-09T21:36:58.969Z
Learnt from: kwvg
Repo: dashpay/dash PR: 6837
File: src/stats/rawsender.cpp:223-232
Timestamp: 2025-09-09T21:36:58.969Z
Learning: In RawSender class (src/stats/rawsender.cpp), socket access operations (m_sock and m_server) should be protected by the cs_net mutex, not avoiding synchronization entirely. While lock overhead concerns are valid in general, socket operations require proper synchronization via cs_net.

Applied to files:

  • src/net.cpp
  • src/net.h
📚 Learning: 2025-07-22T14:38:45.606Z
Learnt from: PastaPastaPasta
Repo: dashpay/dash PR: 6738
File: src/evo/smldiff.h:91-94
Timestamp: 2025-07-22T14:38:45.606Z
Learning: In the Dash codebase, EXCLUSIVE_LOCKS_REQUIRED annotations are Clang thread safety annotations that are enforced at compile time on supported platforms like macOS. If callers don't properly hold the required locks, the build will fail with compile-time errors, not runtime issues.

Applied to files:

  • src/net.cpp
📚 Learning: 2025-05-10T00:54:15.597Z
Learnt from: kwvg
Repo: dashpay/dash PR: 6629
File: src/evo/deterministicmns.cpp:473-485
Timestamp: 2025-05-10T00:54:15.597Z
Learning: In the current implementation of `NetInfoEntry`, only `CService` is a valid type and `std::monostate` represents an invalid state. Entries that don't provide a valid `CService` through `GetAddrPort()` should be rejected with an exception, not silently skipped.

Applied to files:

  • src/net.cpp
  • src/net.h
📚 Learning: 2025-09-09T21:36:11.833Z
Learnt from: kwvg
Repo: dashpay/dash PR: 6837
File: src/stats/rawsender.cpp:0-0
Timestamp: 2025-09-09T21:36:11.833Z
Learning: In RawSender class (src/stats/rawsender.cpp), cs_net is the appropriate mutex for protecting socket access (m_sock) and network operations, not additional custom locks.

Applied to files:

  • src/net.cpp
  • src/net.h
📚 Learning: 2025-09-09T21:36:31.504Z
Learnt from: kwvg
Repo: dashpay/dash PR: 6837
File: src/stats/rawsender.cpp:0-0
Timestamp: 2025-09-09T21:36:31.504Z
Learning: In RawSender class (src/stats/rawsender.cpp), m_sock access requires synchronization using the cs_net mutex. The race condition exists between SendDirectly() (callable from any thread) and Connect() (called by ReconnectThread), where both read and write m_sock concurrently. The cs_net mutex properly coordinates these network operations.

Applied to files:

  • src/net.cpp
  • src/net.h
📚 Learning: 2025-09-09T21:36:58.969Z
Learnt from: kwvg
Repo: dashpay/dash PR: 6837
File: src/stats/rawsender.cpp:223-232
Timestamp: 2025-09-09T21:36:58.969Z
Learning: In RawSender class (src/stats/rawsender.cpp), socket operations are properly synchronized using the cs_net mutex. The m_sock and m_server members are GUARDED_BY(cs_net), and methods like Connect(), SendDirectly(), and ReconnectThread() use appropriate locking with cs_net to prevent race conditions on socket access.

Applied to files:

  • src/net.cpp
  • src/net.h
📚 Learning: 2025-09-09T21:36:31.504Z
Learnt from: kwvg
Repo: dashpay/dash PR: 6837
File: src/stats/rawsender.cpp:0-0
Timestamp: 2025-09-09T21:36:31.504Z
Learning: In RawSender class (src/stats/rawsender.cpp), m_sock access is protected by the cs_net mutex. This mutex coordinates between the TCP send path in SendDirectly() and the reconnection operations in Connect()/Reconnect() methods, ensuring proper synchronization of socket state.

Applied to files:

  • src/net.h
📚 Learning: 2025-09-09T13:24:23.293Z
Learnt from: kwvg
Repo: dashpay/dash PR: 6837
File: src/stats/rawsender.cpp:70-97
Timestamp: 2025-09-09T13:24:23.293Z
Learning: In RawSender class, m_sock and m_server members are accessed concurrently between SendDirectly() (which can be called from any thread) and Connect() (called by ReconnectThread), requiring proper synchronization even though Connect() has limited call sites.

Applied to files:

  • src/net.h
🧬 Code graph analysis (3)
src/test/util/net.h (1)
src/net.h (2)
  • m_nodes_mutex (1656-1662)
  • m_nodes_mutex (1667-1675)
src/net.cpp (1)
src/net.h (6)
  • AlreadyConnectedToAddress (1699-1753)
  • m_nodes_mutex (1656-1662)
  • m_nodes_mutex (1667-1675)
  • ExistsNode (1689-1693)
  • CopyStats (1032-1040)
  • AllNodes (1299-1299)
src/net.h (1)
src/net.cpp (55)
  • CopyStats (595-645)
  • CopyStats (595-595)
  • use_v2transport (1946-1946)
  • CheckIncomingNonce (347-355)
  • CheckIncomingNonce (347-347)
  • ForNode (4738-4743)
  • ForNode (4738-4738)
  • ForNode (4745-4750)
  • ForNode (4745-4745)
  • ForNode (4752-4757)
  • ForNode (4752-4752)
  • ForNode (4759-4764)
  • ForNode (4759-4759)
  • CNode (4611-4649)
  • addr (2862-2862)
  • addr (3314-3314)
  • IsMasternodeOrDisconnectRequested (4766-4771)
  • IsMasternodeOrDisconnectRequested (4766-4766)
  • msg (782-782)
  • GetExtraFullOutboundCount (2780-2796)
  • GetExtraFullOutboundCount (2780-2780)
  • GetExtraBlockRelayCount (2798-2810)
  • GetExtraBlockRelayCount (2798-2798)
  • AddNode (4220-4232)
  • AddNode (4220-4220)
  • GetAddedNodeInfo (3238-3296)
  • GetAddedNodeInfo (3238-3238)
  • GetNodeStats (4441-4454)
  • GetNodeStats (4441-4441)
  • DisconnectNode (4456-4463)
  • DisconnectNode (4456-4456)
  • DisconnectNode (4465-4477)
  • DisconnectNode (4465-4465)
  • DisconnectNode (4479-4482)
  • DisconnectNode (4479-4479)
  • DisconnectNode (4484-4491)
  • DisconnectNode (4484-4484)
  • MultipleManualOrFullOutboundConns (2825-2829)
  • MultipleManualOrFullOutboundConns (2825-2825)
  • ThreadMessageHandler (3607-3653)
  • ThreadMessageHandler (3607-3607)
  • AlreadyConnectedToAddress (341-345)
  • AlreadyConnectedToAddress (341-341)
  • GetEvictionCandidates (1741-1789)
  • GetEvictionCandidates (1741-1741)
  • AttemptToEvictConnection (1799-1811)
  • AttemptToEvictConnection (1799-1799)
  • ConnectNode (371-516)
  • ConnectNode (371-371)
  • SocketRecvData (2522-2577)
  • SocketRecvData (2522-2522)
  • GetCurrentBlockRelayOnlyConns (3225-3236)
  • GetCurrentBlockRelayOnlyConns (3225-3225)
  • MaybePickPreferredNetwork (2831-2845)
  • MaybePickPreferredNetwork (2831-2831)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build slim container / Build container
  • GitHub Check: Build container / Build container
🔇 Additional comments (10)
src/sync.cpp (1)

290-297: LGTM! Clean implementation of shared lock assertion.

The new AssertSharedLockHeldInternal function correctly mirrors the existing exclusive-lock assertion pattern and integrates properly with the lock-order debugging infrastructure.

src/net.h (5)

735-760: Mutable mutexes enable const-correct thread-safe methods.

Making these mutexes mutable is the correct approach to allow const methods like CopyStats (line 1032) to acquire locks for thread-safe access to internal state.


1301-1327: LGTM! Clean ForNode API with proper const-correctness.

The expanded ForNode overloads provide a clean, type-safe API with both const and non-const variants. The use of READ_LOCK internally with EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_mutex) annotations correctly expresses the locking contract.


1333-1407: Iteration helpers correctly leverage shared locking.

The updated iteration helpers (ForEachNodeContinueIf, ForEachNode, ForEachNodeThen) consistently use READ_LOCK(m_nodes_mutex) to enable concurrent safe iteration by multiple readers. The const/non-const overload pairs provide proper const-correctness.


1521-1521: Good optimization: downgrade to shared lock for read-only query.

Changing from EXCLUSIVE_LOCKS_REQUIRED to SHARED_LOCKS_REQUIRED for MultipleManualOrFullOutboundConns is correct since it only reads m_network_conn_counts. This allows concurrent queries without blocking.


1285-1285: No recursive locking patterns identified—change is safe.

Verification confirms the RecursiveMutex → SharedMutex change at line 1285 is safe. Analysis shows:

  • SharedMutex is defined as a wrapper around std::shared_mutex, which supports concurrent readers.
  • All m_nodes_mutex acquisitions use READ_LOCK (shared access), which is fully compatible with std::shared_mutex.
  • FindNode and FindNodeMutable are annotated with SHARED_LOCKS_REQUIRED and enforce the shared lock via AssertSharedLockHeld.
  • ForNode/ForEachNode implementations acquire READ_LOCK once and invoke callbacks within that scope—no nested acquisitions.
  • Preconditions (EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_mutex)) prevent callers from holding the lock before entry, eliminating re-entrance risk.
  • No code paths attempt to re-acquire m_nodes_mutex recursively.
src/test/util/net.h (1)

44-44: LGTM! Appropriate use of shared lock for read-only access.

Changing from LOCK (exclusive) to READ_LOCK (shared) is correct since TestNodes() only reads m_nodes to return a copy. This allows concurrent test access without blocking.

src/net.cpp (3)

341-345: Excellent const-correctness and shared-lock refactoring.

The conversion to shared locks and const-correct APIs is well-executed:

  • CNode::CopyStats made const (line 595) correctly reflects its read-only nature
  • GetEvictionCandidates (lines 1741-1789) uses READ_LOCK allowing concurrent candidate evaluation
  • New const overloads for ForNode (lines 4745-4750, 4759-4764) enable const-qualified traversal
  • AlreadyConnectedToAddress (lines 341-345) and similar methods properly use READ_LOCK for lookups

These changes align with the PR's goal of reducing lock contention (reported 86.5% reduction on m_nodes_mutex) while maintaining thread safety.

Also applies to: 595-595, 1741-1789, 4745-4750, 4759-4764


3371-3415: Shared-lock node access relies on CNode field thread-safety.

The getPendingQuorumNodes lambda (lines 3371-3415) accesses and modifies CNode fields while holding only a shared lock on m_nodes_mutex:

  • Line 3386: Reads pnode->m_masternode_connection
  • Line 3393: Reads pnode->nTimeFirstMessageReceived.load() (atomic)
  • Line 3400: Writes pnode->fDisconnect = true

This is consistent with the PR's design where the shared mutex protects the container structure and pointer lifetimes, while individual CNode fields are thread-safe (atomics or protected by their own mutexes). The use of .load() at line 3393 confirms atomics are used as stated in the PR description.


2540-2540: Remove unnecessary READ_LOCK(m_nodes_mutex) guards before CloseSocketDisconnect calls.

Lock ordering deadlocks occur when threads acquire locks in inconsistent order. However, the evidence shows these READ_LOCK calls are unnecessary:

  • Lines 4117-4120 prove the pattern works without m_nodes_mutex: WITH_LOCK(m_nodes_mutex, nodes.swap(m_nodes)); followed by pnode->CloseSocketDisconnect(this); — the lock is released before calling CloseSocketDisconnect
  • CloseSocketDisconnect (line 521) only acquires cs_mapSocketToNode and m_sock_mutex, never m_nodes_mutex
  • No code path was found acquiring cs_mapSocketToNode while holding m_nodes_mutex in a way that would create circular wait with these callsites

The developer's comment // is this here for lock ordering? at all three locations (2540, 2555, 2568) indicates uncertainty. Based on the counter-example at line 4117-4120, these READ_LOCK guards should be removed.

Comment on lines +1655 to +1675
template<typename Key>
const CNode* FindNode(const Key& key, bool fExcludeDisconnecting = true) const SHARED_LOCKS_REQUIRED(m_nodes_mutex)
{
AssertSharedLockHeld(m_nodes_mutex);
for (const CNode* pnode : m_nodes) {
if (fExcludeDisconnecting && pnode->fDisconnect) continue;
if (NodeMatches(pnode, key)) return pnode;
}
return nullptr;
}

template<typename Key>
CNode* FindNodeMutable(const Key& key, bool fExcludeDisconnecting = true) SHARED_LOCKS_REQUIRED(m_nodes_mutex)
{
AssertSharedLockHeld(m_nodes_mutex);
for (CNode* pnode : m_nodes) {
if (fExcludeDisconnecting && pnode->fDisconnect) continue;
if (NodeMatches(pnode, key)) return pnode;
}
return nullptr;
}
⚠️ Potential issue | 🟠 Major

Verify fExcludeDisconnecting default doesn't break DisconnectNode RPC.

The default fExcludeDisconnecting = true in both FindNode and FindNodeMutable templates could prevent DisconnectNode operations from finding nodes that are in the process of disconnecting. This was flagged in past reviews as making the RPC regressive.

Additionally, FindNodeMutable returns a mutable CNode* while only requiring SHARED_LOCKS_REQUIRED(m_nodes_mutex). This is safe only because CNode fields are internally synchronized, but this critical invariant should be documented.

#!/bin/bash
# Check if DisconnectNode implementations explicitly pass false for fExcludeDisconnecting
echo "=== Checking DisconnectNode implementations ==="
ast-grep --pattern 'DisconnectNode($$$) {
  $$$
}'

echo ""
echo "=== Checking calls to FindNode/FindNodeMutable from DisconnectNode ==="
rg -n --type=cpp -A10 'bool.*DisconnectNode' | rg '(FindNode|ExistsNode|WithNode)'

echo ""
echo "=== Checking all usages of fExcludeDisconnecting parameter ==="
rg -n --type=cpp 'fExcludeDisconnecting'
🤖 Prompt for AI Agents
In src/net.h around lines 1655-1675, the default fExcludeDisconnecting=true on
both FindNode and FindNodeMutable can cause DisconnectNode RPCs to miss nodes in
the process of disconnecting; either change the default to false or (preferably)
keep the default but update all DisconnectNode implementations to explicitly
pass false when they must match disconnecting nodes. Also add a short comment
above FindNodeMutable documenting that although it returns a mutable CNode*
while only requiring SHARED_LOCKS_REQUIRED(m_nodes_mutex), this is safe because
CNode's mutable fields are internally synchronized and that the shared-lock
annotation must be preserved.

@DashCoreAutoGuix

Guix Automation has completed; a release should be present here: https://github.com/dashpay/dash-dev-branches/releases/tag/v23.0.0-devpr6912.9b22095c. The image should be on dockerhub soon.

@UdjinM6 UdjinM6 left a comment

utACK 9b22095

Collaborator

@kwvg kwvg left a comment

utACK 9b22095

@PastaPastaPasta PastaPastaPasta merged commit 3dd74c6 into dashpay:develop Nov 4, 2025
35 of 41 checks passed