perf: convert m_nodes_mutex from recursive mutex to shared mutex #6912

Conversation
Guix Automation has begun to build this PR tagged as v23.0.0-devpr6912.37c097de. A new comment will be made when the image is pushed.

Guix Automation has completed; a release should be present here: https://github.com/dashpay/dash-dev-branches/releases/tag/v23.0.0-devpr6912.37c097de. The image should be on dockerhub soon.

(marked ready for review for code rabbit)

Guix Automation has begun to build this PR tagged as v23.0.0-devpr6912.63cf92d8. A new comment will be made when the image is pushed.
Walkthrough

Refactors network code for const-correctness and a shared-lock model. m_nodes_mutex changed from RecursiveMutex to SharedMutex; several mutexes made mutable. Added AssertSharedLockHeld and shared-lock assertions. Introduced const-safe node lookup helpers (FindNodeLocked variants, ExistsNodeShared, WithNodeShared, WithNodeExclusive), new ForNode const overloads, and CNode::CopyStats made const. Eviction logic centralized via GetEvictionCandidates and AttemptToEvictConnection. Threaded connection paths and masternode connection code updated to use the new lock-aware APIs. Enforces a hard cap of three high-bandwidth peers announcing headers.

Sequence diagrams (mermaid, not reproduced): Caller->>CConnman: WithNodeShared(key, cond, func); EvictCaller->>CConnman: AttemptToEvictConnection()

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (2 passed)
Actionable comments posted: 3
🧹 Nitpick comments (2)
src/net.cpp (1)
3464-3469: Misleading comment about lock access level.

The comment states "Use shared access since we already hold m_nodes_mutex", but at this point the code holds an exclusive lock (via LOCK2 at line 3526, called before getPendingQuorumNodes at line 3547). While the code is correct, since FindNodeLocked (const version) can be called with either a shared or an exclusive lock, the comment is confusing. Clarify or remove the comment:

- // Use shared access since we already hold m_nodes_mutex
  bool slowHandshake = false;
  if (const CNode* pnode = FindNodeLocked(addr2)) {

src/net.h (1)
1658-1662: optional<invoke_result_t<...>> can fail for reference results; decay the type.

If a callable returns a reference (or a cv/ref-qualified type), std::optional<invoke_result_t<...>> is ill-formed. Decay the result type before wrapping it in optional.

Apply this diff to both helpers:
 template<typename Key, typename Callable>
-std::optional<std::invoke_result_t<Callable, const CNode*>> WithNodeShared(const Key& key, Callable&& fn) EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_mutex)
+std::optional<std::decay_t<std::invoke_result_t<Callable, const CNode*>>> WithNodeShared(const Key& key, Callable&& fn) EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_mutex)
 {
     READ_LOCK(m_nodes_mutex);
-    if (const CNode* p = FindNodeLockedBy(key)) return std::optional<std::invoke_result_t<Callable, const CNode*>>{fn(p)};
+    if (const CNode* p = FindNodeLockedBy(key)) return std::optional<std::decay_t<std::invoke_result_t<Callable, const CNode*>>>{fn(p)};
     return std::nullopt;
 }

 template<typename Key, typename Callable>
-std::optional<std::invoke_result_t<Callable, CNode*>> WithNodeExclusive(const Key& key, Callable&& fn) EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_mutex)
+std::optional<std::decay_t<std::invoke_result_t<Callable, CNode*>>> WithNodeExclusive(const Key& key, Callable&& fn) EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_mutex)
 {
     LOCK(m_nodes_mutex);
-    if (const CNode* cp = FindNodeLockedBy(key)) return std::optional<std::invoke_result_t<Callable, CNode*>>{fn(const_cast<CNode*>(cp))};
+    if (const CNode* cp = FindNodeLockedBy(key)) return std::optional<std::decay_t<std::invoke_result_t<Callable, CNode*>>>{fn(const_cast<CNode*>(cp))};
     return std::nullopt;
 }

No additional includes needed; <type_traits> is already added.
Also applies to: 1663-1668, 1669-1686, 1687-1695, 1697-1705, 1708-1713, 1715-1721
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
src/net.cpp (36 hunks), src/net.h (26 hunks), src/net_processing.cpp (1 hunk), src/sync.h (1 hunk)
🧰 Additional context used
📓 Path-based instructions (1)
src/**/*.{cpp,h,cc,cxx,hpp}
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.{cpp,h,cc,cxx,hpp}: Dash Core C++ codebase must be written in C++20 and require at least Clang 16 or GCC 11.1
Dash uses unordered_lru_cache for efficient caching with LRU eviction
Files:
src/sync.h, src/net_processing.cpp, src/net.cpp, src/net.h
🧠 Learnings (6)
📚 Learning: 2025-09-09T21:36:58.969Z
Learnt from: kwvg
PR: dashpay/dash#6837
File: src/stats/rawsender.cpp:223-232
Timestamp: 2025-09-09T21:36:58.969Z
Learning: In RawSender class (src/stats/rawsender.cpp), socket operations are properly synchronized using the cs_net mutex. The m_sock and m_server members are GUARDED_BY(cs_net), and methods like Connect(), SendDirectly(), and ReconnectThread() use appropriate locking with cs_net to prevent race conditions on socket access.
Applied to files:
src/net.h
📚 Learning: 2025-09-09T21:36:11.833Z
Learnt from: kwvg
PR: dashpay/dash#6837
File: src/stats/rawsender.cpp:0-0
Timestamp: 2025-09-09T21:36:11.833Z
Learning: In RawSender class (src/stats/rawsender.cpp), cs_net is the appropriate mutex for protecting socket access (m_sock) and network operations, not additional custom locks. The implementation correctly uses cs_net with GUARDED_BY annotations and EXCLUSIVE_LOCKS_REQUIRED to synchronize socket access between SendDirectly() and ReconnectThread().
Applied to files:
src/net.h
📚 Learning: 2025-09-09T21:36:31.504Z
Learnt from: kwvg
PR: dashpay/dash#6837
File: src/stats/rawsender.cpp:0-0
Timestamp: 2025-09-09T21:36:31.504Z
Learning: In RawSender class (src/stats/rawsender.cpp), m_sock access is protected by the cs_net mutex. This mutex coordinates between the TCP send path in SendDirectly() and the reconnection operations in Connect()/Reconnect() methods, ensuring proper synchronization of socket state.
Applied to files:
src/net.h
📚 Learning: 2025-09-09T21:36:31.504Z
Learnt from: kwvg
PR: dashpay/dash#6837
File: src/stats/rawsender.cpp:0-0
Timestamp: 2025-09-09T21:36:31.504Z
Learning: In RawSender class (src/stats/rawsender.cpp), m_sock access requires synchronization using the cs_net mutex. The race condition exists between SendDirectly() (callable from any thread) and Connect() (called by ReconnectThread), where both read and write m_sock concurrently. The cs_net mutex properly coordinates these network operations.
Applied to files:
src/net.h
📚 Learning: 2025-09-09T21:36:11.833Z
Learnt from: kwvg
PR: dashpay/dash#6837
File: src/stats/rawsender.cpp:0-0
Timestamp: 2025-09-09T21:36:11.833Z
Learning: In RawSender class (src/stats/rawsender.cpp), cs_net is the appropriate mutex for protecting socket access (m_sock) and network operations, not additional custom locks.
Applied to files:
src/net.h
📚 Learning: 2025-09-09T21:36:58.969Z
Learnt from: kwvg
PR: dashpay/dash#6837
File: src/stats/rawsender.cpp:223-232
Timestamp: 2025-09-09T21:36:58.969Z
Learning: In RawSender class (src/stats/rawsender.cpp), socket access operations (m_sock and m_server) should be protected by the cs_net mutex, not avoiding synchronization entirely. While lock overhead concerns are valid in general, socket operations require proper synchronization via cs_net.
Applied to files:
src/net.h
🧬 Code graph analysis (3)
src/sync.h (1)
src/sync.cpp (5)
AssertLockHeldInternal(280-285), AssertLockHeldInternal(280-280), AssertLockHeldInternal(286-286), AssertLockHeldInternal(287-287), AssertLockHeldInternal(288-288)
src/net.cpp (2)
src/node/interfaces.cpp (12)
ip(496-503), ip(496-496), id(511-517), id(511-511), stats(453-479), stats(453-453), LOCK(541-545), LOCK(551-558), LOCK(559-566), LOCK(828-837), LOCK(871-875), LOCK(1055-1059)
src/net.h (3)
m_nodes_mutex(1688-1693), CopyStats(1032-1040), string(107-282)
src/net.h (1)
src/net.cpp (54)
CopyStats(674-724), CopyStats(674-674), StopNodes(4177-4228), StopNodes(4177-4177), use_v2transport(2025-2025), CheckIncomingNonce(425-433), CheckIncomingNonce(425-425), ForNode(4822-4833), ForNode(4822-4822), ForNode(4835-4846), ForNode(4835-4835), ForNode(4848-4859), ForNode(4848-4848), ForNode(4861-4872), ForNode(4861-4861), CNode(4693-4731), addr(2941-2941), addr(3393-3393), AddNode(4298-4310), AddNode(4298-4298), GetAddedNodeInfo(3317-3375), GetAddedNodeInfo(3317-3317), GetNodeStats(4519-4532), GetNodeStats(4519-4519), FindNodeLocked(342-353), FindNodeLocked(342-342), FindNodeLocked(355-366), FindNodeLocked(355-355), FindNodeLocked(368-379), FindNodeLocked(368-368), FindNodeLocked(381-392), FindNodeLocked(381-381), FindNodeLockedMutable(394-398), FindNodeLockedMutable(394-394), FindNodeLockedMutable(400-404), FindNodeLockedMutable(400-400), FindNodeLockedMutable(406-410), FindNodeLockedMutable(406-406), FindNodeLockedMutable(412-416), FindNodeLockedMutable(412-412), AlreadyConnectedToAddress(419-423), AlreadyConnectedToAddress(419-419), GetEvictionCandidates(1820-1868), GetEvictionCandidates(1820-1820), AttemptToEvictConnection(1878-1890), AttemptToEvictConnection(1878-1878), ConnectNode(449-595), ConnectNode(449-449), SocketRecvData(2601-2656), SocketRecvData(2601-2601), GetCurrentBlockRelayOnlyConns(3304-3315), GetCurrentBlockRelayOnlyConns(3304-3304), MaybePickPreferredNetwork(2910-2924), MaybePickPreferredNetwork(2910-2910)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: arm-linux-build / Build source
- GitHub Check: mac-build / Build source
- GitHub Check: linux64_tsan-build / Build source
- GitHub Check: linux64_sqlite-build / Build source
- GitHub Check: linux64_fuzz-build / Build source
- GitHub Check: linux64_ubsan-build / Build source
- GitHub Check: linux64-build / Build source
- GitHub Check: win64-build / Build source
- GitHub Check: Lint / Run linters
🔇 Additional comments (13)
src/net_processing.cpp (1)
1308-1318: Good call keeping SENDCMPCT fan-out bounded.

Capping lNodesAnnouncingHeaderAndIDs at three entries, and explicitly telling the oldest peer to drop out of high-bandwidth mode, brings us back in line with the BIP152 limit without upsetting the existing outbound-protection logic above. Looks solid to me.

src/sync.h (1)
164-167: LGTM: Shared lock assertion helper implemented correctly.

The new AssertSharedLockHeldInline function and AssertSharedLockHeld macro provide the necessary infrastructure for asserting shared lock ownership on SharedMutex. The NO_THREAD_SAFETY_ANALYSIS annotation is correctly applied since the underlying AssertLockHeldInternal requires an exclusive lock annotation but we're checking a shared lock at runtime.

src/net.cpp (6)
341-392: LGTM: Const-correct FindNodeLocked methods.

The const-qualified FindNodeLocked overloads correctly use const CNode* and enable read-only node lookups under either shared or exclusive locks. The new NodeId overload at lines 381-392 is a useful addition.

419-433: LGTM: Proper const-correctness and read-lock usage.

Both AlreadyConnectedToAddress and CheckIncomingNonce correctly use READ_LOCK for read-only node lookups and are appropriately const-qualified.

1820-1868: LGTM: GetEvictionCandidates properly encapsulated.

The method correctly uses READ_LOCK for read-only node access and returns a value (not references), eliminating lifetime concerns. Good encapsulation of the eviction-candidate collection logic.

4835-4859: LGTM: Const-correct ForNode overloads.

The new const ForNode overloads for CService and NodeId correctly use READ_LOCK and accept const functors, enabling read-only node operations without exclusive-lock contention. The pattern complements the existing mutable overloads well.

674-674: LGTM: CopyStats correctly made const.

Making CopyStats const is appropriate since it only reads node state without modification, enabling use in read-only contexts.

421-421: Good read/write lock separation throughout.

The systematic conversion to READ_LOCK for read-only node access while retaining exclusive locks for modifications is well-executed. This enables concurrent reads and reduces contention, which is the primary benefit of the RecursiveMutex-to-SharedMutex conversion.

src/net.h (5)
51-51: Includes and forward-declare: LGTM

type_traits is needed for invoke_result_t below; forward-declaring NodeEvictionCandidate here is appropriate for the declarations that return vectors.

Also applies to: 65-65

735-735: Const-correctness via mutable locks and CopyStats: LGTM

Making cs_vSend, cs_vRecv, and m_subver_mutex mutable enables const methods like CopyStats() to lock internals safely. CopyStats() const with negative lock requirements matches the net.cpp implementation.

Also applies to: 737-737, 761-761, 1032-1032

1301-1304: SharedMutex migration and API annotations: LGTM

ForNode/ForEach*/ForEach*Then now explicitly require the nodes lock not to be held and take shared/exclusive locks internally. Const variants use READ_LOCK; non-const variants use LOCK. This is the right direction for removing recursive locking.

Also applies to: 1307-1317, 1333-1342, 1350-1359, 1366-1375, 1381-1395, 1397-1406, 1413-1423, 1425-1429

1811-1811: m_nodes_mutex as SharedMutex and SHARED_LOCKS_REQUIRED usage: LGTM

Declaring m_nodes_mutex as SharedMutex and introducing SHARED_LOCKS_REQUIRED for read-only paths (e.g. MultipleManualOrFullOutboundConns) aligns with the non-recursive, shared/exclusive model.

Also applies to: 1543-1543

1729-1732: Public APIs with explicit node-lock contracts: LGTM

GetEvictionCandidates()/AttemptToEvictConnection(), GetCurrentBlockRelayOnlyConns(), MaybePickPreferredNetwork(), GetExtra* counts, GetNodeStats(), and DisconnectNode* are all updated to declare !m_nodes_mutex. Matches the corresponding READ/LOCK usage in net.cpp.

Also applies to: 1758-1758, 1770-1770, 1462-1464, 1504-1508
Actionable comments posted: 1
📒 Files selected for processing (2)
src/net.cpp (36 hunks), src/net.h (26 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/net.h (1)
src/net.cpp (62)
CopyStats(674-724), CopyStats(674-674), StopNodes(4177-4228), StopNodes(4177-4177), use_v2transport(2025-2025), ForNode(4822-4833), ForNode(4822-4822), ForNode(4835-4846), ForNode(4835-4835), ForNode(4848-4859), ForNode(4848-4848), ForNode(4861-4872), ForNode(4861-4861), CNode(4693-4731), addr(2941-2941), addr(3393-3393), GetExtraFullOutboundCount(2859-2875), GetExtraFullOutboundCount(2859-2859), GetExtraBlockRelayCount(2877-2889), GetExtraBlockRelayCount(2877-2877), GetNodeStats(4519-4532), GetNodeStats(4519-4519), DisconnectNode(4534-4541), DisconnectNode(4534-4534), DisconnectNode(4543-4555), DisconnectNode(4543-4543), DisconnectNode(4557-4560), DisconnectNode(4557-4557), DisconnectNode(4562-4573), DisconnectNode(4562-4562), MultipleManualOrFullOutboundConns(2904-2908), MultipleManualOrFullOutboundConns(2904-2904), FindNodeLocked(342-353), FindNodeLocked(342-342), FindNodeLocked(355-366), FindNodeLocked(355-355), FindNodeLocked(368-379), FindNodeLocked(368-368), FindNodeLocked(381-392), FindNodeLocked(381-381), FindNodeLockedMutable(394-398), FindNodeLockedMutable(394-394), FindNodeLockedMutable(400-404), FindNodeLockedMutable(400-400), FindNodeLockedMutable(406-410), FindNodeLockedMutable(406-406), FindNodeLockedMutable(412-416), FindNodeLockedMutable(412-412), GetEvictionCandidates(1820-1868), GetEvictionCandidates(1820-1820), AttemptToEvictConnection(1878-1890), AttemptToEvictConnection(1878-1878), ConnectNode(449-595), ConnectNode(449-449), SocketRecvData(2601-2656), SocketRecvData(2601-2601), GetCurrentBlockRelayOnlyConns(3304-3315), GetCurrentBlockRelayOnlyConns(3304-3304), MaybePickPreferredNetwork(2910-2924), MaybePickPreferredNetwork(2910-2910), PerformReconnections(4917-4944), PerformReconnections(4917-4917)
src/net.cpp (2)
src/node/interfaces.cpp (12)
ip(496-503), ip(496-496), id(511-517), id(511-511), stats(453-479), stats(453-453), LOCK(541-545), LOCK(551-558), LOCK(559-566), LOCK(828-837), LOCK(871-875), LOCK(1055-1059)
src/net.h (4)
m_nodes_mutex(1688-1693), CopyStats(1032-1040), AllNodes(1299-1299), string(107-282)
src/net.h
Outdated
const CNode* FindNodeLocked(const CNetAddr& ip, bool fExcludeDisconnecting = true) const SHARED_LOCKS_REQUIRED(m_nodes_mutex);
const CNode* FindNodeLocked(const std::string& addrName, bool fExcludeDisconnecting = true) const SHARED_LOCKS_REQUIRED(m_nodes_mutex);
const CNode* FindNodeLocked(const CService& addr, bool fExcludeDisconnecting = true) const SHARED_LOCKS_REQUIRED(m_nodes_mutex);
const CNode* FindNodeLocked(NodeId id, bool fExcludeDisconnecting = true) const SHARED_LOCKS_REQUIRED(m_nodes_mutex);

// Mutable find helpers for callers that already hold the exclusive lock
CNode* FindNodeLockedMutable(const CNetAddr& ip, bool fExcludeDisconnecting = true) EXCLUSIVE_LOCKS_REQUIRED(m_nodes_mutex);
CNode* FindNodeLockedMutable(const std::string& addrName, bool fExcludeDisconnecting = true) EXCLUSIVE_LOCKS_REQUIRED(m_nodes_mutex);
CNode* FindNodeLockedMutable(const CService& addr, bool fExcludeDisconnecting = true) EXCLUSIVE_LOCKS_REQUIRED(m_nodes_mutex);
CNode* FindNodeLockedMutable(NodeId id, bool fExcludeDisconnecting = true) EXCLUSIVE_LOCKS_REQUIRED(m_nodes_mutex);

// Type-agnostic node matching helpers
static inline bool NodeMatches(const CNode* p, const CService& addr)
{
    return static_cast<CService>(p->addr) == addr;
}
static inline bool NodeMatches(const CNode* p, const CNetAddr& ip)
{
    return static_cast<CNetAddr>(p->addr) == ip;
}
static inline bool NodeMatches(const CNode* p, const std::string& addrName)
{
    return p->m_addr_name == addrName;
}
static inline bool NodeMatches(const CNode* p, const NodeId id)
{
    return p->GetId() == id;
}

template<typename Key>
const CNode* FindNodeLockedBy(const Key& key, bool fExcludeDisconnecting = true) const SHARED_LOCKS_REQUIRED(m_nodes_mutex)
{
    for (const CNode* pnode : m_nodes) {
        if (fExcludeDisconnecting && pnode->fDisconnect) continue;
        if (NodeMatches(pnode, key)) return pnode;
    }
    return nullptr;
}

// Callback helpers with explicit lock semantics (templated on key type)
// Lambda-based shared accessor returning optional result (nullopt = not found)
template<typename Key, typename Callable>
std::optional<std::invoke_result_t<Callable, const CNode*>> WithNodeShared(const Key& key, Callable&& fn) EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_mutex)
{
    READ_LOCK(m_nodes_mutex);
    if (const CNode* p = FindNodeLockedBy(key)) return std::optional<std::invoke_result_t<Callable, const CNode*>>{fn(p)};
    return std::nullopt;
}

// Fast existence check under shared lock
template<typename Key>
bool ExistsNodeShared(const Key& key) EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_mutex)
{
    READ_LOCK(m_nodes_mutex);
    return FindNodeLockedBy(key) != nullptr;
}

template<typename Key, typename Callable>
std::optional<std::invoke_result_t<Callable, CNode*>> WithNodeExclusive(const Key& key, Callable&& fn) EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_mutex)
{
    LOCK(m_nodes_mutex);
    if (const CNode* cp = FindNodeLockedBy(key)) return std::optional<std::invoke_result_t<Callable, CNode*>>{fn(const_cast<CNode*>(cp))};
    return std::nullopt;
}
Keep find helpers visible to disconnect-node callers
With the new helpers we now default fExcludeDisconnecting = true (and WithNodeExclusive forwards that default). As soon as a peer has been marked fDisconnect = true, FindNodeLockedBy(...) stops finding it, so DisconnectNode(...) returns false while the socket is still being torn down. Previously the RPC would keep reporting success until the peer was actually removed. This makes the disconnectnode RPC racy/regressive. Please include disconnecting peers when servicing these helpers (e.g. restore the old default or explicitly pass false from WithNodeExclusive) so callers can still find the node until it is gone.
🤖 Prompt for AI Agents
In src/net.h around lines 1658-1721, the new find helpers default
fExcludeDisconnecting = true which causes disconnecting peers to be ignored and
breaks disconnectnode; restore behavior to include disconnecting peers by
changing the default to fExcludeDisconnecting = false in the FindNodeLockedBy
template signature (and any other FindNodeLocked* helpers if applicable), or
alternately ensure WithNodeShared, ExistsNodeShared and WithNodeExclusive call
FindNodeLockedBy(key, /*fExcludeDisconnecting=*/false) so disconnecting peers
are still found until removed.
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/net.cpp (2)
494-500: Consider using WithNodeShared here for consistency and better concurrency.

This check for an existing connection uses WithNodeExclusive, but the lambda doesn't modify the node. For consistency with the similar check at line 460 (which uses WithNodeShared) and to allow better concurrency, consider changing to:

-if (WithNodeExclusive(static_cast<CService>(addrConnect), [&](CNode* /*pnode*/){ return true; })) {
+if (WithNodeShared(static_cast<CService>(addrConnect), [](const CNode*){ return true; })) {

Note: this also removes the unnecessary [&] capture.
3464-3464: Clarify comment to avoid confusion with shared vs. exclusive locks.

The comment "Use shared access since we already hold m_nodes_mutex" could be misinterpreted in the context of this PR's shared/exclusive locking model. Consider clarifying:

-// Use shared access since we already hold m_nodes_mutex
+// Use const access (FindNodeLocked) since we already hold m_nodes_mutex

This makes it clear that "shared access" refers to const/read-only access, not a shared lock.
📒 Files selected for processing (1)
src/net.cpp (35 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/net.cpp (1)
src/net.h (4)
m_nodes_mutex(1688-1693), CopyStats(1032-1040), AllNodes(1299-1299), string(107-282)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: linux64_nowallet-build / Build source
- GitHub Check: linux64_ubsan-build / Build source
- GitHub Check: linux64_sqlite-build / Build source
- GitHub Check: linux64_fuzz-build / Build source
- GitHub Check: linux64-build / Build source
- GitHub Check: arm-linux-build / Build source
- GitHub Check: win64-build / Build source
- GitHub Check: Lint / Run linters
🔇 Additional comments (4)
src/net.cpp (4)
341-392: LGTM! Const-correct FindNodeLocked variants enable shared-lock access.

The addition of const overloads for FindNodeLocked is the correct approach to support both shared and exclusive lock contexts. These methods properly return const CNode* for read-only access.

1820-1868: LGTM! GetEvictionCandidates correctly uses shared locking.

The method correctly uses READ_LOCK for building the eviction candidate list from read-only node data. The const qualifier appropriately reflects that this method doesn't modify CConnman state.

4833-4857: LGTM! Const ForNode overloads properly use shared locking.

The new const overloads correctly use READ_LOCK for read-only node iteration and accept const lambdas, enabling concurrent read access to nodes.

4878-4894: LGTM! NodesSnapshot correctly uses shared locking.

The constructor appropriately uses READ_LOCK for creating a read-only snapshot of the nodes list, and properly calls AddRef() to ensure node lifetime.
Guix Automation has completed; a release should be present here: https://github.com/dashpay/dash-dev-branches/releases/tag/v23.0.0-devpr6912.63cf92d8. The image should be on dockerhub soon.

Guix Automation has begun to build this PR tagged as v23.0.0-devpr6912.420f6148. A new comment will be made when the image is pushed.

Guix Automation has completed; a release should be present here: https://github.com/dashpay/dash-dev-branches/releases/tag/v23.0.0-devpr6912.420f6148. The image should be on dockerhub soon.

…des; CNode is internally thread safe

Guix Automation has begun to build this PR tagged as v23.0.0-devpr6912.c1b3047d. A new comment will be made when the image is pushed.

Guix Automation has completed; a release should be present here: https://github.com/dashpay/dash-dev-branches/releases/tag/v23.0.0-devpr6912.c1b3047d. The image should be on dockerhub soon.

pls see 20aedd4
Major Improvements ✅
1. Simplified FindNode API
// REMOVED the complex FindNodeLocked/FindNodeLockedMutable distinction
- const CNode* FindNodeLocked(...) const SHARED_LOCKS_REQUIRED(m_nodes_mutex);
- CNode* FindNodeLockedMutable(...) EXCLUSIVE_LOCKS_REQUIRED(m_nodes_mutex);
// REPLACED with simple template-based approach
template<typename Key>
const CNode* FindNode(const Key& key, ...) const SHARED_LOCKS_REQUIRED(m_nodes_mutex);
template<typename Key>
CNode* FindNodeMutable(const Key& key, ...) SHARED_LOCKS_REQUIRED(m_nodes_mutex);
Analysis: ✅ Much better!
- Generic template handles all key types (CNetAddr, CService, NodeId, string)
- Both require only SHARED_LOCKS_REQUIRED - correctly reflects that finding in the vector only needs shared lock
- Mutable vs const is about pointer type, not lock type
- Eliminates ~80 lines of duplicate code
2. Fixed getPendingProbes Lock Annotation
// BEFORE: Incorrectly required m_nodes_mutex
- const auto getPendingProbes = [&]() EXCLUSIVE_LOCKS_REQUIRED(m_nodes_mutex, cs_vPendingMasternodes) {
- AssertLockHeld(m_nodes_mutex);
// AFTER: Only requires cs_vPendingMasternodes (correct!)
+ const auto getPendingProbes = [&]() EXCLUSIVE_LOCKS_REQUIRED(cs_vPendingMasternodes) {
Analysis: ✅ Correct! This fixes the issue we identified - getPendingProbes doesn't access m_nodes.
3. Fixed getPendingQuorumNodes Lock Annotation
// BEFORE: Required exclusive lock
- const auto getPendingQuorumNodes = [&]() EXCLUSIVE_LOCKS_REQUIRED(m_nodes_mutex, cs_vPendingMasternodes) {
- AssertLockHeld(m_nodes_mutex);
// AFTER: Only requires shared lock (correct!)
+ const auto getPendingQuorumNodes = [&]() SHARED_LOCKS_REQUIRED(m_nodes_mutex)
+ EXCLUSIVE_LOCKS_REQUIRED(cs_vPendingMasternodes) {
+ AssertSharedLockHeld(m_nodes_mutex);
Analysis: ✅ Correct!
- Now properly uses SHARED_LOCKS_REQUIRED since it only reads from m_nodes
- Uses AssertSharedLockHeld instead of AssertLockHeld
- Writes to pnode->fDisconnect are safe (atomic)
4. Simplified getConnectToDmn Locking
// BEFORE: Held exclusive lock
- LOCK2(m_nodes_mutex, cs_vPendingMasternodes);
// AFTER: Only holds shared lock for m_nodes
+ READ_LOCK(m_nodes_mutex);
+ LOCK(cs_vPendingMasternodes);
Analysis: ✅ Excellent optimization!
- Only needs shared lock since it's just reading from m_nodes
- Reduces contention significantly
5. Improved Logic in getPendingQuorumNodes
// BEFORE: Complex TOCTOU pattern
- if (const CNode* pnode = FindNodeLocked(addr2)) {
- slowHandshake = pnode->nTimeFirstMessageReceived.load() != 0s && ...;
- }
- if (slowHandshake) {
- if (CNode* p2 = FindNodeLockedMutable(addr2)) { p2->fDisconnect = true; }
- }
// AFTER: Simplified single lookup
+ CNode* pnode = FindNodeMutable(addr2);
+ if (pnode && pnode->m_masternode_connection) {
+ continue;
+ }
+ if (connectedNodes.count(addr2)) {
+ bool slow_handshake = pnode && pnode->nTimeFirstMessageReceived.load() != 0s && ...;
+ if (slow_handshake) {
+ pnode->fDisconnect = true; // Direct access, no second lookup
+ }
Analysis: ✅ Much cleaner!
- Eliminates TOCTOU issue by using single pointer
- More readable logic flow
- Early connectedProRegTxHashes check avoids unnecessary work
6. Renamed Helper Methods for Clarity
// BEFORE: Confusing names
- WithNodeShared()
- WithNodeExclusive()
- ExistsNodeShared()
// AFTER: Clear intent
+ WithNodeMutable() // Callback gets mutable CNode*
+ ExistsNode() // Simple existence check
Analysis: ✅ Better naming!
- WithNodeMutable clearly indicates you get a mutable pointer
- No need for "Shared" suffix - the shared lock is an implementation detail
- The lock type is in the function signature for type safety
7. Proper AssertSharedLockHeld Implementation
// BEFORE: Hacky workaround with NO_THREAD_SAFETY_ANALYSIS
- inline void AssertSharedLockHeldInline(...) NO_THREAD_SAFETY_ANALYSIS {
- AssertLockHeldInternal(...);
- }
// AFTER: Proper template specialization
+ template <typename MutexType>
+ void AssertSharedLockHeldInternal(...) SHARED_LOCKS_REQUIRED(cs) { ... }
+ template void AssertSharedLockHeldInternal(..., SharedMutex*);
Analysis: ✅ Much better!
- Proper template implementation
- Correct thread safety annotations
- No need for NO_THREAD_SAFETY_ANALYSIS hack
Remaining Observations
⚠️ FindNodeMutable with Shared Lock
template<typename Key>
CNode* FindNodeMutable(const Key& key, ...) SHARED_LOCKS_REQUIRED(m_nodes_mutex)
Question: This returns a mutable CNode* under shared lock. Is this intentional?
Answer: ✅ Yes, this is correct!
- The shared lock protects the m_nodes container, not the CNode objects
- CNode has internal thread safety
- Returning mutable pointer allows modifying atomic members like fDisconnect
- The "Mutable" in the name documents this is intentional
✅ IsMasternodeOrDisconnectRequested Now Const
- bool IsMasternodeOrDisconnectRequested(const CService& addr);
+ bool IsMasternodeOrDisconnectRequested(const CService& addr) const;
Analysis: ✅ Correct - it's a read-only operation.
Summary of Improvements
| Issue | Status | Notes |
|---------------------------------------|---------|-----------------------------------|
| Unnecessary FindNodeLocked variants | ✅ Fixed | Replaced with clean templates |
| getPendingProbes incorrect lock | ✅ Fixed | Removed m_nodes_mutex requirement |
| getPendingQuorumNodes wrong lock type | ✅ Fixed | Changed to SHARED_LOCKS_REQUIRED |
| TOCTOU in masternode connection | ✅ Fixed | Single pointer lookup |
| AssertSharedLockHeld hack | ✅ Fixed | Proper template implementation |
| Confusing method names | ✅ Fixed | Better naming convention |
| Over-locking in getConnectToDmn | ✅ Fixed | Changed to READ_LOCK |
Reviewed-by: Claude (Anthropic AI Assistant)
Applied
Guix Automation has begun to build this PR tagged as v23.0.0-devpr6912.9b22095c. A new comment will be made when the image is pushed. |
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/net.h (1)
1787-1787: Consider addressing the TODO comment during this locking refactor. Since this PR is already doing a major locking refactor, consider extracting m_network_conn_counts to a separate mutex as suggested in the TODO comment. This would reduce contention on m_nodes_mutex for operations that only need to check connection counts per network. If addressing this TODO, you could:
- Add a new lightweight mutex specifically for m_network_conn_counts
- Update the GUARDED_BY annotation
- Update all access points to use the new mutex
This would be a natural extension of the performance improvements in this PR, though it can also be deferred to a separate change.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- src/net.cpp (33 hunks)
- src/net.h (24 hunks)
- src/sync.cpp (1 hunks)
- src/sync.h (3 hunks)
- src/test/util/net.h (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/sync.h
🧰 Additional context used
📓 Path-based instructions (2)
src/**/*.{cpp,h,cc,cxx,hpp}
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.{cpp,h,cc,cxx,hpp}: Dash Core C++ codebase must be written in C++20 and require at least Clang 16 or GCC 11.1
Dash uses unordered_lru_cache for efficient caching with LRU eviction
Files:
src/sync.cpp, src/test/util/net.h, src/net.cpp, src/net.h
src/{test,wallet/test,qt/test}/**/*.{cpp,h,cc,cxx,hpp}
📄 CodeRabbit inference engine (CLAUDE.md)
Unit tests for C++ code should be placed in src/test/, src/wallet/test/, or src/qt/test/ and use Boost::Test or Qt 5 for GUI tests
Files:
src/test/util/net.h
🧠 Learnings (11)
📓 Common learnings
Learnt from: CR
Repo: dashpay/dash PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-07-20T18:42:49.794Z
Learning: Applies to src/{masternode,evo}/**/*.{cpp,h,cc,cxx,hpp} : Masternode lists must use immutable data structures (Immer library) for thread safety
Learnt from: kwvg
Repo: dashpay/dash PR: 6837
File: src/stats/rawsender.cpp:0-0
Timestamp: 2025-09-09T21:36:11.833Z
Learning: In RawSender class (src/stats/rawsender.cpp), cs_net is the appropriate mutex for protecting socket access (m_sock) and network operations, not additional custom locks. The implementation correctly uses cs_net with GUARDED_BY annotations and EXCLUSIVE_LOCKS_REQUIRED to synchronize socket access between SendDirectly() and ReconnectThread().
📚 Learning: 2025-07-20T18:42:49.794Z
Learnt from: CR
Repo: dashpay/dash PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-07-20T18:42:49.794Z
Learning: Applies to src/{masternode,evo}/**/*.{cpp,h,cc,cxx,hpp} : Masternode lists must use immutable data structures (Immer library) for thread safety
Applied to files:
src/test/util/net.h, src/net.cpp, src/net.h
📚 Learning: 2025-09-09T21:36:11.833Z
Learnt from: kwvg
Repo: dashpay/dash PR: 6837
File: src/stats/rawsender.cpp:0-0
Timestamp: 2025-09-09T21:36:11.833Z
Learning: In RawSender class (src/stats/rawsender.cpp), cs_net is the appropriate mutex for protecting socket access (m_sock) and network operations, not additional custom locks. The implementation correctly uses cs_net with GUARDED_BY annotations and EXCLUSIVE_LOCKS_REQUIRED to synchronize socket access between SendDirectly() and ReconnectThread().
Applied to files:
src/net.cpp, src/net.h
📚 Learning: 2025-09-09T21:36:58.969Z
Learnt from: kwvg
Repo: dashpay/dash PR: 6837
File: src/stats/rawsender.cpp:223-232
Timestamp: 2025-09-09T21:36:58.969Z
Learning: In RawSender class (src/stats/rawsender.cpp), socket access operations (m_sock and m_server) should be protected by the cs_net mutex, not avoiding synchronization entirely. While lock overhead concerns are valid in general, socket operations require proper synchronization via cs_net.
Applied to files:
src/net.cpp, src/net.h
📚 Learning: 2025-07-22T14:38:45.606Z
Learnt from: PastaPastaPasta
Repo: dashpay/dash PR: 6738
File: src/evo/smldiff.h:91-94
Timestamp: 2025-07-22T14:38:45.606Z
Learning: In the Dash codebase, EXCLUSIVE_LOCKS_REQUIRED annotations are Clang thread safety annotations that are enforced at compile time on supported platforms like macOS. If callers don't properly hold the required locks, the build will fail with compile-time errors, not runtime issues.
Applied to files:
src/net.cpp
📚 Learning: 2025-05-10T00:54:15.597Z
Learnt from: kwvg
Repo: dashpay/dash PR: 6629
File: src/evo/deterministicmns.cpp:473-485
Timestamp: 2025-05-10T00:54:15.597Z
Learning: In the current implementation of `NetInfoEntry`, only `CService` is a valid type and `std::monostate` represents an invalid state. Entries that don't provide a valid `CService` through `GetAddrPort()` should be rejected with an exception, not silently skipped.
Applied to files:
src/net.cpp, src/net.h
📚 Learning: 2025-09-09T21:36:11.833Z
Learnt from: kwvg
Repo: dashpay/dash PR: 6837
File: src/stats/rawsender.cpp:0-0
Timestamp: 2025-09-09T21:36:11.833Z
Learning: In RawSender class (src/stats/rawsender.cpp), cs_net is the appropriate mutex for protecting socket access (m_sock) and network operations, not additional custom locks.
Applied to files:
src/net.cpp, src/net.h
📚 Learning: 2025-09-09T21:36:31.504Z
Learnt from: kwvg
Repo: dashpay/dash PR: 6837
File: src/stats/rawsender.cpp:0-0
Timestamp: 2025-09-09T21:36:31.504Z
Learning: In RawSender class (src/stats/rawsender.cpp), m_sock access requires synchronization using the cs_net mutex. The race condition exists between SendDirectly() (callable from any thread) and Connect() (called by ReconnectThread), where both read and write m_sock concurrently. The cs_net mutex properly coordinates these network operations.
Applied to files:
src/net.cpp, src/net.h
📚 Learning: 2025-09-09T21:36:58.969Z
Learnt from: kwvg
Repo: dashpay/dash PR: 6837
File: src/stats/rawsender.cpp:223-232
Timestamp: 2025-09-09T21:36:58.969Z
Learning: In RawSender class (src/stats/rawsender.cpp), socket operations are properly synchronized using the cs_net mutex. The m_sock and m_server members are GUARDED_BY(cs_net), and methods like Connect(), SendDirectly(), and ReconnectThread() use appropriate locking with cs_net to prevent race conditions on socket access.
Applied to files:
src/net.cpp, src/net.h
📚 Learning: 2025-09-09T21:36:31.504Z
Learnt from: kwvg
Repo: dashpay/dash PR: 6837
File: src/stats/rawsender.cpp:0-0
Timestamp: 2025-09-09T21:36:31.504Z
Learning: In RawSender class (src/stats/rawsender.cpp), m_sock access is protected by the cs_net mutex. This mutex coordinates between the TCP send path in SendDirectly() and the reconnection operations in Connect()/Reconnect() methods, ensuring proper synchronization of socket state.
Applied to files:
src/net.h
📚 Learning: 2025-09-09T13:24:23.293Z
Learnt from: kwvg
Repo: dashpay/dash PR: 6837
File: src/stats/rawsender.cpp:70-97
Timestamp: 2025-09-09T13:24:23.293Z
Learning: In RawSender class, m_sock and m_server members are accessed concurrently between SendDirectly() (which can be called from any thread) and Connect() (called by ReconnectThread), requiring proper synchronization even though Connect() has limited call sites.
Applied to files:
src/net.h
🧬 Code graph analysis (3)
src/test/util/net.h (1)
src/net.h (2)
m_nodes_mutex (1656-1662), m_nodes_mutex (1667-1675)
src/net.cpp (1)
src/net.h (6)
AlreadyConnectedToAddress (1699-1753), m_nodes_mutex (1656-1662), m_nodes_mutex (1667-1675), ExistsNode (1689-1693), CopyStats (1032-1040), AllNodes (1299-1299)
src/net.h (1)
src/net.cpp (55)
CopyStats (595-645), CopyStats (595-595), use_v2transport (1946-1946), CheckIncomingNonce (347-355), CheckIncomingNonce (347-347), ForNode (4738-4743), ForNode (4738-4738), ForNode (4745-4750), ForNode (4745-4745), ForNode (4752-4757), ForNode (4752-4752), ForNode (4759-4764), ForNode (4759-4759), CNode (4611-4649), addr (2862-2862), addr (3314-3314), IsMasternodeOrDisconnectRequested (4766-4771), IsMasternodeOrDisconnectRequested (4766-4766), msg (782-782), GetExtraFullOutboundCount (2780-2796), GetExtraFullOutboundCount (2780-2780), GetExtraBlockRelayCount (2798-2810), GetExtraBlockRelayCount (2798-2798), AddNode (4220-4232), AddNode (4220-4220), GetAddedNodeInfo (3238-3296), GetAddedNodeInfo (3238-3238), GetNodeStats (4441-4454), GetNodeStats (4441-4441), DisconnectNode (4456-4463), DisconnectNode (4456-4456), DisconnectNode (4465-4477), DisconnectNode (4465-4465), DisconnectNode (4479-4482), DisconnectNode (4479-4479), DisconnectNode (4484-4491), DisconnectNode (4484-4484), MultipleManualOrFullOutboundConns (2825-2829), MultipleManualOrFullOutboundConns (2825-2825), ThreadMessageHandler (3607-3653), ThreadMessageHandler (3607-3607), AlreadyConnectedToAddress (341-345), AlreadyConnectedToAddress (341-341), GetEvictionCandidates (1741-1789), GetEvictionCandidates (1741-1741), AttemptToEvictConnection (1799-1811), AttemptToEvictConnection (1799-1799), ConnectNode (371-516), ConnectNode (371-371), SocketRecvData (2522-2577), SocketRecvData (2522-2522), GetCurrentBlockRelayOnlyConns (3225-3236), GetCurrentBlockRelayOnlyConns (3225-3225), MaybePickPreferredNetwork (2831-2845), MaybePickPreferredNetwork (2831-2831)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Build slim container / Build container
- GitHub Check: Build container / Build container
🔇 Additional comments (10)
src/sync.cpp (1)
290-297: LGTM! Clean implementation of shared lock assertion. The new AssertSharedLockHeldInternal function correctly mirrors the existing exclusive-lock assertion pattern and integrates properly with the lock-order debugging infrastructure.
src/net.h (5)
735-760: Mutable mutexes enable const-correct thread-safe methods. Making these mutexes mutable is the correct approach to allow const methods like CopyStats (line 1032) to acquire locks for thread-safe access to internal state.
1301-1327: LGTM! Clean ForNode API with proper const-correctness. The expanded ForNode overloads provide a clean, type-safe API with both const and non-const variants. The use of READ_LOCK internally with EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_mutex) annotations correctly expresses the locking contract.
1333-1407: Iteration helpers correctly leverage shared locking. The updated iteration helpers (ForEachNodeContinueIf, ForEachNode, ForEachNodeThen) consistently use READ_LOCK(m_nodes_mutex) to enable concurrent safe iteration by multiple readers. The const/non-const overload pairs provide proper const-correctness.
1521-1521: Good optimization: downgrade to shared lock for read-only query. Changing from EXCLUSIVE_LOCKS_REQUIRED to SHARED_LOCKS_REQUIRED for MultipleManualOrFullOutboundConns is correct since it only reads m_network_conn_counts. This allows concurrent queries without blocking.
1285-1285: No recursive locking patterns identified — change is safe. Verification confirms the RecursiveMutex → SharedMutex change at line 1285 is safe. Analysis shows:
- SharedMutex is defined as a wrapper around std::shared_mutex, which supports concurrent readers.
- All m_nodes_mutex acquisitions use READ_LOCK (shared access), which is fully compatible with std::shared_mutex.
- FindNode and FindNodeMutable are annotated with SHARED_LOCKS_REQUIRED and enforce the shared lock via AssertSharedLockHeld.
- ForNode/ForEachNode implementations acquire READ_LOCK once and invoke callbacks within that scope—no nested acquisitions.
- Preconditions (EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_mutex)) prevent callers from holding the lock before entry, eliminating re-entrance risk.
- No code paths attempt to re-acquire m_nodes_mutex recursively.
src/test/util/net.h (1)
44-44: LGTM! Appropriate use of shared lock for read-only access. Changing from LOCK (exclusive) to READ_LOCK (shared) is correct since TestNodes() only reads m_nodes to return a copy. This allows concurrent test access without blocking.
src/net.cpp (3)
341-345: Excellent const-correctness and shared-lock refactoring. The conversion to shared locks and const-correct APIs is well-executed:
- CNode::CopyStats made const (line 595) correctly reflects its read-only nature
- GetEvictionCandidates (lines 1741-1789) uses READ_LOCK allowing concurrent candidate evaluation
- New const overloads for ForNode (lines 4745-4750, 4759-4764) enable const-qualified traversal
- AlreadyConnectedToAddress (lines 341-345) and similar methods properly use READ_LOCK for lookups
These changes align with the PR's goal of reducing lock contention (reported 86.5% reduction on m_nodes_mutex) while maintaining thread safety.
Also applies to: 595-595, 1741-1789, 4745-4750, 4759-4764
3371-3415: Shared-lock node access relies on CNode field thread-safety. The getPendingQuorumNodes lambda (lines 3371-3415) accesses and modifies CNode fields while holding only a shared lock on m_nodes_mutex:
- Line 3386: Reads pnode->m_masternode_connection
- Line 3393: Reads pnode->nTimeFirstMessageReceived.load() (atomic)
- Line 3400: Writes pnode->fDisconnect = true
.load()at line 3393 confirms atomics are used as stated in the PR description.
2540-2540: Remove unnecessary READ_LOCK(m_nodes_mutex) guards before CloseSocketDisconnect calls.Lock ordering deadlocks occur when threads acquire locks in inconsistent order. However, the evidence shows these
READ_LOCKcalls are unnecessary:
- Line 4117-4120 proves the pattern works without
m_nodes_mutex:WITH_LOCK(m_nodes_mutex, nodes.swap(m_nodes));followed bypnode->CloseSocketDisconnect(this);— the lock is released before callingCloseSocketDisconnect- CloseSocketDisconnect (line 521) only acquires
cs_mapSocketToNodeandm_sock_mutex, neverm_nodes_mutex- No code path was found acquiring
cs_mapSocketToNodewhile holdingm_nodes_mutexin a way that would create circular wait with these callsitesThe developer's comment
// is this here for lock ordering?at all three locations (2540, 2555, 2568) indicates uncertainty. Based on the counter-example at line 4117-4120, theseREAD_LOCKguards should be removed.
template<typename Key>
const CNode* FindNode(const Key& key, bool fExcludeDisconnecting = true) const SHARED_LOCKS_REQUIRED(m_nodes_mutex)
{
    AssertSharedLockHeld(m_nodes_mutex);
    for (const CNode* pnode : m_nodes) {
        if (fExcludeDisconnecting && pnode->fDisconnect) continue;
        if (NodeMatches(pnode, key)) return pnode;
    }
    return nullptr;
}

template<typename Key>
CNode* FindNodeMutable(const Key& key, bool fExcludeDisconnecting = true) SHARED_LOCKS_REQUIRED(m_nodes_mutex)
{
    AssertSharedLockHeld(m_nodes_mutex);
    for (CNode* pnode : m_nodes) {
        if (fExcludeDisconnecting && pnode->fDisconnect) continue;
        if (NodeMatches(pnode, key)) return pnode;
    }
    return nullptr;
}
Verify fExcludeDisconnecting default doesn't break DisconnectNode RPC.
The default fExcludeDisconnecting = true in both FindNode and FindNodeMutable templates could prevent DisconnectNode operations from finding nodes that are in the process of disconnecting. This was flagged in past reviews as making the RPC regressive.
Additionally, FindNodeMutable returns a mutable CNode* while only requiring SHARED_LOCKS_REQUIRED(m_nodes_mutex). This is safe only because CNode fields are internally synchronized, but this critical invariant should be documented.
#!/bin/bash
# Check if DisconnectNode implementations explicitly pass false for fExcludeDisconnecting
echo "=== Checking DisconnectNode implementations ==="
ast-grep --pattern 'DisconnectNode($$$) {
$$$
}'
echo ""
echo "=== Checking calls to FindNode/FindNodeMutable from DisconnectNode ==="
rg -n --type=cpp -A10 'bool.*DisconnectNode' | rg '(FindNode|ExistsNode|WithNode)'
echo ""
echo "=== Checking all usages of fExcludeDisconnecting parameter ==="
rg -n --type=cpp 'fExcludeDisconnecting'
🤖 Prompt for AI Agents
In src/net.h around lines 1655-1675, the default fExcludeDisconnecting=true on
both FindNode and FindNodeMutable can cause DisconnectNode RPCs to miss nodes in
the process of disconnecting; either change the default to false or (preferably)
keep the default but update all DisconnectNode implementations to explicitly
pass false when they must match disconnecting nodes. Also add a short comment
above FindNodeMutable documenting that although it returns a mutable CNode*
while only requiring SHARED_LOCKS_REQUIRED(m_nodes_mutex), this is safe because
CNode's mutable fields are internally synchronized and that the shared-lock
annotation must be preserved.
|
Guix Automation has completed; a release should be present here: https://github.com/dashpay/dash-dev-branches/releases/tag/v23.0.0-devpr6912.9b22095c. The image should be on dockerhub soon. |
utACK 9b22095
utACK 9b22095
Issue being fixed or feature implemented
m_nodes_mutex (mostly) just guards m_nodes; this is ~90% of the time only read from. I only see us mutating this data structure when we add or remove connections / peers. All fields inside of the CNode* inside m_nodes_mutex are thread safe. Typically atomics. As such, (it appears) we only really care about protecting the structure itself, and the lifetime of the pointers in it. Both these are still safe with a shared mutex model.
What was done?
Implemented usage of a shared mutex. We see lock contention drop significantly.
Contention on m_nodes_mutex went from
It is interesting that the average wait time for this mutex does increase, though I guess not super surprising. However, this is still a huge win.
How Has This Been Tested?
Tested on testnet 1 node.
Breaking Changes
None
Checklist:
Go over all the following points, and put an x in all the boxes that apply.