Delay broadcasting Channel Updates until connected to peers #2731
I took the following approach to tackling this issue:
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #2731       +/-   ##
==========================================
+ Coverage   89.14%   91.68%    +2.53%
==========================================
  Files         116      118        +2
  Lines       93205   111593    +18388
  Branches    93205   111593    +18388
==========================================
+ Hits        83089   102315    +19226
+ Misses       7583     7246      -337
+ Partials     2533     2032      -501
This could use a test. Also, I do think we should consider the close-then-shutdown case: how do we get these out if we were shutting down when we closed, or if we close on restart but don't keep the node online for long?
Certainly! I am on it 🧑💻
I have some thoughts on this scenario you brought up that I would love to share. In this case, it seems like broadcasting might be a challenge since, during the shutdown, we wouldn't be connected to anyone to relay the message. As far as I know, our node doesn't automatically broadcast the channel graph each time it restarts. To tackle this, it might be worth considering the option of persisting the data to be broadcast later when the node comes back online. However, I'm curious about the importance of the channel update message and whether it's crucial enough to justify persisting the data across multiple node sessions. I'd love to hear your perspective on this matter.
Updated from pr2731.01 -> pr2731.02 (diff) Changes:
Updated from pr2731.02 to pr2731.03 (diff) with the following changes:
These adjustments enhance the reliability of broadcasting pending channel_update messages in situations involving close-then-shutdown, providing a more robust system.
Updated from pr2731.03 to pr2731.04 (diff): Changes:
Updated from pr2731.05 to pr2731.06 (diff): Updates:
Updated from pr2731.06 to pr2731.07 (diff):
Review Status
Actionable comments generated: 1
Configuration used: CodeRabbit UI
Files selected for processing (1)
- lightning/src/ln/channelmanager.rs (8 hunks)
Additional comments: 5
lightning/src/ln/channelmanager.rs (5)
- 1384-1386: The addition of `pending_broadcast_messages` is consistent with the PR's objective to cache unsent `channel_update` messages. Ensure that the Mutex is used correctly throughout the code to prevent data races.
- 2462-2462: Initialization of `pending_broadcast_messages` with an empty vector is correct and follows Rust's conventions for initializing state within a struct.
- 8207-8208: Appending `pending_broadcast_messages` to `pending_events` is a key part of the mechanism to ensure cached messages are sent. Ensure that this operation is thread-safe and that the lock is held for the minimum time necessary.
- 11105-11105: The initialization of `pending_broadcast_messages` is repeated here, which is consistent with the earlier initialization. This is standard practice and is approved.
- 11636-11662: The test case `test_channel_update_cached` correctly verifies the caching behavior of channel updates. It checks that the message is not immediately sent but is cached correctly, which aligns with the PR's objectives.
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files selected for processing (1)
- lightning/src/ln/channelmanager.rs (7 hunks)
Additional comments: 7
lightning/src/ln/channelmanager.rs (7)
- 1384-1387: The addition of `pending_broadcast_messages` is consistent with the PR's objective to cache unsent `channel_update` messages. Ensure that the `Mutex` is used correctly throughout the code to prevent data races.
- 2462-2462: Initialization of `pending_broadcast_messages` with an empty vector is correct and matches the declaration in hunk 1.
- 2954-2959: The logic to cache the `channel_update` message when no peers are connected is implemented as described. However, ensure that the `log_trace!` macro is used consistently across the codebase for logging.
- 8177-8177: This comment block is a placeholder for the `get_and_clear_pending_msg_events` function. It's important to ensure that the documentation is updated to reflect any new logic related to the broadcasting of messages.
- 8206-8207: Appending `pending_broadcast_messages` to `pending_events` is in line with the intended functionality. However, consider the comment from wpaulino about ensuring that there are connected peers before attempting to broadcast.
- 11104-11104: The initialization of `pending_broadcast_messages` in a different constructor is consistent with the previous hunk and is necessary for proper functionality.
- 11635-11661: The test `test_channel_update_cached` verifies the caching behavior of `channel_update` messages. Ensure that the test covers all relevant scenarios, including node restarts and reconnections.
Updated from pr2731.08 to pr2731.09 (diff): Changes:
Review Status
Actionable comments generated: 4
Configuration used: CodeRabbit UI
Files selected for processing (2)
- lightning/src/ln/channelmanager.rs (10 hunks)
- lightning/src/ln/reorg_tests.rs (2 hunks)
Additional comments: 7
lightning/src/ln/channelmanager.rs (7)
- 1384-1386: The addition of `pending_broadcast_messages` to cache channel update messages not broadcasted due to lack of peer connections is a good approach to ensure reliability in message broadcasting. Ensure that the mutex is consistently unlocked to prevent deadlocks.
- 2954-2958: Caching the channel update message when an update is available but not immediately broadcastable is implemented correctly. However, ensure that the locking mechanism on `pending_broadcast_messages` does not introduce any performance bottlenecks or deadlocks, especially in high-concurrency scenarios.
- 8176-8176: The documentation update clarifying the placement of `BroadcastChannelAnnouncement` and `BroadcastChannelUpdate` among `MessageSendEvent`s is helpful for understanding the intended behavior. It's important that documentation keeps pace with code changes to aid future maintainability.
- 8202-8216: NOTE: This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [8196-8213]. The logic to check for connected peers before appending broadcast messages to the pending events list is sound. However, consider optimizing the iteration over `per_peer_state` to avoid potential performance issues in scenarios with a large number of peers.
- 11642-11689: The test `test_channel_update_cached` effectively verifies the caching and broadcasting behavior of channel update messages under various network conditions. Ensure that edge cases, such as rapid connect/disconnect scenarios, are also covered to prevent any unforeseen issues.
- 11714-11722: The test `test_drop_disconnected_peers_when_removing_channels` correctly asserts the behavior of peer state management upon disconnection and force closure of channels. It's crucial to also test the behavior when peers reconnect after being dropped to ensure the system's resilience.
- 12426-12430: The test `test_trigger_lnd_force_close` sets up a scenario to test force-closure of channels, which is essential for ensuring the robustness of channel management under adversarial conditions. Consider adding assertions to verify the state of the channel and the broadcast of channel update messages post-force-close.
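The connected-peer check discussed in the review can be written so that the scan stops at the first connected peer. The following is only a sketch under assumptions: `PeerState`, `PerPeerState`, the one-byte peer id, and `has_connected_peer` are hypothetical stand-ins, not the types or functions used in `channelmanager.rs`.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, RwLock};

/// Hypothetical per-peer state; only the connection flag is modelled here.
pub struct PeerState {
    pub is_connected: bool,
}

/// Hypothetical map from a one-byte peer id to that peer's state.
pub type PerPeerState = RwLock<HashMap<u8, Mutex<PeerState>>>;

/// Returns true if at least one peer is currently connected.
pub fn has_connected_peer(per_peer_state: &PerPeerState) -> bool {
    // `any` short-circuits, so iteration ends at the first connected peer
    // instead of visiting every entry in the map.
    per_peer_state
        .read()
        .unwrap()
        .values()
        .any(|peer| peer.lock().unwrap().is_connected)
}
```

With many peers, the worst case (nobody connected) still visits every entry, but the common case returns early.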
LGTM, this can be squashed now
Updated from pr2731.13 to pr2731.14 (diff): Changes:
Updated from pr2731.14 to pr2731.15 (diff): Updates:
Okay, LGTM with the feedback below addressed. Feel free to squash commits down into a single clean history when you next push.
lightning/src/ln/channelmanager.rs
Outdated
@@ -1988,7 +1992,7 @@ macro_rules! handle_error {

 $self.finish_close_channel(shutdown_res);
 if let Some(update) = update_option {
-    msg_events.push(events::MessageSendEvent::BroadcastChannelUpdate {
+    broadcast_event = Some(events::MessageSendEvent::BroadcastChannelUpdate {
I think we can do the `pending_broadcast_messages` lock and push inline here, no need to store it in a temp.
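To illustrate the suggestion, here is a minimal sketch of locking and pushing in a single statement rather than carrying the event out of the `if let` in a temporary. The `Event` enum and `close_channel` function are hypothetical simplifications, not the `handle_error!` macro itself.

```rust
use std::sync::Mutex;

/// Hypothetical, simplified stand-in for `MessageSendEvent`.
#[derive(Debug, PartialEq)]
pub enum Event {
    BroadcastChannelUpdate(u64),
}

/// Queues the broadcast event directly, with no intermediate `Option`.
pub fn close_channel(update_option: Option<u64>, pending: &Mutex<Vec<Event>>) {
    if let Some(update) = update_option {
        // The guard returned by `lock()` is a temporary, so the lock is
        // released as soon as this statement finishes.
        pending
            .lock()
            .unwrap()
            .push(Event::BroadcastChannelUpdate(update));
    }
}
```

Keeping the lock scoped to one statement also means it is held for as short a time as possible.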
lightning/src/ln/channelmanager.rs
Outdated
@@ -4059,7 +4064,8 @@ where
 }
 if let ChannelPhase::Funded(channel) = channel_phase {
     if let Ok(msg) = self.get_channel_update_for_broadcast(channel) {
-        peer_state.pending_msg_events.push(events::MessageSendEvent::BroadcastChannelUpdate { msg });
+        let pending_broadcast_messages = &mut self.pending_broadcast_messages.lock().unwrap();
nit: everywhere you take the `pending_broadcast_messages` lock you don't need the `&mut` part.
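As a small illustration of the nit, both forms below compile and behave the same; the second simply binds the guard as `mut`. The function names and the `Vec<u32>` payload are hypothetical.

```rust
use std::sync::Mutex;

/// Mirrors the style in the diff: a `&mut` borrow of the temporary guard.
pub fn push_with_ref(queue: &Mutex<Vec<u32>>, value: u32) {
    // Compiles, but the extra reference adds nothing.
    let pending = &mut queue.lock().unwrap();
    pending.push(value);
}

/// The suggested style: bind the guard itself.
pub fn push_with_guard(queue: &Mutex<Vec<u32>>, value: u32) {
    // `MutexGuard` implements `DerefMut`, so a `mut` binding is enough
    // to call `&mut self` methods on the protected value.
    let mut pending = queue.lock().unwrap();
    pending.push(value);
}
```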
}

pub fn disconnect_dummy_node<'a, 'b: 'a, 'c: 'b>(node: &Node<'a, 'b, 'c>) {
    node.node.peer_disconnected(&PublicKey::from_slice(&[2; 33]).unwrap());
We should be symmetric with `connect_dummy_node`, either we don't `peer_connected` on the `onion_messenger` or we should `peer_disconnected` as well.
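The symmetry being asked for can be sketched as follows. Everything here is a hypothetical model: `PeerTracker`, the toy `Node` struct, its field names, and the one-byte `DUMMY_PEER` id are assumptions made for illustration and are not the test utilities in the PR.

```rust
use std::collections::HashSet;

/// Hypothetical component that tracks connected peers by a one-byte id.
#[derive(Default)]
pub struct PeerTracker {
    connected: HashSet<u8>,
}

impl PeerTracker {
    pub fn peer_connected(&mut self, id: u8) {
        self.connected.insert(id);
    }
    pub fn peer_disconnected(&mut self, id: u8) {
        self.connected.remove(&id);
    }
    pub fn is_connected(&self, id: u8) -> bool {
        self.connected.contains(&id)
    }
}

/// Hypothetical test node with two components that both track peers.
#[derive(Default)]
pub struct Node {
    pub channel_manager: PeerTracker,
    pub onion_messenger: PeerTracker,
}

/// Hypothetical id of the dummy peer.
pub const DUMMY_PEER: u8 = 2;

/// Connects the dummy peer to every component.
pub fn connect_dummy_node(node: &mut Node) {
    node.channel_manager.peer_connected(DUMMY_PEER);
    node.onion_messenger.peer_connected(DUMMY_PEER);
}

/// Disconnects the dummy peer from exactly the components
/// that `connect_dummy_node` touched.
pub fn disconnect_dummy_node(node: &mut Node) {
    node.channel_manager.peer_disconnected(DUMMY_PEER);
    node.onion_messenger.peer_disconnected(DUMMY_PEER);
}
```

If the disconnect helper skipped one component, that component would keep believing the dummy peer is connected, which is the asymmetry the comment points out.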
// Commenting the assignment to remove `unused_assignments` warning.
// dummy_connected = false;
Not sure why we need to keep this here.
lightning/src/ln/reorg_tests.rs
Outdated
@@ -763,21 +763,21 @@ fn test_htlc_preimage_claim_prev_counterparty_commitment_after_current_counterpa
 fn do_test_retries_own_commitment_broadcast_after_reorg(anchors: bool, revoked_counterparty_commitment: bool) {
     // Tests that a node will retry broadcasting its own commitment after seeing a confirmed
     // counterparty commitment be reorged out.
-    let mut chanmon_cfgs = create_chanmon_cfgs(2);
+    let mut chanmon_cfgs = create_chanmon_cfgs(3);
It seems all the changes in this file can be reverted.
#[test]
fn test_drop_disconnected_peers_when_removing_channels() {
    let chanmon_cfgs = create_chanmon_cfgs(2);
It seems all the test changes in this file from here down can be reverted.
Updated from pr2731.16 to pr2731.17 (diff): Updates:
LGTM. One small nit, otherwise this just needs another reviewer.
Updated from pr2731.17 to pr2731.18 (diff): Update:
LGTM, though let's perhaps have @TheBlueMatt take another look because he suggested the opposite change in tests from the one I suggested.
And thank you very much for the renaming!
I don't feel super strongly about the test changes, but we may have to drop some of the assertions when we eventually move the test out of |
- We might generate channel updates to be broadcast when we are not connected to any peers to broadcast them to.
- This PR caches them and broadcasts them only once we are connected to at least one peer.

Other Changes:
1. Introduce a test.
2. Update the existing tests affected by this change.
3. Fix a typo.
4. Introduce two functions in functional_utils that optionally connect and disconnect a dummy node during broadcast testing.
Updated from pr2731.19 to pr2731.20 (diff): Update:
LGTM!
v0.0.123 - May 08, 2024 - "BOLT12 Dust Sweeping"

API Updates
===========

 * To reduce risk of force-closures and improve HTLC reliability the default dust exposure limit has been increased to `MaxDustHTLCExposure::FeeRateMultiplier(10_000)`. Users with existing channels might want to consider using `ChannelManager::update_channel_config` to apply the new default (lightningdevkit#3045).
 * `ChainMonitor::archive_fully_resolved_channel_monitors` is now provided to remove from memory `ChannelMonitor`s that have been fully resolved on-chain and are now not needed. It uses the new `Persist::archive_persisted_channel` to inform the storage layer that such a monitor should be archived (lightningdevkit#2964).
 * An `OutputSweeper` is now provided which will automatically sweep `SpendableOutputDescriptor`s, retrying until the sweep confirms (lightningdevkit#2825).
 * After initiating an outbound channel, a peer disconnection no longer results in immediate channel closure. Rather, if the peer is reconnected before the channel times out LDK will automatically retry opening it (lightningdevkit#2725).
 * `PaymentPurpose` now has separate variants for BOLT12 payments, which include fields from the `invoice_request` as well as the `OfferId` (lightningdevkit#2970).
 * `ChannelDetails` now includes a list of in-flight HTLCs (lightningdevkit#2442).
 * `Event::PaymentForwarded` now includes `skimmed_fee_msat` (lightningdevkit#2858).
 * The `hashbrown` dependency has been upgraded and the use of `ahash` as the no-std hash table hash function has been removed. As a consequence, LDK's `Hash{Map,Set}`s no longer feature several constructors when LDK is built with no-std; see the `util::hash_tables` module instead. On platforms that `getrandom` supports, setting the `possiblyrandom/getrandom` feature flag will ensure hash tables are resistant to HashDoS attacks, though the `possiblyrandom` crate should detect most common platforms (lightningdevkit#2810, lightningdevkit#2891).
 * `ChannelMonitor`-originated requests to the `ChannelSigner` can now fail and be retried using `ChannelMonitor::signer_unblocked` (lightningdevkit#2816).
 * `SpendableOutputDescriptor::to_psbt_input` now includes the `witness_script` where available as well as new proprietary data which can be used to re-derive some spending keys from the base key (lightningdevkit#2761, lightningdevkit#3004).
 * `OutPoint::to_channel_id` has been removed in favor of `ChannelId::v1_from_funding_outpoint` in preparation for v2 channels with a different `ChannelId` derivation scheme (lightningdevkit#2797).
 * `PeerManager::get_peer_node_ids` has been replaced with `list_peers` and `peer_by_node_id`, which provide more details (lightningdevkit#2905).
 * `Bolt11Invoice::get_payee_pub_key` is now provided (lightningdevkit#2909).
 * `Default[Message]Router` now take an `entropy_source` argument (lightningdevkit#2847).
 * `ClosureReason::HTLCsTimedOut` has been separated out from `ClosureReason::HolderForceClosed` as it is the most common case (lightningdevkit#2887).
 * `ClosureReason::CooperativeClosure` is now split into `{Counterparty,Locally}Initiated` variants (lightningdevkit#2863).
 * `Event::ChannelPending::channel_type` is now provided (lightningdevkit#2872).
 * `PaymentForwarded::{prev,next}_user_channel_id` are now provided (lightningdevkit#2924).
 * Channel init messages have been refactored towards V2 channels (lightningdevkit#2871).
 * `BumpTransactionEvent` now contains the channel and counterparty (lightningdevkit#2873).
 * `util::scid_utils` is now public, with some trivial utilities to examine short channel ids (lightningdevkit#2694).
 * `DirectedChannelInfo::{source,target}` are now public (lightningdevkit#2870).
 * Bounds in `lightning-background-processor` were simplified by using `AChannelManager` (lightningdevkit#2963).
 * The `Persist` impl for `KVStore` no longer requires `Sized`, allowing for the use of `dyn KVStore` as `Persist` (lightningdevkit#2883, lightningdevkit#2976).
 * `From<PaymentPreimage>` is now implemented for `PaymentHash` (lightningdevkit#2918).
 * `NodeId::from_slice` is now provided (lightningdevkit#2942).
 * `ChannelManager` deserialization may now fail with `DangerousValue` when LDK's persistence API was violated (lightningdevkit#2974).

Bug Fixes
=========

 * Excess fees on counterparty commitment transactions are now included in the dust exposure calculation. This lines behavior up with some cases where transaction fees can be burnt, making them effectively dust exposure (lightningdevkit#3045).
 * `Future`s used as an `std::...::Future` could grow in size unbounded if it was never woken. For those not using async persistence and using the async `lightning-background-processor`, this could cause a memory leak in the `ChainMonitor` (lightningdevkit#2894).
 * Inbound channel requests that fail in `ChannelManager::accept_inbound_channel` would previously have stalled from the peer's perspective as no `error` message was sent (lightningdevkit#2953).
 * Blinded path construction has been tuned to select paths more likely to succeed, improving BOLT12 payment reliability (lightningdevkit#2911, lightningdevkit#2912).
 * After a reorg, `lightning-transaction-sync` could have failed to follow a transaction that LDK needed information about (lightningdevkit#2946).
 * `RecipientOnionFields`' `custom_tlvs` are now propagated to recipients when paying with blinded paths (lightningdevkit#2975).
 * `Event::ChannelClosed` is now properly generated and peers are properly notified for all channels that as a part of a batch channel open fail to be funded (lightningdevkit#3029).
 * In cases where user event processing is substantially delayed such that we complete multiple round-trips with our peers before a `PaymentSent` event is handled and then restart without persisting the `ChannelManager` after having persisted a `ChannelMonitor[Update]`, on startup we may have `Err`d trying to deserialize the `ChannelManager` (lightningdevkit#3021).
 * If a peer has relatively high latency, `PeerManager` may have failed to establish a connection (lightningdevkit#2993).
 * `ChannelUpdate` messages broadcasted for our own channel closures are now slightly more robust (lightningdevkit#2731).
 * Deserializing malformed BOLT11 invoices may have resulted in an integer overflow panic in debug builds (lightningdevkit#3032).
 * In exceedingly rare cases (no cases of this are known), LDK may have created an invalid serialization for a `ChannelManager` (lightningdevkit#2998).
 * Message processing latency handling BOLT12 payments has been reduced (lightningdevkit#2881).
 * Latency in processing `Event::SpendableOutputs` may be reduced (lightningdevkit#3033).

Node Compatibility
==================

 * LDK's blinded paths were inconsistent with other implementations in several ways, which have been addressed (lightningdevkit#2856, lightningdevkit#2936, lightningdevkit#2945).
 * LDK's messaging blinded paths now support the latest features which some nodes may begin relying on soon (lightningdevkit#2961).
 * LDK's BOLT12 structs have been updated to support some last-minute changes to the spec (lightningdevkit#3017, lightningdevkit#3018).
 * CLN v24.02 requires the `gossip_queries` feature for all peers, however LDK by default does not set it for those not using a `P2PGossipSync` (e.g. those using RGS). This change was reverted in CLN v24.02.2 however for now LDK always sets the `gossip_queries` feature. This change is expected to be reverted in a future LDK release (lightningdevkit#2959).

Security
========

0.0.123 fixes a denial-of-service vulnerability which we believe to be reachable from untrusted input when parsing invalid BOLT11 invoices containing non-ASCII characters.

 * BOLT11 invoices with non-ASCII characters in the human-readable-part may cause an out-of-bounds read attempt leading to a panic (lightningdevkit#3054). Note that all BOLT11 invoices containing non-ASCII characters are invalid.

In total, this release features 150 files changed, 19307 insertions, 6306 deletions in 360 commits since 0.0.121 from 17 authors, in alphabetical order:

 * Arik Sosman
 * Duncan Dean
 * Elias Rohrer
 * Evan Feenstra
 * Jeffrey Czyz
 * Keyue Bao
 * Matt Corallo
 * Orbital
 * Sergi Delgado Segura
 * Valentine Wallace
 * Willem Van Lint
 * Wilmer Paulino
 * benthecarman
 * jbesraa
 * olegkubrakov
 * optout
 * shaavan
resolves #2711
We might generate channel updates to be broadcast when we are not connected to any peers to broadcast them to. This PR caches them and broadcasts them only once we are connected to at least one peer.
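The caching behavior described here can be sketched in a few lines. This is a simplified model under stated assumptions, not the PR's implementation: `ChannelUpdate`, `MessageSendEvent`, and `Manager` below are hypothetical stand-ins, and a plain counter replaces the real per-peer state.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for a gossip `channel_update` message.
#[derive(Debug, Clone, PartialEq)]
pub struct ChannelUpdate {
    pub short_channel_id: u64,
}

/// Hypothetical stand-in for the broadcast event type.
#[derive(Debug, Clone, PartialEq)]
pub enum MessageSendEvent {
    BroadcastChannelUpdate { msg: ChannelUpdate },
}

/// Toy manager that models only the broadcast cache.
pub struct Manager {
    /// Number of connected peers (an assumption; a counter keeps the sketch short).
    connected_peers: Mutex<usize>,
    /// Broadcast messages waiting for a peer to relay them.
    pending_broadcast_messages: Mutex<Vec<MessageSendEvent>>,
}

impl Manager {
    pub fn new() -> Self {
        Manager {
            connected_peers: Mutex::new(0),
            pending_broadcast_messages: Mutex::new(Vec::new()),
        }
    }

    pub fn peer_connected(&self) {
        *self.connected_peers.lock().unwrap() += 1;
    }

    pub fn peer_disconnected(&self) {
        let mut count = self.connected_peers.lock().unwrap();
        *count = count.saturating_sub(1);
    }

    /// Caches an update instead of handing it out immediately.
    pub fn queue_channel_update(&self, msg: ChannelUpdate) {
        self.pending_broadcast_messages
            .lock()
            .unwrap()
            .push(MessageSendEvent::BroadcastChannelUpdate { msg });
    }

    /// Releases cached broadcasts only while at least one peer is connected.
    pub fn get_and_clear_pending_msg_events(&self) -> Vec<MessageSendEvent> {
        if *self.connected_peers.lock().unwrap() == 0 {
            // Nobody to relay to: keep the messages cached for later.
            return Vec::new();
        }
        let mut pending = self.pending_broadcast_messages.lock().unwrap();
        // `take` empties the cache and returns its previous contents.
        std::mem::take(&mut *pending)
    }
}
```

An update queued while no peer is connected stays in the cache, is returned by the first call made after a peer connects, and is not returned again afterwards.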