Fix lightning-net-tokio sometimes dropping messages #2832
Conversation
```rust
Err(ref e) if e.kind() == std::io::ErrorKind::WouldBlock => {
    continue;
}
```
The handling of `std::io::ErrorKind::WouldBlock` in the `send_data` method of `SocketDescriptor` uses a `continue` statement in a loop without any delay or backoff mechanism. This could lead to a busy-wait loop that consumes CPU unnecessarily when the socket is not ready to send more data.

To address this, consider implementing a backoff strategy or using an async waiting mechanism to allow other tasks to run while waiting for the socket to become writable again.
```rust
Err(ref e) if e.kind() == std::io::ErrorKind::WouldBlock => {
    // Implement a delay or async wait before continuing
}
```
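As a minimal sketch of the kind of async wait being suggested here — assuming a plain `tokio::net::TcpStream` rather than the actual `lightning-net-tokio` internals, with an illustrative function name that does not appear in the PR:

```rust
use tokio::net::TcpStream;

// Illustrative only: awaiting writability makes the task yield instead of
// spinning while the kernel send buffer is full.
async fn write_all_yielding(stream: &TcpStream, mut buf: &[u8]) -> std::io::Result<()> {
    while !buf.is_empty() {
        // Registers write interest and suspends the task until the socket may be writable.
        stream.writable().await?;
        match stream.try_write(buf) {
            Ok(n) => buf = &buf[n..],
            // Readiness can be spurious; WouldBlock just means "wait and try again".
            Err(ref e) if e.kind() == std::io::ErrorKind::WouldBlock => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(())
}
```

Whether something like this fits `send_data` at all is a separate question — as the replies below note, the existing poll-based loop already yields once tokio returns `Poll::Pending`.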
If we busy-loop here it's a bug in tokio, ultimately, as it should eventually return `Poll::Pending` for us.
@TheBlueMatt Understood - if tokio's `poll_write_ready` behaves as expected, it should indeed prevent a busy loop by returning `Poll::Pending` when appropriate. Thank you for the clarification.
Codecov Report

Additional details and impacted files:

```
@@           Coverage Diff           @@
##             main   #2832    +/-  ##
=======================================
  Coverage   88.52%  88.53%
=======================================
  Files         114     114
  Lines       92090   92092     +2
  Branches    92090   92092     +2
=======================================
+ Hits        81526   81534     +8
  Misses       8058    8058
+ Partials     2506    2500     -6
```

☔ View full report in Codecov by Sentry.
Yea, makes sense. We ultimately have to keep trying until tokio gives us a `Poll::Pending`, otherwise we'll hang, like you're seeing. Strange that no one else has ever run into this (that we know of) - certainly I've sent a ton of HTLC messages back-to-back to peers, so you'd think I'd have hit it. Wonder if you have a small socket buffer size or something.

This patch is pretty safe, so I'm just gonna land it. Ultimately we have to keep going here - the `Err` return is for "real" errors that imply the socket is closed, in which case we return early and assume the read task will quit and close up the socket. If we hit a `WouldBlock`, we're supposed to (according to the docs) treat it as "try your poll/await again", which is exactly what this patch does.

Thanks for tracking this down!
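To make those three outcomes concrete, here is a rough sketch of the semantics described above — retry on `WouldBlock`, stop on `Poll::Pending` (the waker fires later), and bail only on real errors. The function name, return type, and structure are simplified assumptions, not the actual `lightning-net-tokio` source:

```rust
use std::task::{Context, Poll};
use tokio::net::TcpStream;

// Simplified sketch of the send-loop semantics discussed above.
fn try_send_once(stream: &TcpStream, cx: &mut Context<'_>, buf: &[u8]) -> Result<usize, ()> {
    loop {
        match stream.poll_write_ready(cx) {
            // Not writable: tokio has registered our waker, so we can report
            // "0 bytes sent" and wait to be woken instead of spinning.
            Poll::Pending => return Ok(0),
            // A real error: treat the socket as dead and let the read task clean up.
            Poll::Ready(Err(_)) => return Err(()),
            Poll::Ready(Ok(())) => match stream.try_write(buf) {
                Ok(written) => return Ok(written),
                // Spurious readiness: per the docs, just poll and try again.
                Err(ref e) if e.kind() == std::io::ErrorKind::WouldBlock => continue,
                Err(_) => return Err(()),
            },
        }
    }
}
```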
I'm basically sending a very large message (containing a lot of adaptor signatures) in chunks so that it fits within the noise limit. So all the messages sent are "full"; maybe that's one difference from what you experienced.
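For context on those "full" messages, a hedged sketch of that kind of chunking — assuming the BOLT 8 noise transport's 65535-byte message limit and an arbitrary per-chunk budget below it; the constant and helper are made up for illustration and are not from the PR:

```rust
// Hypothetical helper: split a large payload so each piece stays comfortably
// under the noise transport's per-message limit. The exact budget is an
// assumption, not taken from the PR.
const MAX_CHUNK_LEN: usize = 65_000;

fn chunk_payload(payload: &[u8]) -> Vec<Vec<u8>> {
    payload.chunks(MAX_CHUNK_LEN).map(|c| c.to_vec()).collect()
}
```

Sending many back-to-back maximum-size messages like this fills the socket's send buffer quickly, which is presumably why the `WouldBlock` path was hit here but not in more typical traffic.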
I was having some issues sending custom messages with `v0.0.118` (when trying to send a rather large number - ~25 - it would stop sending after ~14). After debugging I found that the issue was coming from `lightning-net-tokio`, and checking the documentation it seems that when receiving a `WouldBlock` the write should just be retried. I confirmed that this fixed the issue I was having.