Refactor commitment broadcast to always go through OnchainTxHandler #2703
Conversation
Force-pushed from 7093f84 to 13b4f49.
Codecov Report

@@            Coverage Diff             @@
##             main    #2703      +/-   ##
==========================================
- Coverage   88.66%   88.57%   -0.10%
==========================================
  Files         115      115
  Lines       91168    91399     +231
  Branches    91168    91399     +231
==========================================
+ Hits        80838    80955     +117
- Misses       7908     7977      +69
- Partials     2422     2467      +45
Force-pushed from 13b4f49 to d9422ca.
// Now that we've detected a confirmed commitment transaction, attempt to cancel
// pending claims for any commitments that were previously confirmed such that
// we don't continue claiming inputs that no longer exist.
self.cancel_prev_commitment_claims(&logger, &txid);
I'm honestly pretty skeptical of our test coverage of re-creating claims after a reorg, which makes me pretty skeptical of this change. If we want to delete pending claims, can we instead do it after ANTI_REORG_DELAY? I'm not quite sure I understand the motivation for this commit anyway.
> I'm honestly pretty skeptical of our test coverage of re-creating claims after a reorg, which makes me pretty skeptical of this change.

We have several tests covering possible reorg scenarios; are you implying there are cases we haven't covered?

> If we want to delete pending claims, can we instead do it after ANTI_REORG_DELAY?

The claims never confirm because their inputs are now reorged out, so ANTI_REORG_DELAY doesn't help.

> I'm not quite sure I understand the motivation for this commit anyway.

It's mostly a nice-to-have change -- it simplifies certain test assertions and prevents us from continuously trying to claim inputs that will never succeed because they no longer exist.
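Conceptually, the cancellation step amounts to something like the following simplified sketch (the types and the exact retention logic are illustrative stand-ins, not the actual `cancel_prev_commitment_claims` implementation):

```rust
use std::collections::HashMap;

// Simplified stand-ins for illustration; the real OnchainTxHandler tracks far
// more state per claim (feerates, merged packages, locktimes, ...).
#[derive(Clone, Copy, PartialEq, Eq)]
struct Txid([u8; 32]);
#[derive(PartialEq)]
struct OutPoint { txid: Txid, vout: u32 }
struct ClaimRequest { outpoints: Vec<OutPoint> }

struct Handler {
	// Every commitment txid that could confirm for this channel (holder,
	// counterparty, revoked counterparty, ...).
	commitment_txids: Vec<Txid>,
	pending_claim_requests: HashMap<u64, ClaimRequest>,
}

impl Handler {
	/// Once `confirmed_commitment` confirms, any pending claim spending an
	/// output of a *different* commitment can never confirm (its inputs no
	/// longer exist), so drop it. Claims against the newly confirmed
	/// commitment are regenerated by the monitor as usual.
	fn cancel_prev_commitment_claims(&mut self, confirmed_commitment: &Txid) {
		let commitment_txids = &self.commitment_txids;
		self.pending_claim_requests.retain(|_, claim| {
			!claim.outpoints.iter().any(|op| {
				op.txid != *confirmed_commitment && commitment_txids.contains(&op.txid)
			})
		});
	}
}
```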
> We have several tests covering possible reorg scenarios; are you implying there are cases we haven't covered?

Where? I glanced at reorg_tests and didn't see any that check that if we reorg out a commitment tx, we broadcast our own (replacement) commitment tx immediately afterwards.

> The claims never confirm because their inputs are now reorged out, so ANTI_REORG_DELAY doesn't help.

Right, I mean if we see a conflicting commitment tx we remove the conflicts here, but we could also do this after 6 confs on the conflicting commitment tx.

> It's mostly a nice-to-have change -- it simplifies certain test assertions and prevents us from continuously trying to claim inputs that will never succeed because they no longer exist.

Hmm, it looks like currently only one test fails? I assume this is mostly in reference to a future patchset.
> Where? I glanced at reorg_tests and didn't see any that check that if we reorg out a commitment tx, we broadcast our own (replacement) commitment tx immediately afterwards.

We don't have coverage for that specific case, but it all depends on whether we needed to broadcast before the reorg. I wrote a quick test locally and it checks out, so I can push that.

> Right, I mean if we see a conflicting commitment tx we remove the conflicts here, but we could also do this after 6 confs on the conflicting commitment tx.

Why wait that long though? We know the previous claims are invalid as soon as the conflict confirms. Note that this is just about removing the claims that come after the commitment, not the commitment itself. We will continue to retry the commitment until one reaches ANTI_REORG_DELAY.

> Hmm, it looks like currently only one test fails? I assume this is mostly in reference to a future patchset.

It's not so much about the number of tests failing, but rather about simplifying assertions throughout the failing test. There is a future patch to follow, but it doesn't really concern reorgs.
> Why wait that long though? We know the previous claims are invalid as soon as the conflict confirms. Note that this is just about removing the claims that come after the commitment, not the commitment itself. We will continue to retry the commitment until one reaches ANTI_REORG_DELAY.

Mostly because there's no new test in the first commit, I know we have some level of missing test coverage here, and I'm not sure we can enumerate all the cases very easily, so I'm just trying to be pretty cautious. Doubly so since we don't hit many reorg cases in prod, so we won't discover these bugs unless they show up in tests.
I don't really see the risk here. As long as we can guarantee we'll broadcast our own commitment after a reorg (the new test shows this), there's no chance we'll miss claiming anything from it: once it confirms, the monitor will pick up the outputs to claim as usual.
I guess my concern is that we somehow forget to re-add claims for our own transactions, but you're right, your test should be pretty good for that. Can you make the test into a matrix, though, covering anchors and B broadcasting a revoked transaction rather than a normal one?
Sure, done.
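A parameterized test along the lines requested might be structured roughly like this (the helper name, its parameters, and the scenario steps are a hypothetical sketch, not the exact test pushed in this PR):

```rust
// Hypothetical sketch of the requested test matrix.
fn do_test_rebroadcast_commitment_on_reorg(anchors: bool, revoked_counterparty_tx: bool) {
	// 1. Open a channel, optionally negotiating anchor outputs.
	// 2. Confirm a counterparty commitment transaction, optionally a revoked one.
	// 3. Disconnect blocks so that commitment is reorged back out.
	// 4. Assert that we (re-)broadcast our own commitment and re-add its claims.
	let _ = (anchors, revoked_counterparty_tx);
}

#[test]
fn test_rebroadcast_commitment_on_reorg() {
	// Exercise all four combinations of the matrix.
	do_test_rebroadcast_commitment_on_reorg(false, false);
	do_test_rebroadcast_commitment_on_reorg(false, true);
	do_test_rebroadcast_commitment_on_reorg(true, false);
	do_test_rebroadcast_commitment_on_reorg(true, true);
}
```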
.or_else(|| {
	self.pending_claim_requests.iter()
		.find(|(_, claim)| claim.outpoints().iter().any(|claim_outpoint| *claim_outpoint == outpoint))
		.map(|(claim_id, _)| *claim_id)
This should be unreachable, right? It looks like no tests hit it.
It should be, yes, but I included it here just to be safe in the event we are tracking a pending request in pending_claim_requests that we have yet to generate a claim for.
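In other words, the lookup first consults the per-outpoint index and only then falls back to scanning the tracked requests. A simplified sketch of that shape, with stand-in types rather than the actual LDK fields:

```rust
use std::collections::HashMap;

// Simplified stand-ins for illustration only.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct OutPoint { txid: [u8; 32], vout: u32 }
#[derive(Clone, Copy)]
struct ClaimId([u8; 32]);
struct ClaimRequest { outpoints: Vec<OutPoint> }

struct Handler {
	// Fast path: outpoint -> claim it belongs to, maintained as claims are generated.
	claimable_outpoints: HashMap<OutPoint, ClaimId>,
	// All tracked requests, including ones we haven't generated a claim for yet.
	pending_claim_requests: HashMap<ClaimId, ClaimRequest>,
}

impl Handler {
	fn claim_id_for_outpoint(&self, outpoint: OutPoint) -> Option<ClaimId> {
		self.claimable_outpoints.get(&outpoint).copied()
			// Defensive fallback: scan requests we track but haven't claimed yet.
			.or_else(|| {
				self.pending_claim_requests.iter()
					.find(|(_, req)| req.outpoints.contains(&outpoint))
					.map(|(claim_id, _)| *claim_id)
			})
	}
}
```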
Needs a rebase.
Once a commitment transaction is broadcast/confirms, we may need to claim some of the HTLCs in it. These claims are sent as requests to the `OnchainTxHandler`, which will bump their feerate as they remain unconfirmed. When said commitment transaction becomes unconfirmed though, and another commitment confirms instead, i.e., a reorg happens, the `OnchainTxHandler` doesn't have any insight into whether these claims are still valid or not, so it continues attempting to claim the HTLCs from the previous commitment (now unconfirmed) forever, along with the HTLCs from the newly confirmed commitment.
Currently, our holder commitment broadcast only goes through the `OnchainTxHandler` for anchor outputs channels because we can actually bump the commitment transaction fees with it. For non-anchor outputs channels, we would just broadcast once directly via the `ChannelForceClosed` monitor update, without going through the `OnchainTxHandler`.

As we add support for async signing, we need to be tolerant of signing failures. A signing failure of our holder commitment will currently panic, but once the panic is removed, we must be able to retry signing once the signer is available. We can easily achieve this via the existing `OnchainTxHandler::rebroadcast_pending_claims`, but this requires that we first queue our holder commitment as a claim. This commit ensures we do so everywhere we need to broadcast a holder commitment transaction, regardless of the channel type.

Co-authored-by: Rachel Malonson <[email protected]>
Force-pushed from 569fd4a to 60bb39a.
Currently, our holder commitment broadcast only goes through the `OnchainTxHandler` for anchor outputs channels because we can actually bump the commitment transaction fees with it. For non-anchor outputs channels, we would just broadcast once directly via the `ChannelForceClosed` monitor update, without going through the `OnchainTxHandler`.

As we add support for async signing, we need to be tolerant of signing failures. A signing failure of our holder commitment will currently panic, but once the panic is removed, we must be able to retry signing once the signer is available. We can easily achieve this via the existing `OnchainTxHandler::rebroadcast_pending_claims`, but this requires that we first queue our holder commitment as a claim. This commit ensures we do so everywhere we need to broadcast a holder commitment transaction, regardless of the channel type.

This addresses the prerequisites to #2520 as noted in #2520 (comment).
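A rough sketch of the retry flow this enables, with simplified stand-in types and methods (only `rebroadcast_pending_claims` is a real LDK method name here):

```rust
enum SignError { SignerUnavailable }

trait HolderCommitmentSigner {
	// With async signing, this may fail transiently instead of panicking.
	fn sign_holder_commitment(&self) -> Result<Vec<u8>, SignError>;
}

struct OnchainTxHandler<S: HolderCommitmentSigner> {
	signer: S,
	// The holder commitment is now tracked as a regular claim...
	holder_commitment_claim_queued: bool,
}

impl<S: HolderCommitmentSigner> OnchainTxHandler<S> {
	fn queue_holder_commitment_claim(&mut self) {
		self.holder_commitment_claim_queued = true;
		self.try_broadcast_holder_commitment();
	}

	// ...so a transient signing failure is simply retried on the next call to
	// `rebroadcast_pending_claims`, rather than being a one-shot broadcast.
	fn rebroadcast_pending_claims(&mut self) {
		if self.holder_commitment_claim_queued {
			self.try_broadcast_holder_commitment();
		}
	}

	fn try_broadcast_holder_commitment(&mut self) {
		match self.signer.sign_holder_commitment() {
			Ok(tx_bytes) => broadcast(&tx_bytes),
			Err(SignError::SignerUnavailable) => {
				// Leave the claim queued; we'll retry on the next rebroadcast.
			}
		}
	}
}

fn broadcast(_tx: &[u8]) { /* hand off to the chain backend */ }
```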