Refactor commitment broadcast to always go through OnchainTxHandler #2703
Conversation
Force-pushed from 7093f84 to 13b4f49.
Codecov Report

@@            Coverage Diff             @@
##             main    #2703      +/-   ##
==========================================
- Coverage   88.66%   88.57%   -0.10%
==========================================
  Files         115      115
  Lines       91168    91399     +231
  Branches    91168    91399     +231
==========================================
+ Hits        80838    80955     +117
- Misses       7908     7977      +69
- Partials     2422     2467      +45
Force-pushed from 13b4f49 to d9422ca.
// Now that we've detected a confirmed commitment transaction, attempt to cancel
// pending claims for any commitments that were previously confirmed such that
// we don't continue claiming inputs that no longer exist.
self.cancel_prev_commitment_claims(&logger, &txid);
I'm honestly pretty skeptical of our test coverage of re-creating claims after a reorg, which makes me pretty skeptical of this change. If we want to delete pending claims, can we instead do it after ANTI_REORG_DELAY? I'm not quite sure I understand the motivation for this commit anyway.
> I'm honestly pretty skeptical of our test coverage of re-creating claims after a reorg, which makes me pretty skeptical of this change.

We have several tests covering possible reorg scenarios; are you implying there are cases we haven't covered?

> If we want to delete pending claims, can we instead do it after ANTI_REORG_DELAY?

The claims never confirm because their inputs are now reorged out, so ANTI_REORG_DELAY doesn't help.

> I'm not quite sure I understand the motivation for this commit anyway.

It's mostly a nice-to-have change -- it simplifies certain test assertions and prevents us from continuously trying to claim inputs that will never succeed because they no longer exist.
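Conceptually, the cancellation step amounts to something like the following simplified sketch (the types and the exact retention logic are illustrative stand-ins, not the actual `cancel_prev_commitment_claims` implementation):

```rust
use std::collections::HashMap;

// Simplified stand-ins for illustration; the real OnchainTxHandler tracks far
// more state per claim (feerates, merged packages, locktimes, ...).
#[derive(Clone, Copy, PartialEq, Eq)]
struct Txid([u8; 32]);
#[derive(PartialEq)]
struct OutPoint { txid: Txid, vout: u32 }
struct ClaimRequest { outpoints: Vec<OutPoint> }

struct Handler {
	// Every commitment txid that could confirm for this channel (holder,
	// counterparty, revoked counterparty, ...).
	commitment_txids: Vec<Txid>,
	pending_claim_requests: HashMap<u64, ClaimRequest>,
}

impl Handler {
	/// Once `confirmed_commitment` confirms, any pending claim spending an
	/// output of a *different* commitment can never confirm (its inputs no
	/// longer exist), so drop it. Claims against the newly confirmed
	/// commitment are regenerated by the monitor as usual.
	fn cancel_prev_commitment_claims(&mut self, confirmed_commitment: &Txid) {
		let commitment_txids = &self.commitment_txids;
		self.pending_claim_requests.retain(|_, claim| {
			!claim.outpoints.iter().any(|op| {
				op.txid != *confirmed_commitment && commitment_txids.contains(&op.txid)
			})
		});
	}
}
```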
> We have several tests covering possible reorg scenarios; are you implying there are cases we haven't covered?

Where? I glanced at reorg_tests and didn't see any that check that if we reorg out a commitment tx, we broadcast our own (replacement) commitment tx immediately afterwards.

> The claims never confirm because their inputs are now reorged out, so ANTI_REORG_DELAY doesn't help.

Right, I mean if we see a conflicting commitment tx we remove the conflicts here, but we could also do this after 6 confs on the conflicting commitment tx.

> It's mostly a nice-to-have change -- it simplifies certain test assertions and prevents us from continuously trying to claim inputs that will never succeed because they no longer exist.

Hmm, it looks like currently only one test fails? I assume this is mostly in reference to a future patchset.
> Where? I glanced at reorg_tests and didn't see any that check that if we reorg out a commitment tx, we broadcast our own (replacement) commitment tx immediately afterwards.

We don't have coverage for that specific case, but it all depends on whether we needed to broadcast before the reorg. I wrote a quick test locally and it checks out, so I can push that.

> Right, I mean if we see a conflicting commitment tx we remove the conflicts here, but we could also do this after 6 confs on the conflicting commitment tx.

Why wait that long though? We know the previous claims are invalid as soon as the conflict confirms. Note that this is just about removing the claims that come after the commitment, not the commitment itself. We will continue to retry the commitment until one reaches ANTI_REORG_DELAY.

> Hmm, it looks like currently only one test fails? I assume this is mostly in reference to a future patchset.

It's not so much about the number of tests failing, but rather about simplifying assertions throughout the failing test. There is a future patch to follow, but it doesn't really concern reorgs.
> Why wait that long though? We know the previous claims are invalid as soon as the conflict confirms. Note that this is just about removing the claims that come after the commitment, not the commitment itself. We will continue to retry the commitment until one reaches ANTI_REORG_DELAY.

Mostly because there's no new test in the first commit, I know we have some level of missing test coverage here, and I'm not sure we can enumerate all the cases very easily, so I'm just trying to be pretty cautious. Doubly so since we don't hit many reorg cases in prod, so we won't discover these bugs unless they show up in tests.
I don't really see the risk here. As long as we can guarantee we'll broadcast our own commitment after a reorg (the new test shows this), there's no chance we'll miss claiming anything from it: once it confirms, the monitor will pick up the outputs to claim as usual.
I guess my concern is that we somehow forget to re-add claims for our own transactions, but you're right, your test should be pretty good for that. Can you make the test into a matrix, though, covering anchors and B broadcasting a revoked transaction rather than a normal one?
Sure, done.
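A parameterized test along the lines requested might be structured roughly like this (the helper name, its parameters, and the scenario steps are a hypothetical sketch, not the exact test pushed in this PR):

```rust
// Hypothetical sketch of the requested test matrix.
fn do_test_rebroadcast_commitment_on_reorg(anchors: bool, revoked_counterparty_tx: bool) {
	// 1. Open a channel, optionally negotiating anchor outputs.
	// 2. Confirm a counterparty commitment transaction, optionally a revoked one.
	// 3. Disconnect blocks so that commitment is reorged back out.
	// 4. Assert that we (re-)broadcast our own commitment and re-add its claims.
	let _ = (anchors, revoked_counterparty_tx);
}

#[test]
fn test_rebroadcast_commitment_on_reorg() {
	// Exercise all four combinations of the matrix.
	do_test_rebroadcast_commitment_on_reorg(false, false);
	do_test_rebroadcast_commitment_on_reorg(false, true);
	do_test_rebroadcast_commitment_on_reorg(true, false);
	do_test_rebroadcast_commitment_on_reorg(true, true);
}
```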
.or_else(|| {
	self.pending_claim_requests.iter()
		.find(|(_, claim)| claim.outpoints().iter().any(|claim_outpoint| *claim_outpoint == outpoint))
		.map(|(claim_id, _)| *claim_id)
This should be unreachable, right? It looks like no tests hit it.
It should be, yes, but I included it here just to be safe in the event we are tracking a pending request in pending_claim_requests that we have yet to generate a claim for.
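In other words, the lookup first consults the per-outpoint index and only then falls back to scanning the tracked requests. A simplified sketch of that shape, with stand-in types rather than the actual LDK fields:

```rust
use std::collections::HashMap;

// Simplified stand-ins for illustration only.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct OutPoint { txid: [u8; 32], vout: u32 }
#[derive(Clone, Copy)]
struct ClaimId([u8; 32]);
struct ClaimRequest { outpoints: Vec<OutPoint> }

struct Handler {
	// Fast path: outpoint -> claim it belongs to, maintained as claims are generated.
	claimable_outpoints: HashMap<OutPoint, ClaimId>,
	// All tracked requests, including ones we haven't generated a claim for yet.
	pending_claim_requests: HashMap<ClaimId, ClaimRequest>,
}

impl Handler {
	fn claim_id_for_outpoint(&self, outpoint: OutPoint) -> Option<ClaimId> {
		self.claimable_outpoints.get(&outpoint).copied()
			// Defensive fallback: scan requests we track but haven't claimed yet.
			.or_else(|| {
				self.pending_claim_requests.iter()
					.find(|(_, req)| req.outpoints.contains(&outpoint))
					.map(|(claim_id, _)| *claim_id)
			})
	}
}
```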
Needs a rebase.
Once a commitment transaction is broadcast/confirms, we may need to claim some of the HTLCs in it. These claims are sent as requests to the `OnchainTxHandler`, which will bump their feerate as they remain unconfirmed. When said commitment transaction becomes unconfirmed though, and another commitment confirms instead, i.e., a reorg happens, the `OnchainTxHandler` doesn't have any insight into whether these claims are still valid or not, so it continues attempting to claim the HTLCs from the previous commitment (now unconfirmed) forever, along with the HTLCs from the newly confirmed commitment.
Currently, our holder commitment broadcast only goes through the `OnchainTxHandler` for anchor outputs channels because we can actually bump the commitment transaction fees with it. For non-anchor outputs channels, we would just broadcast once directly via the `ChannelForceClosed` monitor update, without going through the `OnchainTxHandler`.

As we add support for async signing, we need to be tolerant of signing failures. A signing failure of our holder commitment will currently panic, but once the panic is removed, we must be able to retry signing once the signer is available. We can easily achieve this via the existing `OnchainTxHandler::rebroadcast_pending_claims`, but this requires that we first queue our holder commitment as a claim. This commit ensures we do so everywhere we need to broadcast a holder commitment transaction, regardless of the channel type.

Co-authored-by: Rachel Malonson <[email protected]>
Force-pushed from 569fd4a to 60bb39a.
Currently, our holder commitment broadcast only goes through the `OnchainTxHandler` for anchor outputs channels because we can actually bump the commitment transaction fees with it. For non-anchor outputs channels, we would just broadcast once directly via the `ChannelForceClosed` monitor update, without going through the `OnchainTxHandler`.

As we add support for async signing, we need to be tolerant of signing failures. A signing failure of our holder commitment will currently panic, but once the panic is removed, we must be able to retry signing once the signer is available. We can easily achieve this via the existing `OnchainTxHandler::rebroadcast_pending_claims`, but this requires that we first queue our holder commitment as a claim. This commit ensures we do so everywhere we need to broadcast a holder commitment transaction, regardless of the channel type.

This addresses the prerequisites to #2520 as noted in #2520 (comment).
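A rough sketch of the retry flow this enables, with simplified stand-in types and methods (only `rebroadcast_pending_claims` is a real LDK method name here):

```rust
enum SignError { SignerUnavailable }

trait HolderCommitmentSigner {
	// With async signing, this may fail transiently instead of panicking.
	fn sign_holder_commitment(&self) -> Result<Vec<u8>, SignError>;
}

struct OnchainTxHandler<S: HolderCommitmentSigner> {
	signer: S,
	// The holder commitment is now tracked as a regular claim...
	holder_commitment_claim_queued: bool,
}

impl<S: HolderCommitmentSigner> OnchainTxHandler<S> {
	fn queue_holder_commitment_claim(&mut self) {
		self.holder_commitment_claim_queued = true;
		self.try_broadcast_holder_commitment();
	}

	// ...so a transient signing failure is simply retried on the next call to
	// `rebroadcast_pending_claims`, rather than being a one-shot broadcast.
	fn rebroadcast_pending_claims(&mut self) {
		if self.holder_commitment_claim_queued {
			self.try_broadcast_holder_commitment();
		}
	}

	fn try_broadcast_holder_commitment(&mut self) {
		match self.signer.sign_holder_commitment() {
			Ok(tx_bytes) => broadcast(&tx_bytes),
			Err(SignError::SignerUnavailable) => {
				// Leave the claim queued; we'll retry on the next rebroadcast.
			}
		}
	}
}

fn broadcast(_tx: &[u8]) { /* hand off to the chain backend */ }
```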