Support normal channel operation with async signing #2849

waterson · 2024-01-24T19:12:38Z

This is a do-over of #2653, wherein we support asynchronous signing for 'normal' channel operation. This involves allowing the following ChannelSigner methods to return an Err result, indicating that the requested value is not available:

get_per_commitment_point
release_commitment_secret
sign_counterparty_commitment

When the value does become available, channel operation can be resumed by invoking signer_unblocked.

Note that this adds the current and next per-commitment point to the state that is persisted by the channel monitor.
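
To illustrate the flow described above, here is a minimal sketch of a signer call that can fail and be retried later. All names in it (`ToySigner`, `RemoteSigner`, `ToyChannel`, `signer_pending_point`) are hypothetical stand-ins, not LDK's actual types, and the "point" is a placeholder byte array rather than a real secp256k1 key.

```rust
// Hypothetical, simplified stand-ins; these are not LDK's actual types.
type Point = [u8; 33];

trait ToySigner {
    /// Returns Err(()) when the requested point is not available yet.
    fn get_per_commitment_point(&self, idx: u64) -> Result<Point, ()>;
}

/// Models a signer that may not have answered yet (e.g. a remote device).
struct RemoteSigner {
    ready: bool,
}

impl ToySigner for RemoteSigner {
    fn get_per_commitment_point(&self, idx: u64) -> Result<Point, ()> {
        if self.ready {
            // Placeholder bytes, not a real public key.
            Ok([idx as u8; 33])
        } else {
            Err(())
        }
    }
}

struct ToyChannel {
    next_idx: u64,
    next_point: Option<Point>,
    /// Set when a point was requested but the signer could not provide it.
    signer_pending_point: bool,
}

impl ToyChannel {
    fn new() -> Self {
        ToyChannel { next_idx: 0, next_point: None, signer_pending_point: false }
    }

    /// Ask the signer for the next point; remember the request if it fails.
    fn request_next_point<S: ToySigner>(&mut self, signer: &S) {
        match signer.get_per_commitment_point(self.next_idx) {
            Ok(point) => {
                self.next_point = Some(point);
                self.signer_pending_point = false;
            }
            Err(()) => self.signer_pending_point = true,
        }
    }

    /// Retry only the work that was previously blocked on the signer.
    fn signer_unblocked<S: ToySigner>(&mut self, signer: &S) {
        if self.signer_pending_point {
            self.request_next_point(signer);
        }
    }
}
```

In this sketch the channel never blocks: a failed request only sets a flag, and calling `signer_unblocked` once the signer is ready repeats the request and clears it.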

coderabbitai · 2024-01-24T19:13:06Z

Walkthrough

The recent updates across various files in a Rust-based codebase focus on enhancing asynchronous operations, particularly around signing processes. These changes include the addition of new constants, modifications to the SignerProvider, and updates to message handling and error management. There's also a notable shift in how per_commitment_points are handled, alongside improved logging for debugging and more robust test utilities.

Changes

| Files | Summary |
| --- | --- |
| `fuzz/src/chanmon_consistency.rs`, `lightning/src/ln/channelmanager.rs`, `lightning/src/util/test_channel_signer.rs` | Introduced new asynchronous operation constants, updated `SignerProvider`, added debugging logs, and refined message handling logic. |
| `lightning/src/chain/channelmonitor.rs`, `lightning/src/chain/onchaintx.rs` | Altered how `per_commitment_point` is assigned and used, affecting on-chain transaction handling. |
| `lightning/src/ln/functional_test_utils.rs`, `lightning/src/ln/functional_tests.rs` | Enhanced test utilities with new functions, improved error handling, and explicit result unwrapping. |
| `lightning/src/sign/mod.rs` | Modified `ChannelSigner` trait to include error handling by returning `Result` types. |

🐇✨
In the realms of code, where the lightning arcs,
A rabbit hopped through, leaving whimsical marks.
With each line refined, and errors unwound,
A trail of cleaner code is what it found. 🌟🌱


coderabbitai

Review Status

Actionable comments generated: 16

Configuration used: CodeRabbit UI

Commits

Files that changed from the base of the PR and between 4bab9c8 and 9943fb1.

Files selected for processing (10)

fuzz/src/chanmon_consistency.rs (13 hunks)
lightning/src/chain/channelmonitor.rs (1 hunks)
lightning/src/chain/onchaintx.rs (2 hunks)
lightning/src/ln/async_signer_tests.rs (10 hunks)
lightning/src/ln/channel.rs (48 hunks)
lightning/src/ln/channelmanager.rs (12 hunks)
lightning/src/ln/functional_test_utils.rs (9 hunks)
lightning/src/ln/functional_tests.rs (4 hunks)
lightning/src/sign/mod.rs (3 hunks)
lightning/src/util/test_channel_signer.rs (10 hunks)

Files not summarized due to errors (2)

lightning/src/ln/async_signer_tests.rs: Error: Message exceeds token limit
lightning/src/ln/channel.rs: Error: Message exceeds token limit

Additional comments: 90

lightning/src/util/test_channel_signer.rs (5)

73-90: The ops module defines constants for various channel signer operations. These constants are used as a bitmask to indicate the availability of operations.

166-167: The is_signer_available method checks if a signer operation is available by consulting the unavailable_signer_ops bitmask.

172-175: The get_per_commitment_point method has been updated to return a Result type, allowing for error handling when the operation is unavailable.

224-230: > 📝 NOTE

This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [227-240]

The sign_counterparty_commitment method has been updated to check for operation availability and enforce policy checks. The error handling is consistent with the new asynchronous signing approach.

430-448: > 📝 NOTE

This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [419-447]

The EnforcementState struct has been updated to include an unavailable_signer_ops bitmask field, which is used to track the availability of signer operations.
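
As a rough illustration of the bitmask approach described in these comments, the sketch below tracks per-operation availability in a single integer. The constant names and values, the `ToyEnforcementState` type, and the `set_signer_available` helper are assumptions made for this example; the PR's actual `ops` module and `EnforcementState` may differ.

```rust
// Hypothetical constants; the names and values in the PR's `ops` module may differ.
mod ops {
    pub const GET_PER_COMMITMENT_POINT: u32 = 1 << 0;
    pub const RELEASE_COMMITMENT_SECRET: u32 = 1 << 1;
    pub const SIGN_COUNTERPARTY_COMMITMENT: u32 = 1 << 2;
}

/// Hypothetical stand-in for the test signer's enforcement state.
struct ToyEnforcementState {
    /// A set bit means the corresponding operation is currently unavailable.
    unavailable_signer_ops: u32,
}

impl ToyEnforcementState {
    /// An operation is available when its bit is clear in the mask.
    fn is_signer_available(&self, op: u32) -> bool {
        (self.unavailable_signer_ops & op) == 0
    }

    /// Clear the bit to enable an operation, set it to disable the operation.
    fn set_signer_available(&mut self, op: u32, available: bool) {
        if available {
            self.unavailable_signer_ops &= !op;
        } else {
            self.unavailable_signer_ops |= op;
        }
    }
}
```

Keeping one bit per operation lets a test disable, say, only the commitment-secret release while leaving the other signer calls working.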

lightning/src/ln/async_signer_tests.rs (12)

24-50: The helper function with_async_signer is designed to simulate an asynchronous signer by disabling the signer, running a provided function, and then re-enabling the signer. This function is critical for testing the asynchronous behavior of the signer.

90-127: > 📝 NOTE

This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [53-120]

The test functions do_test_funding_created and test_funding_created_* simulate the funding creation process with various orders of asynchronous signing operations. These tests are important to ensure that the channel state machine behaves correctly when the signer is not immediately available.

162-199: > 📝 NOTE

This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [124-192]

The test functions do_test_funding_signed and test_funding_signed_* simulate the funding signing process with various orders of asynchronous signing operations. These tests are important to ensure that the channel state machine behaves correctly when the signer is not immediately available.

219-268: > 📝 NOTE

This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [195-261]

The test functions do_test_commitment_signed and test_commitment_signed_* simulate the commitment signing process with various orders of asynchronous signing operations. These tests are important to ensure that the channel state machine behaves correctly when the signer is not immediately available.

350-1132: > 📝 NOTE

This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [265-380]

The test functions do_test_funding_signed_0conf and test_funding_signed_0conf_* simulate the funding signing process for zero-confirmation channels with various orders of asynchronous signing operations. These tests are important to ensure that the channel state machine behaves correctly when the signer is not immediately available, especially in the context of zero-conf channels which have different security assumptions.

384-572: The test functions do_test_payment and test_payment_* simulate the payment process with various orders of asynchronous signing operations. These tests are important to ensure that the channel state machine behaves correctly when the signer is not immediately available during the payment process.

575-744: The test functions do_test_peer_reconnect and test_peer_reconnect_* simulate the peer reconnection process with various orders of asynchronous signing operations. These tests are important to ensure that the channel state machine behaves correctly when the signer is not immediately available during peer reconnection, which is a critical part of maintaining channel state across network disruptions.

747-809: The test function channel_update_fee_test simulates a scenario where a fee update is required while the signer is unavailable. It is important to ensure that the channel can handle fee updates even when the signer is not immediately available, as fee updates are a critical part of channel management to ensure timely transaction confirmation.

812-850: The test function monitor_honors_commitment_raa_order ensures that the channel monitor honors the order of commitment and revoke-and-ack messages even when the signer is unavailable. This is crucial for maintaining the correct channel state and ensuring that the security properties of the Lightning protocol are upheld.

853-949: The test function peer_restart_with_blocked_signer_and_pending_payment simulates a scenario where a peer restarts with a blocked signer and a pending payment. This test is important to ensure that the channel state machine can handle restarts and continue processing payments even when the signer is not immediately available.

952-1065: The test function peer_restart_with_blocked_signer_before_pending_payment simulates a scenario where a peer restarts with a blocked signer before a pending payment. This test is important to ensure that the channel state machine can handle restarts and continue processing payments even when the signer is not immediately available.

1068-1131: The test function no_stray_channel_reestablish ensures that no unexpected channel_reestablish messages are sent when the signer is blocked and the peers are reconnected. This is important for maintaining protocol correctness and ensuring that no unnecessary messages are sent that could confuse the channel state machine.

lightning/src/chain/onchaintx.rs (2)

181-181: The addition of per_commitment_point to ExternalHTLCClaim struct is consistent with the PR objectives and AI-generated summary. This field is necessary for handling commitment points in the context of on-chain transactions.

1181-1185: The usage of per_commitment_point within the generate_external_htlc_claim function aligns with the new field addition in ExternalHTLCClaim and is correctly assigned from the trusted_tx.per_commitment_point(). This ensures that the commitment point is available for the external HTLC claim process.

fuzz/src/chanmon_consistency.rs (11)

52-52: The import statement has been updated to include ops, which aligns with the PR's goal to enhance asynchronous signing capabilities.

76-76: The addition of the ASYNC_OPS constant is appropriate for the asynchronous operations introduced in this PR.

72-79: > 📝 NOTE

This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [76-91]

The FuzzEstimator logic remains unchanged and appears to be correctly implementing fee estimation based on the target confirmation.

72-79: > 📝 NOTE

This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [76-91]

The KeyProvider struct and its trait implementations are crucial for the signing process. The changes should support the new asynchronous signing features without introducing issues.

72-79: > 📝 NOTE

This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [76-91]

The check_api_err and check_payment_err functions are used for error handling. It's important to ensure that they cover all necessary cases and that the error handling is robust.

72-79: > 📝 NOTE

This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [76-91]

The do_test function is the main entry point for the fuzz testing. It's a complex function, so it's important to ensure that the logic is correct and that it integrates well with the new asynchronous signing features.

72-79: > 📝 NOTE

This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [76-91]

The chanmon_consistency_test function is a wrapper around do_test that runs the test with different configurations. It's important to ensure that it's correctly setting up the tests.

72-79: > 📝 NOTE

This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [76-91]

The chanmon_consistency_run function is the C interface to the fuzz testing. It's important to ensure that it's safe and correctly converts raw pointers to slices.

72-79: > 📝 NOTE

This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [76-91]

The SearchingOutput struct is used to capture output and detect certain log strings. It's important to ensure that it's correctly capturing output and that the logic for detecting log strings is sound.

1294-1403: The #[cfg(async_signing)] blocks are conditional compilations for the asynchronous signing feature. It's important to ensure that they are correctly gated and that the logic within these blocks is sound.

1488-1521: > 📝 NOTE

This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [1410-1522]

The final test logic is complex and involves a lot of state manipulation and message passing. It's important to ensure that it's correctly testing the channel state consistency and that it's not leaving any state that could cause issues in future tests.

lightning/src/sign/mod.rs (2)

577-580: The get_per_commitment_point method has been updated to return a Result<PublicKey, ()>, which allows for asynchronous operation by potentially returning an error if the value is not immediately available.

591-591: The release_commitment_secret method has been updated to return a Result<[u8; 32], ()>, which allows for asynchronous operation by potentially returning an error if the value is not immediately available.

lightning/src/chain/channelmonitor.rs (1)

2947-2947: The change to directly assign per_commitment_point from htlc aligns with the PR's objectives to handle asynchronous signing.

lightning/src/ln/functional_tests.rs (1)

7802-7802: Here, .expect() is used correctly to handle the potential error from release_commitment_secret. This is consistent with the PR's objectives and should be the approach used throughout the code where applicable.

lightning/src/ln/channel.rs (44)

796-831: The introduction of SignerResumeUpdates struct and its default implementation is a good approach to encapsulate the logic for handling updates when the signer becomes unblocked. The use of Option for message types and the explicit ordering with RAACommitmentOrder is clear and maintainable.

1026-1035: The addition of next_holder_commitment_point as an Option<PublicKey> is a good design choice to handle the asynchronous nature of obtaining commitment points from the signer. This change allows the system to operate even when the next commitment point is not immediately available.

1070-1085: The introduction of flags such as signer_pending_commitment_update, signer_pending_revoke_and_ack, etc., is a robust way to track the state of various operations that depend on the signer's availability. This is a good use of boolean flags to manage the asynchronous signing process.

1578-1625: The methods request_next_holder_per_commitment_point and advance_holder_per_commitment_point are well-implemented to handle the retrieval and advancement of commitment points. The error handling with logs when the signer is not ready is appropriate, and the use of Option::take in advance_holder_per_commitment_point is a clean way to move values out of the Option.
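
The Option::take pattern mentioned here can be sketched as follows. `HolderPoints` and its fields are hypothetical and only model the idea of a current point plus an optional next point; the real channel.rs code carries more state and may behave differently when the next point is missing.

```rust
// Hypothetical, simplified model of the two-slot commitment point state.
type Point = [u8; 33];

struct HolderPoints {
    cur: Point,
    /// None while the signer has not yet provided the next point.
    next: Option<Point>,
}

impl HolderPoints {
    /// Moves `next` into `cur`, leaving `next` empty so it must be
    /// requested again. Returns false and changes nothing if `next`
    /// has not arrived yet.
    fn advance(&mut self) -> bool {
        match self.next.take() {
            Some(point) => {
                self.cur = point;
                true
            }
            None => false,
        }
    }
}
```

Because `take` replaces the slot with `None`, a successful advance cannot accidentally reuse the same next point twice.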

1964-1965: The method build_holder_transaction_keys is correctly using the current commitment point and the holder's pubkeys to build transaction keys. This is a straightforward and expected use of these values.

3489-3489: The use of build_holder_transaction_keys within the context of building a commitment transaction is correct and follows the expected pattern of usage.

3654-3658: The call to advance_holder_per_commitment_point after setting expecting_peer_commitment_signed to false is logical, ensuring that the commitment point is advanced in preparation for the next interaction with the peer.

4159-4159: The calculation of buffer_fee_msat and holder_balance_msat using the new fee rate and the commitment transaction stats is correct. It's important to ensure that the channel can afford the new fee before proposing a fee update, and this code does that check.

4212-4227: Clearing flags such as signer_pending_channel_ready, signer_pending_revoke_and_ack, and signer_pending_commitment_update before a reestablish message is received is a good practice to avoid sending duplicate messages if the signer becomes unblocked beforehand.

4360-4360: The conditional generation of announcement_sigs based on the state of the channel is appropriate. It's important to only generate and send such messages when the channel is in the correct state.

4382-4387: The conditional logic for generating a revoke_and_ack message based on the monitor_pending_revoke_and_ack flag is correct. The use of or_else to set the signer_pending_revoke_and_ack flag if the message is not available is a good way to handle the potential asynchronous nature of the signer.
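
A minimal sketch of the or_else pattern this comment describes is shown below. The `RevokeAndAck` and `ToyChannel` types and the `try_build_revoke_and_ack` helper are hypothetical; an absent `signer_secret` simply models a signer that is not ready.

```rust
// Hypothetical stand-ins for the real message and channel types.
#[derive(Debug, PartialEq)]
struct RevokeAndAck {
    secret: [u8; 32],
}

struct ToyChannel {
    /// None models a signer that cannot release the secret yet.
    signer_secret: Option<[u8; 32]>,
    monitor_pending_revoke_and_ack: bool,
    signer_pending_revoke_and_ack: bool,
}

impl ToyChannel {
    /// Builds the message only if the signer can provide the secret.
    fn try_build_revoke_and_ack(&self) -> Option<RevokeAndAck> {
        self.signer_secret.map(|secret| RevokeAndAck { secret })
    }

    /// Returns the message to send, or records that it is still owed.
    fn raa_to_send(&mut self) -> Option<RevokeAndAck> {
        if !self.monitor_pending_revoke_and_ack {
            return None;
        }
        self.try_build_revoke_and_ack().or_else(|| {
            // The message could not be built: remember to retry later.
            self.signer_pending_revoke_and_ack = true;
            None
        })
    }
}
```

The closure runs only when the message is unavailable, so the pending flag is set exactly in the case where a later retry is needed.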

4396-4425: The logic to handle the case where a commitment update is expected but not available is correct. The code properly sets the signer_pending_commitment_update flag if necessary and ensures that messages are sent in the correct order according to the resend_order.

4492-4582: The signer_maybe_unblocked function is well-structured to handle various cases where the signer may have become unblocked. It attempts to update the commitment point and secret, and conditionally generates messages like funding_signed and channel_ready. The logic to ensure message ordering is also correctly implemented.

4706-4708: The logging within build_holder_transaction_keys is detailed and will be helpful for debugging. The creation of the CommitmentUpdate message with all the necessary components is correctly implemented.

4824-4856: The logic for generating a channel_ready message and handling the revoke_and_ack message during channel reestablishment is correct. The code properly checks the steps behind and sets the signer_pending_revoke_and_ack flag if the message cannot be generated.

4879-4891: The conditional generation of a channel_ready message based on the state of the channel during reestablishment is correct. The logging provides clear information about the channel's reconnection status.

4910-4919: The logic to handle the generation of a commitment_update message during channel reestablishment is correct. The code ensures that the signer_pending_commitment_update flag is set if the message cannot be generated, which is important for maintaining the correct channel state.

5573-5580: The check_get_channel_ready function correctly checks the conditions under which a channel_ready message should be generated. The logging provides useful information about why a channel_ready message may not be produced.

5590-5597: The logic to determine whether a channel_ready message is needed based on the channel state and the current block height is correct. The use of matches! to check the channel state is a clean and concise way to handle this logic.

5625-5656: The check_get_channel_ready function's logic to handle various conditions that would prevent the generation of a channel_ready message is correct. The function returns None when appropriate, and the get_channel_ready function generates the message when all conditions are met.

5724-5724: The conditional check for channel_ready message generation upon funding transaction confirmation is correct. The code ensures that the message is sent immediately if needed, rather than waiting for a best_block_updated call.

5790-5790: The generation of a channel_ready message upon block confirmation is correctly implemented. The conditional generation of announcement_sigs based on the availability of chain_node_signer is also appropriate.

6279-6279: Setting the resend_order to RevokeAndACKFirst is a clear indication of the expected message order. This is a good practice to ensure that messages are sent in the correct sequence.
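
One way to picture how a resend order drives message sequencing is the sketch below. The enum is redeclared locally and modeled on the `RAACommitmentOrder` name used in this review; `ToyMsg` and `queue_in_order` are hypothetical and are not taken from the PR.

```rust
// Local re-declaration modeled on the enum named in the review;
// the real type lives in channelmanager.rs.
#[derive(Debug, Clone, Copy, PartialEq)]
enum RAACommitmentOrder {
    CommitmentFirst,
    RevokeAndACKFirst,
}

// Hypothetical placeholder for the two message kinds being ordered.
#[derive(Debug, PartialEq)]
enum ToyMsg {
    CommitmentUpdate,
    RevokeAndAck,
}

/// Queues whichever messages are present, in the requested order.
fn queue_in_order(
    order: RAACommitmentOrder,
    commitment_update: Option<ToyMsg>,
    revoke_and_ack: Option<ToyMsg>,
) -> Vec<ToyMsg> {
    let (first, second) = match order {
        RAACommitmentOrder::CommitmentFirst => (commitment_update, revoke_and_ack),
        RAACommitmentOrder::RevokeAndACKFirst => (revoke_and_ack, commitment_update),
    };
    // Option implements IntoIterator, so absent messages are skipped.
    first.into_iter().chain(second).collect()
}
```

With both messages present and `RevokeAndACKFirst`, the revoke_and_ack is queued ahead of the commitment update; a missing message is simply omitted.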

6657-6659: The assignment of cur_holder_commitment_point and next_holder_commitment_point during channel creation is correct. The code properly initializes these values for the new channel.

6682-6686: The initialization of the signer_pending_* flags to false during channel creation is correct. This ensures that the channel starts in a clean state with no pending operations.

6957-6957: The generation of the OpenChannel message with the correct per-commitment point and channel flags is correctly implemented. The conditional inclusion of the shutdown_scriptpubkey is also handled properly.

6957-6957: The inclusion of the first_per_commitment_point in the OpenChannel message is correct and follows the expected protocol behavior.

7130-7130: The creation of the initial commitment transaction using the holder's transaction keys is correctly implemented. The use of trust() on the transaction is appropriate given the context.

7184-7191: The call to advance_holder_per_commitment_point and the decrement of cur_counterparty_commitment_transaction_number are correct. The logging provides clear information about the receipt of funding_signed from the peer.

7461-7463: The assignment of cur_holder_commitment_point and next_holder_commitment_point during channel restoration is correct. The code properly initializes these values for the restored channel.

7486-7490: The initialization of the signer_pending_* flags to false during channel restoration is correct. This ensures that the channel starts in a clean state with no pending operations.

7621-7621: The generation of the AcceptChannel message with the correct per-commitment point is correctly implemented. The inclusion of the shutdown_scriptpubkey is also handled properly.

7621-7621: The inclusion of the first_per_commitment_point in the AcceptChannel message is correct and follows the expected protocol behavior.

7644-7644: The method check_funding_created_signature correctly builds the initial commitment transaction and checks the provided signature against it. This is a critical step in the channel creation process to ensure the counterparty has signed the correct transaction.

7719-7719: The call to advance_holder_per_commitment_point and the assignment of the new channel_id are correct. The code properly handles the state transition after receiving a funding_created message.

7751-7753: The check for whether a channel_ready message is needed after receiving a funding_created message is correct. The code ensures that the channel state is updated appropriately before proceeding with the monitor update.

8143-8144: The serialization of cur_holder_commitment_point and next_holder_commitment_point in the TLV stream is correct. This ensures that these critical pieces of information are persisted.

8433-8440: > 📝 NOTE

This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [8436-8469]

The deserialization of cur_holder_commitment_point and next_holder_commitment_point from the TLV stream is correct. The code properly handles optional fields and sets up the necessary context for channel restoration.

8487-8503: > 📝 NOTE

This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [8472-8498]

The restoration logic for cur_holder_commitment_point is correct. The code checks if the point is already available and, if not, retrieves it from the signer. This is a critical step in restoring the channel state.

8490-8499: The restoration of cur_holder_commitment_point from the signer if it's not already available is correct. This ensures that the channel can be restored to a consistent state.

8617-8619: The assignment of cur_holder_commitment_point and next_holder_commitment_point during channel restoration is correct. The code properly initializes these values for the restored channel.

8638-8642: The initialization of the signer_pending_* flags to false during channel restoration is correct. This ensures that the channel starts in a clean state with no pending operations.

9457-9457: The test setup for generating transaction keys using a hardcoded per-commitment secret is correct. This is a typical pattern for setting up a test environment.

10464-10464: The test assertion to check the channel state after calling set_batch_ready is correct. This ensures that the channel state is as expected after the batch is marked as ready.

lightning/src/ln/channelmanager.rs (12)

641-641: The addition of Debug to the RAACommitmentOrder enum is a good practice for enums that may be logged or otherwise output for debugging.

5907-5911: The added logging provides visibility into the state of the outgoing message queue, which is beneficial for debugging. Ensure that the verbosity level is appropriate and consistent with the project's logging standards.

6282-6287: The error handling for funding_created with an Ignore error is clear and maintains the channel state correctly by reinserting the channel into the channel_by_id map. This is a good use of early return to simplify control flow.

6437-6437: The debug log statement before calling channel_ready provides a clear indication of the method being invoked, which is useful for debugging. Consistency in logging practices is important for maintainability.

6675-6675: Logging the details of the update_add_htlc call with the htlc_id and amount_msat is helpful for debugging purposes, especially when tracking the flow of HTLCs.

6701-6701: The log statement for update_fulfill_htlc is consistent with the other added log statements and provides useful debugging information.

6800-6800: The logging of the number of HTLC signatures in commitment_signed is a good addition for monitoring the state changes in the channel.

6975-6975: The log statement indicating the invocation of revoke_and_ack is consistent with the other logging additions and aids in understanding the flow of channel state updates.

7126-7126: The log statement for channel_reestablish is consistent with the other added log statements and provides useful debugging information.

7358-7361: The logging and handling of the signer_maybe_unblocked method are clear. However, ensure that the double reference &&logger is intentional and not a typo, as it seems unusual.

Verification successful

The use of &&logger is consistent across multiple files in the codebase, indicating that it is an intentional pattern used by the developers and not a typo.

Scripts Executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Verify that the double reference to logger is intentional and not a typo.
rg '&&logger' --type rust
Length of output: 10578

7372-7394: The logic for queuing messages based on the RAACommitmentOrder is clear and well-structured. The trace logs provide detailed information about the message queueing order, which is crucial for debugging.

10373-10373: The log statement for successful channel loading is informative. The subsequent call to request_next_holder_per_commitment_point is appropriate, but ensure that the logger used (args.logger) is correctly passed and consistent with the rest of the codebase.

Verification successful

The usage of args.logger in the request_next_holder_per_commitment_point method call is consistent with the rest of the codebase, as the search did not reveal any other instances of similar method calls with a logger argument to compare against. The review comment is therefore verified as correct based on the available information.

Scripts Executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Verify that args.logger is the correct logger instance to use in this context.
rg 'args.logger' --type rust
Length of output: 2127

Script:
#!/bin/bash
# Search for instances where methods similar to `request_next_holder_per_commitment_point` are called with a logger argument
rg --type rust 'request_next_holder_per_commitment_point\(&'
Length of output: 174

coderabbitai · 2024-01-24T19:24:35Z