Replies: 2 comments
-
CC #2088
-
We discussed this a bit more on Discord, but I prefer the approach in #2554. In general, we use signing for "generate a message" logic, and most of that logic is already replayable: we need to replay it whenever we disconnect from a peer, reconnect, and discover that the original message never made it to the peer.
-
Motivation
We run an LDK-based multi-tenant container; i.e., a single process that runs several independent Lightning nodes concurrently. For some of these nodes, we'd like to delegate the signing operations to a remote service.
A remote signing service could be a secure enclave running in the same datacenter, a service operated by a third-party that maintains the signing keys, or even an individual device.
In each of these cases, we'd make a remote request to the signing service to perform the actual signing operation, or to provide LDK with a necessary secret (e.g., the per-commitment secret).
For this to work, each of the operations that LDK might perform must be fallible; i.e., it should be allowed to transiently fail, and then be resumed later when the result becomes available.
Currently (as of 0.0.116), LDK's signing interfaces (e.g., `ChannelSigner`) are not fallible in this sense:
- Some methods do not return a `Result` type. For example, `get_per_commitment_point` returns `PublicKey` and admits no possibility that the commitment point is not immediately available.
- Some methods return a `Result` type, but are fallible "in signature only": actually returning an error will crash LDK.
- Some methods return a `Result` type, but any error result will cause an immediate channel force-closure.

We'd like to work through the various signing interfaces and improve LDK's implementation to support the above use case. In particular, each method should admit an implementation that may not immediately have a result but has not failed permanently.
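To make the last point concrete, here is a minimal sketch of a signer that "may not immediately have a result". It is not LDK's actual `ChannelSigner` trait: `FallibleSigner`, `RemoteSigner`, `CommitmentPoint`, and `deliver` are hypothetical names used only for illustration, and the method signature is simplified (the real method takes additional arguments and returns a secp256k1 `PublicKey`).

```rust
use std::collections::HashMap;

/// Stand-in for a real public key type; an assumption for this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct CommitmentPoint(pub [u8; 33]);

/// Hypothetical, simplified signer interface (not LDK's `ChannelSigner`).
pub trait FallibleSigner {
    /// `Err(())` only says "no result right now". Whether the failure is
    /// temporary or permanent is left for the implementer to decide.
    fn get_per_commitment_point(&self, idx: u64) -> Result<CommitmentPoint, ()>;
}

/// A signer that only knows the points a remote service has delivered so far.
#[derive(Default)]
pub struct RemoteSigner {
    delivered: HashMap<u64, CommitmentPoint>,
}

impl RemoteSigner {
    /// Called when the remote service's answer for commitment `idx` arrives.
    pub fn deliver(&mut self, idx: u64, point: CommitmentPoint) {
        self.delivered.insert(idx, point);
    }
}

impl FallibleSigner for RemoteSigner {
    fn get_per_commitment_point(&self, idx: u64) -> Result<CommitmentPoint, ()> {
        // Not delivered yet: report "not ready" instead of blocking or panicking.
        self.delivered.get(&idx).copied().ok_or(())
    }
}
```

With this shape, a caller that receives `Err(())` can stop processing the channel and try again after `deliver` has been called for the same index. An in-memory signer would simply always return `Ok(...)`.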
As a motivating example, consider an implementation of the `ChannelSigner` interface built on webhooks. In a canonical webhook-based design, a request is sent via HTTP `POST` to a remote server. The response is typically a short `200 OK`, followed later by the remote server issuing an HTTP `POST` back to the requester with the results.
Proof-of-concept
A proof-of-concept implementation is in progress in #2487, which specifically addresses `ChannelSigner::get_per_commitment_point` and `ChannelSigner::release_commitment_secret`. Our goal here is to explore how we might rework LDK's internals to support 1) these methods returning a `Result` type (so that they can fail), and 2) resuming the channel state machine appropriately when a result is returned.
After some initial prototyping and discussion, we opted for the following approach:
- Change `get_per_commitment_point` and `release_commitment_secret` to return an error that is simply the unit type, `()`. Current in-memory implementations of the signer never return an error, so they need to change only to wrap their results in `Ok(...)`.
- Allow the meaning of an `Err` result to be user-defined as follows. If the signing failure is permanent, then the user must handle force-closing the channel themselves after returning the `Err` result. On the other hand, if the signing failure is temporary (e.g., it requires a response from a remote party), then the user can explicitly retry the operation when the results are available.
- When LDK calls `get_per_commitment_point` or `release_commitment_secret` and receives an `Err` result, it unwinds out and stores a retry state associated with the channel in the per-peer state.
- Add a new `ChannelManager` method, `retry_channel`. This method accepts the remote peer's public key and the channel ID, and restarts the operation that previously failed. The assumption is that the request to `get_per_commitment_point` or `release_commitment_secret` will now succeed because the signer implementation will have the required material.

As an example, consider the (somewhat simplified) flow that occurs during `commitment_signed`. Here, `WebhookSignerImpl` is an implementation of the `ChannelSigner` interface provided by a user, and `SigningService` is the service to which that implementation delegates the signing operations.
Upon receiving an error response from the signer, the `commitment_signed` handler in the `Channel` propagates the error out to the `ChannelManager`, which then notes that the channel is pending retry for `commitment_signed`. (Specifically, it does so by adding an entry to a new per-peer state table keyed by channel ID, whose value is an `enum` with sufficient side information to restart the operation.)
Later, when the user's `WebhookSignerImpl` has been provided with information sufficient to proceed, it invokes `retry_channel` on the `ChannelManager`, passing in the peer and channel ID. From this, the `ChannelManager` can recover the retry state and restart the `commitment_signed` processing.
Overview of changes
- Change `get_per_commitment_point` and `release_commitment_secret` to return a `Result` type. The error type is the unit type, and is interpreted to mean either a) the channel is not ready, and the user will later attempt to resume processing by calling `retry_channel`, or b) the channel signer has permanently failed and the user will eventually force-close (or abandon) the channel.
- Cache the results of these calls in the channel's `Context`. These are both `Option` values, with `None` for a newly constructed channel awaiting its first commitment point (or revocation). A `None` value is interpreted by the caller to mean "the information is not ready".
- Update the `Channel` and `ChannelManager` handlers that initialize or modify the per-commitment point to cache the values correctly:
  - `Channel::funding_signed`
  - `Channel::commitment_signed`
  - `Channel::channel_reestablish`
  - `Channel::funding_created`
  - `ChannelManager::do_accept_inbound_channel`
  - `ChannelManager::create_channel` (still to do)
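
The retry bookkeeping described in this proposal can be sketched roughly as follows. This is a toy model, not the code in #2487: `Manager`, `PeerState`, `RetryState`, `handle_commitment_signed`, and the `u64` stand-in for a commitment point are all assumptions made for illustration. For brevity the signer is passed in as a closure, whereas the proposed `retry_channel` takes only the peer's public key and the channel ID.

```rust
use std::collections::HashMap;

pub type ChannelId = [u8; 32];
pub type PeerId = [u8; 33]; // stand-in for the peer's public key

/// Which operation to restart once the signer is ready (hypothetical).
/// Other handlers would add their own variants and side information.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum RetryState {
    CommitmentSigned,
}

/// Toy channel that caches the per-commitment point as an `Option`;
/// `None` means "the information is not ready".
#[derive(Default)]
pub struct Channel {
    pub cached_point: Option<u64>, // u64 stands in for a real point type
}

#[derive(Default)]
pub struct PeerState {
    pub channels: HashMap<ChannelId, Channel>,
    /// Per-peer table of channels waiting on the signer, keyed by channel ID.
    pub pending_retry: HashMap<ChannelId, RetryState>,
}

#[derive(Default)]
pub struct Manager {
    pub peers: HashMap<PeerId, PeerState>,
}

impl Manager {
    /// Runs the handler. On a signer error, records retry state and unwinds.
    pub fn handle_commitment_signed<S>(&mut self, peer: PeerId, chan: ChannelId, signer: &S) -> Result<(), ()>
    where
        S: Fn() -> Result<u64, ()>,
    {
        let peer_state = self.peers.entry(peer).or_default();
        match signer() {
            Ok(point) => {
                peer_state.channels.entry(chan).or_default().cached_point = Some(point);
                peer_state.pending_retry.remove(&chan);
                Ok(())
            }
            Err(()) => {
                peer_state.pending_retry.insert(chan, RetryState::CommitmentSigned);
                Err(())
            }
        }
    }

    /// Restarts whichever operation was recorded as pending for the channel.
    /// Returns `Err(())` if nothing is pending or the signer is still not ready.
    pub fn retry_channel<S>(&mut self, peer: PeerId, chan: ChannelId, signer: &S) -> Result<(), ()>
    where
        S: Fn() -> Result<u64, ()>,
    {
        let state = self
            .peers
            .get(&peer)
            .and_then(|p| p.pending_retry.get(&chan).copied())
            .ok_or(())?;
        match state {
            RetryState::CommitmentSigned => self.handle_commitment_signed(peer, chan, signer),
        }
    }
}
```

In this sketch, a failed `handle_commitment_signed` leaves an entry in `pending_retry`. A later `retry_channel` call for the same peer and channel re-runs the handler, and on success the entry is cleared and the cached value becomes `Some(...)`. A permanently failed signer would simply never call `retry_channel`, leaving the user to force-close the channel.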