Double-signing prevention (MVP for launch) #60
Comments
Just a short clarification: The device only signs blocks in incremental order. To define the initial/current block height, the block height of the first KMS request after the device is plugged in is used as a reference. Setting this initial value requires manual user confirmation on the Ledger device. A similar approach could be used by signatory providers. A good idea would be to create a signatory provider wrapper with this functionality. |
@jleni does it just preserve monotonicity, or does it require each signed block immediately follow the previous one? I am wondering about things like failover. |
This behavior is according to the specs. It will sign in monotonic order; block heights do not need to be sequential.
Adding failover/HA to KMS is an interesting follow-up. It might actually need KMS/signatory arbitration + something like Raft (consensus) to handle these cases. |
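The monotonic (but not necessarily sequential) rule described above can be sketched as a small guard that a signatory provider wrapper might hold. This is only an illustration: `MonotonicGuard`, `Decision`, and `check_and_update` are hypothetical names, not part of Signatory or the KMS. It tracks height only (no round), and it assumes that heights at or below the user-confirmed reference height are refused.

```rust
/// Outcome of asking the guard whether a height may be signed.
#[derive(Debug, PartialEq)]
pub enum Decision {
    Sign,
    Refuse { last_signed: u64 },
}

/// Hypothetical wrapper state: remembers the highest height seen so far.
#[derive(Debug)]
pub struct MonotonicGuard {
    last_signed: u64,
}

impl MonotonicGuard {
    /// `initial` stands in for the user-confirmed reference height.
    /// Assumption: anything at or below it is refused.
    pub fn new(initial: u64) -> Self {
        Self { last_signed: initial }
    }

    /// Allow a height only if it is strictly greater than the last one.
    /// Gaps are fine: heights must increase, not be consecutive.
    pub fn check_and_update(&mut self, height: u64) -> Decision {
        if height > self.last_signed {
            self.last_signed = height;
            Decision::Sign
        } else {
            Decision::Refuse { last_signed: self.last_signed }
        }
    }
}
```

A wrapper would call `check_and_update` before forwarding each request to the underlying signer and drop the request on `Refuse`. Because gaps are accepted, a failover instance that starts at a later height is not blocked by this check alone, which is why arbitration between instances is a separate problem.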
My understanding - please correct me if wrong - is that HSM2 double signing prevention will be implemented by tracking the last signed height, which is persisted in one of the slots of the device.
KMS will then need to update this slot before signing each block, and should ideally read the data back to ensure it was stored correctly.
To make this as robust as possible, if the update/read cycle fails, KMS should complain loudly, but continue operating in degraded state. It could still prevent double signing by using locally cached data, and I guess(?) the HSM2 might still continue signing.
Apologies if this seems trivial, but I thought it was worth stressing since I couldn't find any MTBF data on the HSM2. Validator failure due to write wear on the HSM would be the worst.
TL;DR - the failure characteristics of the underlying devices (YubiHSM2, Ledger etc.) should be carefully considered. They might not be designed for write-intensive operations, including double signing prevention, and could wear out. |
@mdyring that's a good point. I will investigate that before I go forward with this approach. |
I can answer about Ledger nano S.
Yes, nvram is rated at 500k erase/write cycles. Actually, it is a bit more
complicated due to write amplification as pages are aligned at 64-byte
boundaries.
Anyway, we had these limitations in mind. Block number/rounds are tracked
in RAM, not nvram, to avoid this issue. When the device is plugged in, it
will skip the first votes and ask the user for confirmation to align with
the current values.
I hope this answers your question.
|
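To put the 500k erase/write figure in perspective, here is a rough back-of-the-envelope helper. The function name is hypothetical, and the 6-second block time used in the note below is an assumption that does not come from this thread. It also ignores wear leveling and the write amplification mentioned above, so treat the result as an order-of-magnitude estimate only.

```rust
/// Days until a storage cell rated for `rated_cycles` erase/write cycles is
/// exhausted, if it is rewritten `writes_per_block` times for every block.
/// Simplification: no wear leveling, no write amplification.
pub fn days_until_worn_out(
    rated_cycles: u64,
    writes_per_block: u64,
    block_time_secs: f64,
) -> f64 {
    let blocks = rated_cycles as f64 / writes_per_block as f64;
    blocks * block_time_secs / 86_400.0
}
```

With `days_until_worn_out(500_000, 1, 6.0)`, i.e. one write per block at an assumed 6 seconds per block, the budget is used up after roughly 35 days. That is why tracking height/round in RAM rather than nvram matters for a device signing every block.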
That seems like a sensible solution. In case KMS needs to be restarted, would this require physical access to the ledger device for confirmation? Ideally it should be possible to restart/update software remotely. Alternatively, KMS could signal a "clean shutdown" to the device which can then write to nvram and use that information on next start? (this could be useful in cases where a server needs to be power cycled) |
We talked with Yubico about wear on the flash. One of their reps suggested it wouldn't be an issue, although it's something I'm loath to risk without a precise MTBF. The last thing we want is a bunch of validators dying because they all wore out their flash at roughly the same time. |
This case won't work with the Ledger, since a power-cycle will restart the Ledger. In order to unlock the ledger you have to physically enter your PIN, which means you have to be at the datacenter anyway. That's why persistent storage for height/round is not as important with the ledger. |
I'm just about ready to (finally) start work on this. Here is a tentative plan:
There was some earlier discussion of persisting this information in e.g. YubiHSM2's opaque data. Due to concerns about write wear, I don't think this is a good idea. Another alternative would be introducing some kind of embedded database, e.g. sled, LevelDB, or LMDB. That seems like a complicated change to introduce right now, and also something where the other potential / future persistence needs of the KMS need to be considered. The existing |
with respect to 2.
This would be more secure and allow supporting several devices connected to the same KMS. |
@jleni I'm not sure that syntax is valid TOML... it seems like you want:
[[keys]]
id = "gaia-6000"
pubkey="123...."
key = 1
[[keys]]
id = "gaia-7000"
pubkey="123...."
key = 2
?
I'd agree the config syntax needs changes, but the main thing that needs to be solved, particularly in the context of this issue, is expressing an m:n mapping between Tendermint networks/chains and keys. With your proposed syntax, I think we'd need at least:
{ key = 1, chains = ["gaia-6000", "gaia-7000"], ... }
|
Yes sorry! :)
I was just trying to explain the high level idea.
|
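As an illustration of the m:n mapping between chains and keys discussed above, the following sketch inverts a per-key list of chains into a chain-to-keys lookup. `KeyConfig` and `keys_by_chain` are hypothetical names, and the `u16` key ID type is an assumption; none of this reflects the actual KMS config structs.

```rust
use std::collections::HashMap;

/// One entry per key, listing every chain that key may sign for.
/// Mirrors the shape `{ key = 1, chains = [...] }` from the discussion.
pub struct KeyConfig {
    pub key: u16,
    pub chains: Vec<String>,
}

/// Invert the per-key config into a chain -> keys lookup table.
pub fn keys_by_chain(configs: &[KeyConfig]) -> HashMap<String, Vec<u16>> {
    let mut map: HashMap<String, Vec<u16>> = HashMap::new();
    for cfg in configs {
        for chain in &cfg.chains {
            // A chain may be served by several keys, and a key by several chains.
            map.entry(chain.clone()).or_default().push(cfg.key);
        }
    }
    map
}
```

Keeping the chain ID as the lookup key also gives a natural place to hang per-chain state, such as the last signed height, so that one key signing for two chains does not share a single height counter.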
Regarding support for multiple chains, it would be very nice if a single tmkms instance could support multiple projects (Tendermint-based, running a compatible version, etc.), such as IRIS, Cosmos, IOV, etc. Could it be something as simple as making the HRP of bech32 addresses configurable? |
This commit moves the last sign state tracking for chains into the global chain registry. It also allows configurable locations for the chain state tracking files, which should probably make it easier to ensure they don't clobber each other in tests. This doesn't fully implement the double signing plan in #60 but at this point I think we're close enough. The remaining item is (optionally) running a user-specified command on startup to query the current block height.
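A minimal sketch of what a per-chain state tracking file could look like, assuming a hypothetical plain-text format holding only the last signed height. The real state file format and the function names here are not taken from the commit. The write goes through a temporary file and a rename so that a reader should not see a half-written file; a more careful version would also fsync before renaming.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Write the last signed height to `path` via a temp file plus rename.
/// Note: no fsync here, so this does not guard against power loss.
pub fn save_height(path: &Path, height: u64) -> io::Result<()> {
    let tmp = path.with_extension("tmp");
    fs::write(&tmp, height.to_string())?;
    fs::rename(&tmp, path)
}

/// Read the stored height back; a missing file means no state exists yet.
pub fn load_height(path: &Path) -> io::Result<Option<u64>> {
    match fs::read_to_string(path) {
        Ok(text) => text
            .trim()
            .parse::<u64>()
            .map(Some)
            .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e)),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
        Err(e) => Err(e),
    }
}
```

Giving each chain its own configurable path keeps tests and multiple chains from clobbering each other's state. A corrupt file is reported as an error rather than treated as "no state", since silently starting from zero would defeat the purpose.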
This is a tracking issue for KMS double-signing prevention.
Launch goal: attempt to prevent validator bugs or data loss from causing the validator to double sign by making the KMS aware of the current block height.
Longer-term goal: provide defensive capabilities / survive compromise of validator hosts. See #115 for discussion of post-launch double signing improvements.
Previous discussion:
Current Status
The `signatory-ledger-cosval` signer can track the current block height (i.e. it passes the current block height to the Cosmos app running on the Ledger hardware device, which persists the previous signed block height), and will refuse to double-sign.
Launch Plan
Add support to the KMS for a user-configurable subcommand for obtaining the current block height. This can be used to "bootstrap" a block height value when a KMS process is started. From there, the KMS can track the last block it signed.
For example, the KMS could call out to a shell script which hits the `/status` RPC endpoint for a validator's sentries, piping the output through e.g. `jq` to extract `"latest_block_height"` and `sort`ing the results, taking the highest value. An example script can be included in the KMS repo which people can customize to their needs.
This should allow validators to choose whatever mechanism they like for providing the KMS with the current block height, and implement e.g. storing the current block height in external databases as proposed in #11.
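On the KMS side, consuming such a user-configured command could look roughly like the sketch below. It assumes the command prints one height per line on stdout, possibly still wrapped in JSON quotes if `jq` is run without `-r`. `highest_height` and `bootstrap_height` are hypothetical names, and running the command through `sh -c` is an assumption about how the subcommand would be invoked.

```rust
use std::process::Command;

/// Pick the highest height from output that has one height per line.
/// Surrounding whitespace and double quotes are stripped; lines that are
/// not plain unsigned integers are ignored.
pub fn highest_height(output: &str) -> Option<u64> {
    output
        .lines()
        .filter_map(|line| line.trim().trim_matches('"').parse::<u64>().ok())
        .max()
}

/// Run a user-supplied shell command and parse its stdout.
/// Returns `Ok(None)` if the command printed no usable height.
pub fn bootstrap_height(shell_cmd: &str) -> std::io::Result<Option<u64>> {
    let out = Command::new("sh").arg("-c").arg(shell_cmd).output()?;
    Ok(highest_height(&String::from_utf8_lossy(&out.stdout)))
}
```

Taking the maximum in the KMS as well as in the script is deliberately redundant: if one sentry lags behind, the bootstrap value still comes from the most advanced one. A `None` result should probably stop startup rather than fall back to zero.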
Longer-Term Plan
See #115.