A Minimal Trusted Computing Base (TCB) #146
I'd be happy to be DIP manager for this btw.
Responding to @aching's comments on diem/diem#7930 😄
In my mind, this comes under engineering quality and maintaining that quality going forward. I'll add a sentence to that section as something to call out.
Indeed, the choice of "Majority" and "Quorum" was really just to distinguish between the < Quorum and >= Quorum cases. I'll add a sentence or two to call this out. Note: Originally, I was in favour of replacing "Majority" with "Non-Quorum", but the text becomes a little verbose at times... Given that @mimoo introduced this nomenclature, I'll see if he has any concerns about replacing "Majority" with "Non-Quorum"; otherwise, I'll just look into replacing it 😉
Aah, here I was following the established nomenclature (e.g., https://arxiv.org/pdf/1907.07010.pdf). However, if you feel this may be an issue, I'll update the text to avoid it altogether (e.g., "The Incremental TCB").
Indeed, this is another approach to consider. However, the trade-offs here are that it moves the TCB to a reactive security model, as opposed to a proactive one, i.e., attacks would be "possible" at the validator layer, but "reacted to" when caught by the re-verifying nodes. Moreover, it increases operational complexity and requires the re-verifying nodes to be considered as secure/important as the TCB (otherwise they could be compromised, too). However, like you say, the advantage here is that verification is taken off the critical path (and done asynchronously), so if the performance win is significant, it might be an interesting trade-off... I think the important next step here is to really understand how much performance will be impacted by moving execution correctness into the TCB (e.g. its own container).
I think this is something we'd need to explore in more detail. I agree that the current deployment model (i.e., using containers to isolate components) is pretty flimsy from a security point of view, but I think: (i) the engineering cost to "integrate directly" and then re-think in the future would probably be too high for any temporary performance/simplicity gains, especially if we ultimately want to go down the route of TEEs/secure hardware; and (ii) there are probably a few things we could employ today that would be consistent with our future directions (e.g., on-premises solutions, cloud virtualization, sandboxing technology, etc.). It seems to me like this is a worthwhile area to spend some time looking at next.
The performance gain is ~5-10% as reported in cluster test, so in the "real world", I'd expect the gain to be a little less than this. While I agree it seems worse to leak the key vs. control of the key, I'd argue that, in practice, our mitigations make these cases equivalent: (i) we require frequent consensus key rotations, so as to avoid long range attacks; and (ii) if a node were ever compromised, we'd always rotate the key, even if it wasn't leaked, but controlled.
Aah, this was a typo. It should have read: "Remove the execution correctness key and allow safety rules & execution to communicate directly (inside the TCB)". Will update it above.
If you feel this would add value, we can do that. Although, it would be weird if this was the only document in there 😆
Yeah no problem changing these terms. The only term that mattered to me was a hawking compromise haha.
I think a DIP is more appropriate. BTW, is there any private Quip discussion that someone could summarize here for us external contributors?
Also, this part needs to be removed (step 2 of section 5):
"there are no non-semantic forks in this model as safety is preserved"
Thanks @aching and @mimoo. I've updated the document based on the feedback! 😄
@mimoo, from the internal discussions there weren't any major/unexpected issues that came up. At a high-level, we agree that leaving execution correctness (EC) outside the TCB exposes us to risks around execution violations. However, the concerns with fixing this today are related to the implementation. These are:
To answer (2), I am currently working on a document that explores and quantifies this impact so we can better understand how the TCB will be affected. For (3), I'm also going to begin exploring the options available to us (today, as well as in the future, e.g., with TEEs) to better protect the TCB. Based on the outcomes of (2) and (3), the decision around (1) should become clearer. I will update this issue with these findings when we have them! 😄
We'll let this sit until you're ready to move forward with the proposal. Thanks!
A Minimal Trusted Computing Base (TCB)
Authors: Joshua Lind (@JoshLind), David Wong (@mimoo)
Status: Rough draft (for discussion)
1. Goals of this Document:
2. Preliminary Reading:
TCB Overview
Securing TCBs
3. Assumptions and Validator Component Abstraction (VCA)
To reason about the Diem TCB, we first make several assumptions about validators and their components in a blockchain.
Assumptions about validators:
Note: We consider it future work to challenge these assumptions (see the bottom of this document).
Assumed components in a validator:
Next, we assume a simple validator component abstraction (VCA):
4. Security Formalization
In order to analyze the security benefits of a TCB, we propose the following (informal) security definitions:
Types of compromise:
Types of security impact:
The Adversary model:
Consensus assumes that $f$ validators are byzantine and colluding (i.e., completely compromised). We therefore consider the TCB interesting if it can still provide security properties when $h$ additional compromises occur (shallow or deep). We consider two adversary models:
$f$ byzantine validators and $h \le f$ shallow or deep compromises.
$f$ byzantine validators and $h > f$ shallow or deep compromises.
Types of Attacks:
We consider three high-level types of attacks:
$S$, we get the next committed state via $\mathrm{execution}(S, \mathit{transaction})$, i.e., this is the property we expect from the blockchain. Using this definition, correctness attacks can be divided into two categories:
$S$, we can extend it to a state $S'$, where $S'$ is not the result of $\mathrm{execution}(S, \mathit{transaction})$ for any $S$ and $\mathit{transaction}$. In this attack, honest verifying clients (verifying full nodes and validators) will become stuck and unable to reach the arbitrary state.
$f$ validators are completely compromised, any further compromises will violate liveness globally.
5. The Incremental TCB:
To begin reasoning about the TCB in Diem, we take a step-by-step approach to building a TCB based on the VCA above. For each step, we reason about the security guarantees of the design.
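The difference between the two adversary models, which the reasoning in each step relies on, comes down to quorum arithmetic. The following is a minimal sketch assuming the standard BFT setting of $n = 3f + 1$ validators, where a quorum certificate needs $2f + 1$ votes; `quorum_size` and `can_reach_quorum` are hypothetical helper names, not part of Diem. It shows that $f$ byzantine validators plus $h$ further compromised validators can supply a quorum on their own exactly when $h > f$.

```python
def quorum_size(f: int) -> int:
    """Votes needed for a quorum certificate, assuming n = 3f + 1 validators."""
    return 2 * f + 1


def can_reach_quorum(f: int, h: int) -> bool:
    """True if f byzantine validators plus h additionally compromised validators
    can, by themselves, supply the 2f + 1 votes needed to certify a block."""
    if f < 0 or h < 0:
        raise ValueError("f and h must be non-negative")
    # f + h >= 2f + 1  <=>  h >= f + 1  <=>  h > f
    return f + h >= quorum_size(f)


# With f = 1 (n = 4 validators), a quorum is 3 votes.
assert not can_reach_quorum(f=1, h=1)  # h <= f: attacker stays below a quorum
assert can_reach_quorum(f=1, h=2)      # h > f: attacker controls a quorum

# The boundary is exactly h > f, for any f.
assert all(can_reach_quorum(f, h) == (h > f) for f in range(10) for h in range(30))
```

This boundary is what separates the $h \le f$ and $h > f$ adversary models in the step-by-step reasoning.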
Step 1: TCB = { Consensus key }
To begin, we move only the consensus key into the TCB and propose that consensus asks the TCB to sign data (e.g., votes). Reasoning about security, we see:
$f$ byzantine nodes. In the quorum adversary model, semantic correctness can also be violated (an attacker can arbitrarily extend the state).
Step 2: TCB = { Consensus key + Safety Rules }
To improve on step 1, we focus on hardening the validator against safety attacks. To do this, we partition consensus and move a subset of the consensus module into the TCB, labelled safety rules. Safety rules contains a set of verification constraints that, when enforced by enough validators ($\ge 2f+1$), prevent forks in the consensus protocol (see the Voting Rules in the Consensus specification). Reasoning about security, we now see:
$2f+1$ validators to certify and commit a non-semantic extension.
$2f+1$ validators can certify and commit a non-semantic extension. This is because a compromised safety rules component will agree to vote on any execution state.
Step 3: TCB = { Consensus key + Safety Rules + Execution }
To prevent attacks on correctness (as seen in step 2 above), we need to ensure that shallow compromises cannot enable voting on proposals that arbitrarily extend the state. To achieve this, we observe that one can simply move the execution logic (including the Move VM) into the TCB, which enforces correct execution of transactions. However, one still needs to ensure that execution extends the correct state. One option is to move storage into the TCB, but this is naive as it bloats the TCB. Instead, we observe that it is more beneficial to treat storage as untrusted and have execution keep track of valid state root hashes, updating them within the TCB. We call this approach execution correctness.
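To make execution correctness more concrete, here is a minimal Python sketch of a TCB that holds the consensus key, enforces a safety check, and tracks valid state roots while treating storage as untrusted. Everything in it is a hypothetical simplification rather than Diem's implementation: `Proposal`, `Tcb`, and `execution` are illustrative names; the only voting rule enforced is a strictly increasing round (a stand-in for the full Voting Rules in the Consensus specification); `execution` models a state transition as hashing the parent root with a transaction instead of running the Move VM; and HMAC stands in for a real signature scheme.

```python
from __future__ import annotations

import hashlib
import hmac
from dataclasses import dataclass, field
from typing import Optional


def execution(state_root: bytes, transaction: bytes) -> bytes:
    """Toy state transition (hypothetical): new root = SHA3-256(old root || tx)."""
    return hashlib.sha3_256(state_root + transaction).digest()


@dataclass
class Proposal:
    round: int
    parent_state_root: bytes   # supplied by untrusted storage/consensus
    transactions: list[bytes]
    claimed_state_root: bytes  # what the proposer says execution yields


@dataclass
class Tcb:
    consensus_key: bytes       # never leaves the TCB
    last_voted_round: int = 0
    valid_roots: set[bytes] = field(default_factory=set)

    def vote(self, p: Proposal) -> Optional[bytes]:
        """Return a (toy) signed vote, or None if any check fails."""
        # Safety rule (simplified): only vote in strictly increasing rounds.
        if p.round <= self.last_voted_round:
            return None
        # Execution correctness: the parent must be a root the TCB already accepted.
        if p.parent_state_root not in self.valid_roots:
            return None
        # Re-execute inside the TCB instead of trusting the claimed result.
        root = p.parent_state_root
        for tx in p.transactions:
            root = execution(root, tx)
        if root != p.claimed_state_root:
            return None
        self.last_voted_round = p.round
        self.valid_roots.add(root)
        message = p.round.to_bytes(8, "big") + root
        # HMAC is a stand-in for a real signature; the key stays inside the TCB.
        return hmac.new(self.consensus_key, message, hashlib.sha256).digest()


genesis = hashlib.sha3_256(b"genesis").digest()
tcb = Tcb(consensus_key=b"hypothetical-key", valid_roots={genesis})

root1 = execution(genesis, b"tx1")
assert tcb.vote(Proposal(1, genesis, [b"tx1"], root1)) is not None
# A non-semantic extension (wrong claimed root) is refused.
assert tcb.vote(Proposal(2, root1, [b"tx2"], bytes(32))) is None
# Re-voting in an old round is refused by the safety rule.
assert tcb.vote(Proposal(1, genesis, [b"tx1"], root1)) is None
```

Because the TCB re-executes transactions from a root it has already accepted, a compromised component outside the TCB (e.g., storage or consensus) cannot get it to sign a vote for a state that is not $\mathrm{execution}(S, \mathit{transaction})$ of a known $S$. A fuller design would presumably keep a tree of speculative roots and prune it on commit, rather than accepting every voted-on root indefinitely as this sketch does.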
We now reason about the security of this approach:
6. The Existing TCB (v1)
Today, execution correctness is still a work in progress and not part of the TCB. As such, under shallow compromises the TCB defends against everything but correctness attacks (see step 2 of the incremental TCB). In this section, we look at various implementation details of the TCB as it stands today:
7. Proposal & Path Forward (TCB v2)
Based on the observations above, we outline the following design and implementation improvements for the TCB (v2):
Design Improvements:
Implementation Improvements:
8. Future Explorations for the TCB
The list below contains future explorations for the TCB. Each of these requires additional thought and analysis.