AA | Design ideas to verify the initdata #446
Comments
Is kata-agent verifying Hash(InitData) == Hostdata really a crucial aspect of the InitData concept? isn't the TEE report and hence Hostdata essentially untrusted before remote attestation? |
Nice question. Tobin and Dan gave me an attack scenario for this, which Tobin summarized:
this can be prevented at step 2. before the kata agent uses the policy, we check the binding to the evidence. if the check does not pass, we don't use the new policy. this connects to something i was saying at the beginning of the meeting about the design of the AA endpoint that is going to check the binding. this endpoint must be available very early (before remote attestation) and it can't rely on any configuration that will be provisioned via the InitData/Policy. What I envisioned is a very simple endpoint that detects which platform is running and then checks that the input matches the report. |
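As a rough illustration of the "very simple endpoint" described above, here is a minimal Python sketch. The function name `check_binding`, the platform labels, and the choice of hash per platform are assumptions made for illustration, not the AA's actual API; only the field sizes (32 bytes for SNP HOSTDATA, 48 bytes for TDX MRCONFIGID) are platform facts.

```python
import hashlib
import hmac

# Assumed mapping: platform -> (hash algorithm, size in bytes of the initdata field).
# The field sizes are platform facts; pairing them with these hashes is an assumption.
INITDATA_FIELDS = {
    "snp": ("sha256", 32),  # HOSTDATA
    "tdx": ("sha384", 48),  # MRCONFIGID
}


class UnsupportedPlatform(Exception):
    """Raised when the platform has no initdata field in its evidence."""


def check_binding(platform: str, initdata: bytes, report_field: bytes) -> bool:
    """Return True if hash(initdata) equals the field taken from the TEE report."""
    if platform not in INITDATA_FIELDS:
        raise UnsupportedPlatform(platform)
    algorithm, size = INITDATA_FIELDS[platform]
    digest = hashlib.new(algorithm, initdata).digest()
    # Constant-time comparison; a field of the wrong length never matches.
    return len(report_field) == size and hmac.compare_digest(digest, report_field)
```

The sketch depends on nothing that would be provisioned through the initdata itself, which is the property asked for above.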
Thanks, that's an interesting scenario. If I understand it correctly this means the kata-agent is applying an untrusted policy initially. But why should the kata-agent do that? Shouldn't everything be locked down and only endpoints selectively opened using a verified policy? |
I am not an expert on this, but let me try to explain. Currently the policy is injected via an annotation in the pod YAML, and this happens in the same context in which the pod VM is created. At the same time, the network set-up APIs are also called and filtered by the policy. So before we can verify the policy by remote attestation, the policy is already in use within a guest whose network is not yet prepared. If we want to switch to a new policy after attestation, and use a strict, verified policy before that, we might need to store the policy in KBS and specify its KBS resource id in the initdata/launch parameter of kata-agent. Hi @danmihai1, please correct me if I got anything wrong. |
ok, this might be the key to understanding the reasoning behind that assertion, as the policy doesn't just apply to container workloads but also to the "infrastructure tier" in the guest to some degree. Hrm yeah, it's tricky: kata cannot be fully locked down. It just feels odd; it appears like there is untrusted code (unverified kata-agent + OPA bins) verifying an untrusted policy (received via SetPolicy) using untrusted data (field in an unverified TEE report). And we can only state retroactively, after verifying each of those components via remote attestation, that the first verification was valid (some kind of higher-order verification ♻️) |
I remembered, this was also discussed in the peerpod context a while ago and @katexochen suggested to only allow |
Notice that this is not true in situations where we can do measurements at runtime (vTPM, for example with peerpods). In this case, we can measure the policy when it is set, and we don't need to know the policy value upfront. Even if the attacker sets a malicious policy, they cannot influence the measurement of the policy that is done before the policy takes effect. A verifier can then later check the measurement, and the attacker cannot prevent the verifier from discovering the malicious policy. However, I agree that in situations where we only have an initial launch measurement, we must validate the policy hash before the policy takes effect. |
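To make the runtime-measurement argument concrete, here is a small Python simulation of a TPM-style extend. The `extend` formula is the standard PCR extend for a SHA-256 bank; `set_policy` and `replay` are hypothetical helper names, and a real flow would go through a vTPM rather than plain hashing.

```python
import hashlib

PCR_SIZE = 32  # SHA-256 bank


def extend(pcr: bytes, data: bytes) -> bytes:
    """TPM-style extend: new = SHA256(old || SHA256(data))."""
    return hashlib.sha256(pcr + hashlib.sha256(data).digest()).digest()


def set_policy(pcr: bytes, policy: bytes) -> bytes:
    """Guest side: measure the policy first, only then let it take effect."""
    pcr = extend(pcr, policy)
    # Hypothetical: the policy would be activated here, after the measurement.
    return pcr


def replay(event_log: list[bytes]) -> bytes:
    """Verifier side: recompute the expected PCR value from the logged events."""
    pcr = bytes(PCR_SIZE)
    for event in event_log:
        pcr = extend(pcr, event)
    return pcr
```

Because the extend happens before activation, a malicious policy necessarily leaves a PCR value that differs from the one a verifier computes for the expected policy.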
I lean towards option B because it avoids the difficulty with systemd and in the non-systemd case it makes the changes needed in kata simpler. One potential drawback with option B is that adding a dynamic configuration update endpoint might make the attack surface of the AA larger. I don't think it creates any obvious security issues, but it might make it a little bit easier for an attacker to tamper with the AA. Maybe we should only allow this endpoint to be called once? The tricky part is making sure this will still work with static platforms like SEV and s390x. For other guest components, this will be easier. For example, the CDH will simply expect a config file in a certain location. Either this will be added to the initrd ahead of time or provisioned dynamically via InitData. This won't work for the AA because the config contains more dynamic data. Currently we configure the AA via the Another question is how we make this work for peer pods. I would like to keep the peer pods case as similar to the normal case as possible. One way to do this is to make the |
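The "only allow this endpoint to be called once" idea could look roughly like the following Python sketch. `AgentConfig`, `update_once`, and `ConfigAlreadySet` are hypothetical names used for illustration and do not correspond to existing AA code.

```python
import threading


class ConfigAlreadySet(Exception):
    """Raised when a second dynamic configuration update is attempted."""


class AgentConfig:
    def __init__(self, defaults: dict):
        self._config = dict(defaults)
        self._updated = False
        self._lock = threading.Lock()

    def update_once(self, new_config: dict) -> None:
        """Apply a dynamic update; every later call is rejected."""
        with self._lock:
            if self._updated:
                raise ConfigAlreadySet("configuration was already updated")
            self._config.update(new_config)
            self._updated = True

    def get(self, key: str):
        with self._lock:
            return self._config.get(key)
```

Rejecting later calls keeps the window in which the configuration can be tampered with as small as possible, at the cost of not supporting reconfiguration.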
If I understood it correctly the endpoint would imply a Intuitively I'd think such a fn would be implemented as a no-op on TEEs which are not able to perform this assertion, because there is no field in the HW evidence that would hold the initdata hash. On TEEs that are able to perform runtime measurements (TPM, tdx via rtmr?) |
What I envision is a function SNP/TDX SEV/s390x vTPM Maybe |
Right. IMO, for scenarios where vTPM is not provided, kata-agent will try to call This logic of checking the existence of vTPM might be determined by kata-agent at runtime to avoid complex compilation features. |
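A runtime check for a vTPM, as opposed to a compile-time feature, might be sketched like this in Python. Treating the presence of the Linux TPM device nodes as "vTPM available" is an assumption of this sketch, and the two flow labels returned by `choose_flow` are hypothetical.

```python
import os

# Device nodes the Linux kernel exposes for a TPM; using their presence as the
# vTPM signal is an assumption of this sketch.
TPM_DEVICE_PATHS = ("/dev/tpmrm0", "/dev/tpm0")


def has_vtpm(paths=TPM_DEVICE_PATHS) -> bool:
    """Return True if any of the candidate TPM device nodes exists."""
    return any(os.path.exists(path) for path in paths)


def choose_flow(paths=TPM_DEVICE_PATHS) -> str:
    """Pick the integrity flow at runtime (labels are hypothetical)."""
    return "runtime-measurement" if has_vtpm(paths) else "check-binding"
```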
I am a bit wary of adding attestation-specific code to the kata agent. The vTPM case is mainly for peer pods for now. I don't know what flow they want to use for provisioning config stuff. |
You are right. The requirements are
After some thinking I realize that we just need to do two things.
Peerpod and kata-cc differ in their overall design. kata-cc uses the initdata field inside the evidence to bind the integrity of initdata and therefore should leverage AA's Peerpod uses vTPM's runtime measurement ability to bind the integrity. So we can always let kata-agent call
On vTPM based platforms, it hints that a component like
Did I ignore anything important? This way, we do not break/extend any semantics of the APIs. |
@Xynnn007 thanks for the write up, that's a good summary. Logically this all makes sense to me, in the concrete implementation aspects we'll have to see, what we have to adjust. At the moment So PUD wouldn't be able to call AA, since there's no AA process yet. I'm not entirely sure about the plans for subprocessing attestation-agent in kata#main (whether we want to keep it or have some sort of init system), so it might not be a problem in the future.
So, how and if peerpod would be able to use the |
Right. It seems that the cyclic dependency could be handled by adding an extra API The reason I suggested that PUD call AA to use the PCR is that a) AA already has the ability to extend runtime measurements and b) putting all attestation functionality into one component seems cleaner. |
Yes, as we have the endpoint, we should use it if possible, I agree. |
I have a doubt: if the init_data also includes the KBS service URL, how can we be sure that the remote attestation can be trusted? The KBS could also be a malicious endpoint. |
I would assume the situation for setting a hash of the KBS endpoint to the report's hostdata field is logically not very different from passing the endpoint to the guest via kernel cmdline. In both cases the KBS endpoint (directly or as a derived hash) is part of a signed TEE report. Now, a malicious, privileged actor could tamper with either one of those and inject a new KBS endpoint. That KBS endpoint could fake remote attestation. Consequences: The fake KBS is not able to provide secrets, but it might deliver the wrong ones. It would seem that a workload is running encrypted in a TEE, when it's not. The former (malicious secret) is an issue that deserves attention IMO, because there's a practical use case that comes to mind with Azure's SAS or AWS' Presigned URLs. Those are often used as an alternative to specific IAM privileges, enabling a workload to push/retrieve data from an Object Store. The latter (masquerading a non-encrypted workload as running in a TEE) is a fundamental issue with CoCo, I think. With confidential pods a kubernetes user cannot trust what the control plane is reporting anyway. The consequence might be that only a workload whose execution depends on a secret stored in a trusted KBS can be trusted itself (like an encrypted image). This topic is discussed at length in Tobin's gist, but I'm afraid there's no clear answer yet. |
Background
Currently, we are promoting the initdata mechanism in the community. The core idea is to leverage the initdata field (HOSTDATA for SNP, MRCONFIGID for TDX, etc.) to bind the hash value of any data injected into the guest. One of the most critical design questions is how AA exposes an interface for kata-agent to verify the binding between the hash of the initdata plaintext and the corresponding initdata field inside the TEE evidence.
Overall, we should take a way to do the following things:
are finished by @danmihai1 perfectly in https://github.com/kata-containers/kata-containers/pull/8469/files#diff-df928933d70fb4e5616a6ecb3d8a1340adbf328dd3d6973b3a00ce5c75aae23aR1 which we can reuse in AA.
Step 2 is yet to be designed. This issue aims to collect ideas on it.
@fitzthum @danmihai1 and I have been discussing this and let's continue publicly.
Design Concerns and Different Ways
There are some design concerns
- Not every platform has an initdata field in its evidence: SNP has `HOSTDATA` and TDX has `MRCONFIGID`, while s390x and AMD SEV do not.
- `KBS url`, `KBS public key`, together with some other parameters, should be part of AA's configuration file. Those are all things provided via initdata.

Thus, we need a mechanism
A. Initdata as a launch parameter
Initdata check will be part of AA's launch parameter.
The `--initdata` parameter is optional. If it is given, AA will check the binding against an evidence report as soon as it is launched.
For s390x/SEV, AA will do nothing except raise a warning, `On XXX platform initdata is not supported`, and thus ignore the `--initdata` parameter when it is given.
This is difficult for systemd scenarios, as we would need a caller that runs before AA and knows both the initdata and the configuration of AA. Also, if AA fails to start, the caller (e.g. kata-agent) cannot know that.
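Option A could be sketched in Python as follows. Only the `--initdata` flag and the warning text come from the proposal above; what the flag's value contains (digest, plaintext, or path), the `startup_check` helper, and the injected `check_binding` callable are assumptions made for illustration.

```python
import argparse
import logging

# Platforms without an initdata field in their evidence, per the concerns above.
UNSUPPORTED = {"sev", "s390x"}


def parse_args(argv):
    parser = argparse.ArgumentParser(prog="attestation-agent")
    # Optional; the format of the value is left open in this sketch.
    parser.add_argument("--initdata", default=None)
    return parser.parse_args(argv)


def startup_check(argv, platform, check_binding):
    """Return True if startup may continue, False if the binding check failed."""
    args = parse_args(argv)
    if args.initdata is None:
        return True  # nothing to check
    if platform in UNSUPPORTED:
        logging.warning("On %s platform initdata is not supported", platform)
        return True  # parameter is ignored
    return check_binding(args.initdata)
```

The sketch also shows the drawback noted above: whoever launches AA must already know the initdata value, and a failed check only surfaces as a failed startup.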
B. Initdata as an AA API parameter
Let's temporarily call the API `CheckInitdata`.
- Call `CheckInitdata` of AA to check the binding.
- Call `UpdateConfigurations` of AA.

cc @jiazhang0 @jialez0