Conversation

@mythi commented Aug 27, 2025

This setup provides an automated "online, multi-platform, PCCS-based Indirect Registration" and TDX QGS deployment for Kubernetes-based clusters.

Building blocks (a rough layout sketch follows the list):

  1. in-cluster PCCS caching service deployment
  2. PCKIDRetrievalTool sidecar and TDX QGS in a single daemonset
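
For orientation only, here is a rough sketch of how these two blocks could be laid out as a single daemonset. This is not the PR's actual manifest: the image name is the one built in the steps below, while the container names, the SGX resource request, and the socket path are assumptions.

# sketch only: platform-registration sidecar (PCKIDRetrievalTool) plus TDX QGS
# in one daemonset, sharing the directory that holds the QGS unix socket
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: intel-dcap-node-infra
spec:
  selector:
    matchLabels:
      app: intel-dcap-node-infra
  template:
    metadata:
      labels:
        app: intel-dcap-node-infra
    spec:
      containers:
        - name: platform-registration   # registers the platform via the in-cluster PCCS
          image: intel/sgx-dcap-infra:devel
          resources:
            limits:
              sgx.intel.com/provision: 1   # assumed SGX device plugin resource
        - name: qgs                        # TDX Quote Generation Service
          image: intel/sgx-dcap-infra:devel
          volumeMounts:
            - name: qgs-socket-dir
              mountPath: /var/run/tdx-qgs
      volumes:
        - name: qgs-socket-dir
          hostPath:
            path: /var/run/tdx-qgs
            type: DirectoryOrCreate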

Pre-conditions:

Read the basics of the Intel TDX remote attestation infrastructure setup and get an Intel PCS API key. The node(s) must have TDX and SGX enabled. The following also assumes that the user has cloned this PR and has a bare-metal cluster available.

Installation:

  1. Deploy SGX device plugin with the "DCAP infrastructure resources" enabled:

kubectl apply -k deployments/sgx_plugin/overlays/dcap-infra-resources

  2. Make the new (unpublished) images available to your cluster:
make sgx-pccs sgx-dcap-infra
docker save intel/sgx-pccs:devel | sudo ctr -n k8s.io i import -
docker save intel/sgx-dcap-infra:devel | sudo ctr -n k8s.io i import -
  3. Deploy PCCS
pushd deployments/sgx_dcap/pccs
<check notes in kustomization.yaml to populate .env.pccs-tokens>
kubectl apply -k .
popd

NB: if a proxy setting is needed, edit pccs.yaml
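
As a sketch of what such an edit could look like, one option is to set the standard proxy environment variables on the PCCS container. Whether the image picks these up or needs its own proxy setting is not confirmed here; the container name and the values below are assumptions.

# hypothetical pccs.yaml fragment (proxy values are placeholders)
containers:
  - name: pccs
    env:
      - name: HTTPS_PROXY
        value: "http://proxy.example.com:911"
      - name: NO_PROXY
        value: "localhost,127.0.0.1,.svc,.cluster.local"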

  4. Deploy platform-registration and TDX QGS
pushd deployments/sgx_dcap/base
<add your USER_TOKEN to .env.pccs-credentials>
kubectl apply -k .
popd

NB: add nodeSelector to filter SGX/TDX enabled nodes if run in a multi-node cluster
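
A minimal sketch of such a nodeSelector, assuming an SGX node label published by NFD rules (the label name below is an assumption and may differ in your cluster; add a TDX label as well if your rules publish one):

# hypothetical daemonset pod spec fragment
spec:
  template:
    spec:
      nodeSelector:
        intel.feature.node.kubernetes.io/sgx: "true"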

  5. Check things are up:
$ kubectl get pods
NAME                               READY   STATUS    RESTARTS   AGE
intel-dcap-node-infra-q9zms        2/2     Running   0          17h
intel-dcap-pccs-647568f67d-ftjb2   1/1     Running   0          17h
intel-sgx-plugin-bgfgv             1/1     Running   0          17h
$ kubectl logs -c platform-registration intel-dcap-node-infra-q9zms 
Waiting for the PCCS to be ready ...
PCCS is online, proceeding ...
Calling PCKIDRetrievalTool ...

Intel(R) Software Guard Extensions PCK Cert ID Retrieval Tool Version 1.23.100.0

Registration status has been set to completed status.
the data has been sent to cache server successfully!

The node should now have /var/run/tdx-qgs/qgs.socket available for QEMU to connect to.

Notes:

The PCCS database is stored in a RAM-backed EmptyDir volume and is currently not backed up (a backup mechanism will be added later). Keep the PCCS deployment up. If quoting errors occur, a full re-install after an SGX Factory Reset might be required.
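
For reference, a RAM-backed EmptyDir is simply an emptyDir volume with medium: Memory; the volume name below is an assumption:

# sketch of a RAM-backed volume for the PCCS database
volumes:
  - name: pccs-db
    emptyDir:
      medium: Memory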

@mythi force-pushed the PR-2025-018 branch 2 times, most recently from 691dfe1 to 65ce152 on August 29, 2025 08:08
Comment on lines +9 to +16
WORKDIR /opt/intel

ARG SGX_SDK_URL=https://download.01.org/intel-sgx/sgx-linux/2.26/distro/ubuntu24.04-server/sgx_linux_x64_sdk_2.26.100.0.bin

RUN curl -sSLfO ${SGX_SDK_URL} \
&& export SGX_SDK_INSTALLER=$(basename $SGX_SDK_URL) \
&& chmod +x $SGX_SDK_INSTALLER \
&& echo "yes" | ./$SGX_SDK_INSTALLER \

All props to @ScottR-Intel for: sudo ./sgx_linux_x64_sdk_2.26.100.0.bin --prefix /opt/intel


# self-signed TLS certs for pccs-tls:
# openssl req -x509 -sha256 -nodes -days 365 -newkey rsa:2048 -keyout private.pem -out file.crt -subj "/C=US/ST=Denial/L=Springfield/O=Dis/CN=www.example.com"
# token hashesh follow (with 'hellworld' changed to the desired secret tokens):


Suggested change
- # token hashesh follow (with 'hellworld' changed to the desired secret tokens):
+ # token hashesh follow (with 'helloworld' changed to the desired secret tokens):

name: pccs-credentials
securityContext:
readOnlyRootFilesystem: true
allowPrivilegeEscalation: false


I had to change this to make it work:

-          allowPrivilegeEscalation: false
+          privileged: true
+          allowPrivilegeEscalation: true

I think that for socket device plugins, the container that exposes the unix socket has to be privileged (see the pr-helper example).

@MatiasVara commented Sep 18, 2025

I tried it and the unix socket is not visible from the virt-launcher. This means that we still need something like a socket device plugin in the virt-handler to mount it. I do not think this PR is the place for that, though.

@MatiasVara

I observe that when I remove the QGS pod, the new instance fails because the unix socket still exists. I think the unix socket should be removed when the pod is removed; otherwise it has to be removed manually.

@mythi commented Sep 25, 2025

I think the unix socket should be removed when the pod is removed; otherwise it has to be removed manually.

I saw this too and reported a bug against QGS about it. I need to see if I can work around it in the meantime.
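
One possible interim workaround, sketched here as an idea rather than what the PR does, would be an init container in the daemonset that removes a stale socket before QGS starts; the image, names, and path below are assumptions:

# hypothetical init container that clears a socket left behind by a previous pod
initContainers:
  - name: clean-stale-qgs-socket
    image: busybox:1.36
    command: ["sh", "-c", "rm -f /var/run/tdx-qgs/qgs.socket"]
    volumeMounts:
      - name: qgs-socket-dir
        mountPath: /var/run/tdx-qgs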
