Can I assign a GPU resource to an enclave? #543

BaiChienKao · 2023-10-05T19:31:40Z

I'm currently engaged in research involving enclaves and I'm interested in optimizing certain applications by utilizing GPU resources. Unfortunately, I cannot find a way to assign a GPU resource to an enclave. My research from 2021 indicated that this feature was not supported. I'm curious if there have been any developments since then, and whether GPU assignment for enclaves is now possible.

meerd · 2024-04-05T15:18:24Z

Hello @BaiChienKao,

Enabling GPU attachment for Enclaves is on our radar, but there are no immediate plans to implement this feature.

andrcmdr · 2024-09-12T06:15:18Z

This should be a top priority for AWS now, in light of evolving AI technologies and the appearance of the first discrete GPUs with TEE support (NVIDIA's Hopper H100/H200 and Blackwell architectures) for confidential computing (CC) mode on the GPU, and because P5 and P5e EC2 instances with H100/H200 GPUs are already available in AWS.

But it looks like Nitro still does not support GPU TEE on AWS and does not support attaching discrete adapters on a PCI bus, although the NSM module itself is a virtual (virtio-based) PCI device for interacting with the Nitro hypervisor. (I hope the hypervisor code will be published as well, since it is based on KVM; this would strengthen the chain of trust and improve attestation for all components of the Nitro platform.)
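
To make the NSM point concrete, here is a minimal Python sketch. It assumes the NSM driver exposes its character device at `/dev/nsm` inside a Nitro Enclave; the helper name `nsm_device_present` is hypothetical. It only checks that the device node exists and is a character device. It does not talk to the hypervisor or request an attestation document.

```python
import os
import stat

# Assumption: inside a Nitro Enclave the NSM driver exposes this device node.
NSM_DEVICE_PATH = "/dev/nsm"


def nsm_device_present(path: str = NSM_DEVICE_PATH) -> bool:
    """Return True if `path` exists and is a character device."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        # Missing path or no permission to stat it: treat as "not present".
        return False
    return stat.S_ISCHR(mode)


if __name__ == "__main__":
    if nsm_device_present():
        print("NSM device found: probably running inside a Nitro Enclave")
    else:
        print("No NSM device: probably not inside a Nitro Enclave")
```

A check like this says nothing about GPUs. It only illustrates that the enclave sees the NSM as a single virtual device, not as a general PCI bus to which other adapters could be attached.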

There are other options available: KVM/QEMU VMs with support for AMD SEV-SNP or Intel TDX (VM-based CPU TEEs), and NVIDIA's Hopper/Blackwell MIG TEE enabled with NVtrust.
But AWS and Nitro still offer great usability for running confidential computing workloads.

Guys and gals, you should definitely give this closer consideration and implement it as soon as possible.

Cc @meerd @andraprs @eugkoira @axlprv @agraf @jdbean

Our ML research and cloud infrastructure teams at @sentient-xyz (https://sentient.foundation) really do need the GPU TEE feature for P5 and P5e instances with H100/H200 GPUs, with support for on-chip confidential computing (MIG-based TEE in the Hopper architecture) in isolated GPU memory.
This is essential for training and fine-tuning large models on sensitive non-public data.

I found only this article, which mentions P5, P5e and Nitro but gives no meaningful information about GPU TEE support and only creates false expectations.
In fact, the article only mentions the 3,200 Gbps of Elastic Fabric Adapter (EFA) v2 networking and that up to 3200 Gbps of EFA networking is enabled by the AWS Nitro System, i.e. Nitro is mentioned here only in the context of networking, while for end users it is mostly the NSM module interacting with the hypervisor through an ioctl interface for VM-based TEE.

https://aws.amazon.com/blogs/machine-learning/introducing-three-new-nvidia-gpu-based-amazon-ec2-instances/

We have combined NVIDIA’s powerful GPUs with differentiated AWS technologies such as AWS Nitro System, 3,200 Gbps of Elastic Fabric Adapter (EFA) v2 networking, hundreds of GB/s of data throughput with Amazon FSx for Lustre, and exascale computing with Amazon EC2 UltraClusters to deliver the most performant infrastructure for AI/ML, graphics, and HPC.

To power the development, training, and inference of the largest large language models (LLMs), EC2 P5e instances will feature NVIDIA’s latest H200 GPUs, which offer 141 GBs of HBM3e GPU memory, which is 1.7 times larger and 1.4 times faster than H100 GPUs. This boost in GPU memory along with up to 3200 Gbps of EFA networking enabled by AWS Nitro System will enable you to continue to build, train, and deploy your cutting-edge models on AWS.

TonyGiorgio · 2024-10-01T19:04:11Z

I would also like this. For now, I'm connecting my enclave to another provider that runs their stuff on Azure's confidential compute in order to get the H100 TEE feature.

TonyGiorgio · 2024-12-04T17:39:35Z

With the new Trainium 2, is this essentially possible now? Would the enclave part work out of the box with one of those instances?

https://aws.amazon.com/ec2/instance-types/trn2/

meerd mentioned this issue Apr 5, 2024

Assigning GPU to Nitro Enclaves #517

Open
