From 2a33e08c0e726d35decefc17f879459b3b8db7bc Mon Sep 17 00:00:00 2001
From: Bryant Biggs
Date: Fri, 30 Aug 2024 11:58:14 -0500
Subject: [PATCH 1/3] chore: Correct documentation to state that NVIDIA container runtime is included, not CUDA toolkit

---
 QUICKSTART-ECS.md | 2 +-
 QUICKSTART-EKS.md | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/QUICKSTART-ECS.md b/QUICKSTART-ECS.md
index b642cf5f526..dcfa577fcf7 100644
--- a/QUICKSTART-ECS.md
+++ b/QUICKSTART-ECS.md
@@ -201,7 +201,7 @@ Once it launches, you should be able to run tasks on your Bottlerocket instance
 ### aws-ecs-*-nvidia variants
 
 The `aws-ecs-*-nvidia` variants include the required packages and configurations to leverage NVIDIA GPUs.
-They come with the [NVIDIA Tesla driver](https://docs.nvidia.com/datacenter/tesla/drivers/index.html) along with the libraries required by the [CUDA toolkit](https://developer.nvidia.com/cuda-toolkit) included in your ECS tasks.
+They come with the [NVIDIA Tesla driver](https://docs.nvidia.com/datacenter/tesla/drivers/index.html) along with the libraries required by the [NVIDIA container runtime](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit) included in your ECS tasks.
 In hosts with multiple GPUs (ex. EC2 `g4dn` instances) you can assign one or multiple GPUs per container by specifying the resource requirements in your container definitions as described in the [official ECS documentation](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-gpu.html):
 
 ```json
diff --git a/QUICKSTART-EKS.md b/QUICKSTART-EKS.md
index cffad9b4a4f..3a16d6b1b75 100644
--- a/QUICKSTART-EKS.md
+++ b/QUICKSTART-EKS.md
@@ -350,11 +350,11 @@ For example, to run busybox:
 ### aws-k8s-*-nvidia variants
 
 The `aws-k8s-*-nvidia` variants include the required packages and configurations to leverage NVIDIA GPUs.
-They come with the [NVIDIA Tesla driver](https://docs.nvidia.com/datacenter/tesla/drivers/index.html) along with the libraries required by the [CUDA toolkit](https://developer.nvidia.com/cuda-toolkit) included in your orchestrated containers.
+They come with the [NVIDIA Tesla driver](https://docs.nvidia.com/datacenter/tesla/drivers/index.html) along with the libraries required by the [NVIDIA container runtime](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit) included in your orchestrated containers.
 They also include the [NVIDIA k8s device plugin](https://github.com/NVIDIA/k8s-device-plugin).
 If you already have a daemonset for the device plugin in your cluster, you may need to use taints and tolerations to keep it from running on Bottlerocket nodes.
 
-Additional NVIDIA tools such as [DCGM](https://github.com/NVIDIA/dcgm-exporter) and [GPU Feature Discovery](https://github.com/NVIDIA/gpu-feature-discovery) will work as expected.
+Additional NVIDIA tools such as [DCGM exporter](https://github.com/NVIDIA/dcgm-exporter) and [GPU Feature Discovery](https://github.com/NVIDIA/gpu-feature-discovery) will work as expected.
 You can install them in your cluster by following the `helm install` instructions provided for each project.
 The [GPU Operator](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/getting-started.html#install-nvidia-gpu-operator) can also be used to install these tools.
From 4fe01817d46c5492c546042f444f9dec102cc3fe Mon Sep 17 00:00:00 2001
From: Bryant Biggs
Date: Wed, 4 Sep 2024 10:43:57 -0500
Subject: [PATCH 2/3] Update QUICKSTART-ECS.md

Co-authored-by: Ben Cressey
---
 QUICKSTART-ECS.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/QUICKSTART-ECS.md b/QUICKSTART-ECS.md
index dcfa577fcf7..ddac7492a0b 100644
--- a/QUICKSTART-ECS.md
+++ b/QUICKSTART-ECS.md
@@ -201,7 +201,7 @@ Once it launches, you should be able to run tasks on your Bottlerocket instance
 ### aws-ecs-*-nvidia variants
 
 The `aws-ecs-*-nvidia` variants include the required packages and configurations to leverage NVIDIA GPUs.
-They come with the [NVIDIA Tesla driver](https://docs.nvidia.com/datacenter/tesla/drivers/index.html) along with the libraries required by the [NVIDIA container runtime](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit) included in your ECS tasks.
+They come with the [NVIDIA Tesla driver](https://docs.nvidia.com/datacenter/tesla/drivers/index.html) along with the libraries required by the [NVIDIA container runtime](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit).
 In hosts with multiple GPUs (ex. EC2 `g4dn` instances) you can assign one or multiple GPUs per container by specifying the resource requirements in your container definitions as described in the [official ECS documentation](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-gpu.html):
 
 ```json

From b1187cbfbf5bab0acdaaf834ba35ebb3beb03825 Mon Sep 17 00:00:00 2001
From: Bryant Biggs
Date: Wed, 4 Sep 2024 10:44:06 -0500
Subject: [PATCH 3/3] Update QUICKSTART-EKS.md

Co-authored-by: Ben Cressey
---
 QUICKSTART-EKS.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/QUICKSTART-EKS.md b/QUICKSTART-EKS.md
index 3a16d6b1b75..a7d177b41e4 100644
--- a/QUICKSTART-EKS.md
+++ b/QUICKSTART-EKS.md
@@ -350,7 +350,7 @@ For example, to run busybox:
 ### aws-k8s-*-nvidia variants
 
 The `aws-k8s-*-nvidia` variants include the required packages and configurations to leverage NVIDIA GPUs.
-They come with the [NVIDIA Tesla driver](https://docs.nvidia.com/datacenter/tesla/drivers/index.html) along with the libraries required by the [NVIDIA container runtime](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit) included in your orchestrated containers.
+They come with the [NVIDIA Tesla driver](https://docs.nvidia.com/datacenter/tesla/drivers/index.html) along with the libraries required by the [NVIDIA container runtime](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit).
 They also include the [NVIDIA k8s device plugin](https://github.com/NVIDIA/k8s-device-plugin).
 If you already have a daemonset for the device plugin in your cluster, you may need to use taints and tolerations to keep it from running on Bottlerocket nodes.