Merge pull request bottlerocket-os#4169 from bryantbiggs/chore/cuda-doc-clarification
bcressey authored Sep 25, 2024
2 parents c41aa9e + b1187cb commit 3ddbd44
Showing 2 changed files with 3 additions and 3 deletions.
2 changes: 1 addition & 1 deletion QUICKSTART-ECS.md
@@ -201,7 +201,7 @@ Once it launches, you should be able to run tasks on your Bottlerocket instance
### aws-ecs-*-nvidia variants

The `aws-ecs-*-nvidia` variants include the required packages and configurations to leverage NVIDIA GPUs.
-They come with the [NVIDIA Tesla driver](https://docs.nvidia.com/datacenter/tesla/drivers/index.html) along with the libraries required by the [CUDA toolkit](https://developer.nvidia.com/cuda-toolkit) included in your ECS tasks.
+They come with the [NVIDIA Tesla driver](https://docs.nvidia.com/datacenter/tesla/drivers/index.html) along with the libraries required by the [NVIDIA container runtime](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit).
In hosts with multiple GPUs (ex. EC2 `g4dn` instances) you can assign one or multiple GPUs per container by specifying the resource requirements in your container definitions as described in the [official ECS documentation](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-gpu.html):

```json
4 changes: 2 additions & 2 deletions QUICKSTART-EKS.md
@@ -350,11 +350,11 @@ For example, to run busybox:
### aws-k8s-*-nvidia variants

The `aws-k8s-*-nvidia` variants include the required packages and configurations to leverage NVIDIA GPUs.
-They come with the [NVIDIA Tesla driver](https://docs.nvidia.com/datacenter/tesla/drivers/index.html) along with the libraries required by the [CUDA toolkit](https://developer.nvidia.com/cuda-toolkit) included in your orchestrated containers.
+They come with the [NVIDIA Tesla driver](https://docs.nvidia.com/datacenter/tesla/drivers/index.html) along with the libraries required by the [NVIDIA container runtime](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit).
They also include the [NVIDIA k8s device plugin](https://github.com/NVIDIA/k8s-device-plugin).
If you already have a daemonset for the device plugin in your cluster, you may need to use taints and tolerations to keep it from running on Bottlerocket nodes.

-Additional NVIDIA tools such as [DCGM](https://github.com/NVIDIA/dcgm-exporter) and [GPU Feature Discovery](https://github.com/NVIDIA/gpu-feature-discovery) will work as expected.
+Additional NVIDIA tools such as [DCGM exporter](https://github.com/NVIDIA/dcgm-exporter) and [GPU Feature Discovery](https://github.com/NVIDIA/gpu-feature-discovery) will work as expected.
You can install them in your cluster by following the `helm install` instructions provided for each project.

The [GPU Operator](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/getting-started.html#install-nvidia-gpu-operator) can also be used to install these tools.
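
For illustration only (this is not part of the commit): a minimal sketch of a pod that requests one GPU through the bundled device plugin on an `aws-k8s-*-nvidia` node. It assumes the reworded sentence means the CUDA user-space libraries are expected to ship in the container image, while the host supplies the driver and the container runtime pieces. The pod name `gpu-smoke-test` and the image tag are hypothetical choices; substitute an image that matches your driver version.

```yaml
# Hypothetical smoke-test pod; the name and image tag are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      # Assumption: CUDA libraries come from this image, not from the host.
      image: nvidia/cuda:12.4.1-base-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          # Extended resource advertised by the NVIDIA k8s device plugin.
          nvidia.com/gpu: 1
```

If the manifest is saved as `gpu-smoke-test.yaml`, `kubectl apply -f gpu-smoke-test.yaml` followed by `kubectl logs gpu-smoke-test` should show the GPU table printed by `nvidia-smi`, provided the node has a free GPU.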