
Releases: NVIDIA/k8s-device-plugin

v0.15.0-rc.1

26 Feb 13:59
Pre-release

What's Changed

  • Import GPU Feature Discovery into the GPU Device Plugin repo, so both components now share the same version and container image.
  • Add tooling to create a kind cluster for local development and testing.
  • Update go-gpuallocator dependency to migrate away from the deprecated gpu-monitoring-tools NVML bindings.
  • Remove legacyDaemonsetAPI config option. This was only required for k8s versions < 1.16.
  • Add support for MPS sharing.
  • Bump CUDA base image version to 12.3.1

Full Changelog: v0.14.0...v0.15.0-rc.1

v0.14.4

29 Jan 14:42
cde1a66

What's Changed

  • Update to the refactored go-gpuallocator code. This permanently fixes the NVML_NVLINK_MAX_LINKS value that was patched as a hotfix in v0.14.3, and also fixes a bug caused by NVML not being initialized when calling go-gpuallocator.

Full Changelog: v0.14.3...v0.14.4

v0.14.3

15 Nov 13:09

Bug fixes

  • Patched vendored NVML_NVLINK_MAX_LINKS to 18 to support devices with 18 NVLinks

Dependency updates

  • Bumped CUDA base image version to 12.3.0

Full Changelog: v0.14.2...v0.14.3

v0.14.2

20 Oct 09:59

This release bumps dependencies.

Dependency Updates

  • Updated CUDA Base Image to 12.2.2
  • Updated GPU Feature Discovery version to v0.8.2

Full Changelog: v0.14.1...v0.14.2

v0.14.1

13 Jul 09:35

This release fixes bugs and bumps dependencies.

Bug fixes

  • Fixed parsing of deviceListStrategy in device plugin config (#410)
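The notes do not describe the parsing bug in #410 itself. As a hedged illustration only (not the plugin's code), the sketch below shows one common Go pattern for a config field such as deviceListStrategy that may be written either as a single string or as a list of strings. The type `deviceListStrategies`, the `flags` struct, and the assumption that both forms are accepted are hypothetical for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// deviceListStrategies is a hypothetical type for a config field that may be
// written as a single string ("envvar") or as a list (["envvar", "cdi-annotations"]).
type deviceListStrategies []string

// UnmarshalJSON accepts both the scalar and the list form.
func (d *deviceListStrategies) UnmarshalJSON(data []byte) error {
	// Treat an explicit JSON null as "not set".
	if string(data) == "null" {
		return nil
	}
	// Try the single-string form first.
	var single string
	if err := json.Unmarshal(data, &single); err == nil {
		*d = deviceListStrategies{single}
		return nil
	}
	// Fall back to the list form.
	var list []string
	if err := json.Unmarshal(data, &list); err != nil {
		return fmt.Errorf("deviceListStrategy must be a string or a list of strings: %w", err)
	}
	*d = list
	return nil
}

// flags is a hypothetical, trimmed-down config struct used only in this sketch.
type flags struct {
	DeviceListStrategy deviceListStrategies `json:"deviceListStrategy"`
}

// parseFlags decodes a JSON config fragment into flags.
func parseFlags(raw string) (flags, error) {
	var f flags
	err := json.Unmarshal([]byte(raw), &f)
	return f, err
}

func main() {
	for _, raw := range []string{
		`{"deviceListStrategy": "envvar"}`,
		`{"deviceListStrategy": ["envvar", "cdi-annotations"]}`,
	} {
		f, err := parseFlags(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(f.DeviceListStrategy)
	}
}
```

Trying the scalar form first keeps the common single-value case cheap and still rejects values that are neither a string nor a list.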

Dependency Updates

  • Updated CUDA Base Image to 12.2.0
  • Updated GPU Feature Discovery version to v0.8.1
  • Updated Node Feature Discovery to v0.13.2
  • Updated Go dependencies.

Full Changelog: v0.14.0...v0.14.1

v0.14.0

03 Apr 21:09

Full Changelog: v0.13.0...v0.14.0

Changes

  • Promote v0.14.0-rc.3 to v0.14.0
  • Bumped the nvidia-container-toolkit dependency to the latest version to pick up newer CDI spec generation code
  • Updated GFD subchart to version v0.8.0

Changes from v0.14.0-rc.3

  • Removed the --cdi-enabled config option; CDI injection is now triggered by the cdi-annotation device list strategy.
  • Bumped the go-nvlib dependency to the latest version to support new MIG profiles.
  • Added cdi-annotation-prefix config option to control how CDI annotations are generated.
  • Renamed driver-root-ctr-path config option added in v0.14.0-rc.1 to container-driver-root.
  • Updated GFD subchart to version v0.8.0-rc.2
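For readers unfamiliar with MIG profile names, the following hypothetical parser (not go-nvlib's API) sketches how a GPU instance profile name such as `1g.10gb` or the `1g.10gb+me` profile listed in the v0.13.0 notes breaks down into a slice count, a memory size, and optional `+` attributes. It assumes the form `<N>g.<M>gb[+attr...]` and does not handle compute-instance profile names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// migProfile is a hypothetical representation of a GPU instance profile name.
type migProfile struct {
	Slices     int      // number of GPU slices ("1g" -> 1)
	MemoryGB   int      // memory size ("10gb" -> 10)
	Attributes []string // optional suffixes after '+', e.g. "me"
}

// parseMigProfile parses names of the form "<N>g.<M>gb[+attr...]".
func parseMigProfile(name string) (migProfile, error) {
	var p migProfile
	parts := strings.Split(name, "+")
	base, attrs := parts[0], parts[1:]

	fields := strings.Split(base, ".")
	if len(fields) != 2 || !strings.HasSuffix(fields[0], "g") || !strings.HasSuffix(fields[1], "gb") {
		return p, fmt.Errorf("unsupported MIG profile %q", name)
	}
	slices, err := strconv.Atoi(strings.TrimSuffix(fields[0], "g"))
	if err != nil {
		return p, fmt.Errorf("invalid slice count in %q: %w", name, err)
	}
	mem, err := strconv.Atoi(strings.TrimSuffix(fields[1], "gb"))
	if err != nil {
		return p, fmt.Errorf("invalid memory size in %q: %w", name, err)
	}
	if slices <= 0 || mem <= 0 {
		return p, fmt.Errorf("non-positive values in %q", name)
	}
	for _, a := range attrs {
		if a == "" {
			return p, fmt.Errorf("empty attribute in %q", name)
		}
	}
	p.Slices, p.MemoryGB, p.Attributes = slices, mem, attrs
	return p, nil
}

func main() {
	for _, name := range []string{"1g.10gb", "1g.10gb+me", "7g.80gb"} {
		p, err := parseMigProfile(name)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s -> %+v\n", name, p)
	}
}
```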

Changes from v0.14.0-rc.2

  • Fixed a bug introduced in v0.14.0-rc.1 when using cdi-enabled=false

Changes from v0.14.0-rc.1

  • Added --cdi-enabled flag to GPU Device Plugin. With this enabled, the device plugin will generate CDI specifications for available NVIDIA devices. Allocation will add CDI annotations (cdi.k8s.io/*) to the response. These are read by a CDI-enabled runtime to make the required modifications to a container being created.
  • Updated GFD subchart to version 0.8.0-rc.1
  • Bumped Golang version to 1.20.1
  • Bumped CUDA base image version to 12.1.0
  • Switched to klog for logging
  • Added a static deployment file for Microshift
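The CDI flow above can be sketched as follows: on allocation, the plugin returns annotations keyed under a prefix (cdi.k8s.io/ by default, configurable via cdi-annotation-prefix), and a CDI-enabled runtime resolves their values to device edits. In this sketch the helper `cdiAnnotations`, the key layout `<prefix><plugin>_<requestID>`, and the device kind `nvidia.com/gpu` are assumptions; the value format, a comma-separated list of fully qualified CDI device names (`vendor/class=name`), follows the CDI specification.

```go
package main

import (
	"fmt"
	"strings"
)

// cdiAnnotations is a hypothetical helper that builds the annotation map a
// device plugin could attach to its allocation response. The key layout
// "<prefix><plugin>_<requestID>" is assumed for this sketch; the value is a
// comma-separated list of fully qualified CDI device names.
func cdiAnnotations(prefix, plugin, requestID, kind string, deviceIDs []string) (map[string]string, error) {
	if !strings.HasSuffix(prefix, "/") {
		return nil, fmt.Errorf("annotation prefix %q must end with '/'", prefix)
	}
	if len(deviceIDs) == 0 {
		return nil, fmt.Errorf("no devices requested")
	}
	names := make([]string, 0, len(deviceIDs))
	for _, id := range deviceIDs {
		// kind "nvidia.com/gpu" + id "0" -> "nvidia.com/gpu=0"
		names = append(names, kind+"="+id)
	}
	key := prefix + plugin + "_" + requestID
	return map[string]string{key: strings.Join(names, ",")}, nil
}

func main() {
	ann, err := cdiAnnotations("cdi.k8s.io/", "nvidia-device-plugin", "req-1", "nvidia.com/gpu", []string{"0", "1"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for k, v := range ann {
		fmt.Printf("%s: %s\n", k, v)
	}
}
```

Passing the prefix as a parameter mirrors why a cdi-annotation-prefix option is useful: the runtime and the plugin must agree on which annotation keys to treat as CDI requests.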

Note:

The container image nvcr.io/nvidia/k8s-device-plugin:v0.14.0-ubi8 contains the following high-severity CVEs:

  • CVE-2023-0286 - Vulnerability found in os package type (rpm) - openssl-libs
  • CVE-2023-24329 - Vulnerability found in os package type (rpm) - platform-python and python3-libs

v0.14.0-rc.3

29 Mar 12:56
Pre-release

Full Changelog: v0.14.0-rc.2...v0.14.0-rc.3

Changes

  • Removed the --cdi-enabled config option; CDI injection is now triggered by the cdi-annotation device list strategy.
  • Bumped the go-nvlib dependency to the latest version to support new MIG profiles.
  • Added cdi-annotation-prefix config option to control how CDI annotations are generated.
  • Renamed driver-root-ctr-path config option added in v0.14.0-rc.1 to container-driver-root.
  • Updated GFD subchart to version v0.8.0-rc.2

v0.14.0-rc.2

20 Mar 23:02
Pre-release

Full Changelog: v0.14.0-rc.1...v0.14.0-rc.2

Changes

  • Fixed a bug introduced in v0.14.0-rc.1 when using cdi-enabled=false

v0.14.0-rc.1

17 Mar 07:22
Pre-release

Full Changelog: v0.13.0...v0.14.0-rc.1

Changes

  • Added --cdi-enabled flag to GPU Device Plugin. With this enabled, the device plugin will generate CDI specifications for available NVIDIA devices. Allocation will add CDI annotations (cdi.k8s.io/*) to the response. These are read by a CDI-enabled runtime to make the required modifications to a container being created.
  • Updated GFD subchart to version 0.8.0-rc.1
  • Bumped Golang version to 1.20.1
  • Bumped CUDA base image version to 12.1.0
  • Switched to klog for logging
  • Added a static deployment file for Microshift

v0.13.0

30 Nov 14:21

Full Changelog: v0.12.2...v0.13.0

Changes

  • Skip NVIDIA DGX Display devices when generating labels.
  • Fail on startup if no valid resources are detected
  • Bump GFD subchart to version 0.7.0

Changes from v0.13.0-rc.3

  • Use nodeAffinity instead of nodeSelector by default in daemonsets
  • Add machine-file-path option to GFD config flags
  • Mount /sys instead of /sys/class/dmi/id/product_name in GPU Feature Discovery daemonset
  • Bump GFD subchart to version 0.7.0-rc.3

Changes from v0.13.0-rc.2

  • Bump CUDA base image to 11.8.0
  • Use consistent indentation in YAML manifests
  • Fix a bug introduced in v0.13.0-rc.1 when using mig-strategy="mixed"
  • Add logged error message if setting up health checks fails
  • Support MIG devices with the 1g.10gb+me profile
  • Distribute replicas evenly across GPUs during allocation
  • Bump GFD subchart to version 0.7.0-rc.2
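Replicas here are the multiple advertised devices backed by one physical GPU when GPU sharing is configured. A hedged sketch of "distribute evenly" follows: each pick takes a free replica from the GPU that currently has the most free replicas, so load spreads across GPUs instead of filling one GPU first. The `replica` type, the `GPU-A::0` ID format, and the most-free-first rule (which assumes every GPU exposes the same number of replicas) are assumptions, not the plugin's actual policy.

```go
package main

import (
	"fmt"
	"sort"
)

// replica is a hypothetical shared device backed by a physical GPU.
type replica struct {
	ID  string // e.g. "GPU-A::0" (format assumed for this sketch)
	GPU string // physical GPU the replica belongs to
}

// allocateEvenly picks n replicas, each time from the GPU with the most free
// replicas remaining (ties broken by GPU name for determinism).
func allocateEvenly(available []replica, n int) ([]string, error) {
	if n > len(available) {
		return nil, fmt.Errorf("requested %d replicas, only %d available", n, len(available))
	}
	// Group free replica IDs by their physical GPU.
	free := map[string][]string{}
	for _, r := range available {
		free[r.GPU] = append(free[r.GPU], r.ID)
	}
	gpus := make([]string, 0, len(free))
	for g := range free {
		gpus = append(gpus, g)
	}
	sort.Strings(gpus)

	var out []string
	for len(out) < n {
		best := ""
		for _, g := range gpus {
			if len(free[g]) == 0 {
				continue
			}
			if best == "" || len(free[g]) > len(free[best]) {
				best = g
			}
		}
		// best is never empty here because n <= len(available).
		out = append(out, free[best][0])
		free[best] = free[best][1:]
	}
	return out, nil
}

func main() {
	// GPU-B already has two replicas in use elsewhere, so only two remain free.
	available := []replica{
		{ID: "GPU-A::0", GPU: "GPU-A"}, {ID: "GPU-A::1", GPU: "GPU-A"}, {ID: "GPU-A::2", GPU: "GPU-A"},
		{ID: "GPU-B::2", GPU: "GPU-B"}, {ID: "GPU-B::3", GPU: "GPU-B"},
	}
	ids, err := allocateEvenly(available, 3)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ids)
}
```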

Changes from v0.13.0-rc.1

  • Improve health checks to detect errors when waiting on device events
  • Log ECC error events detected during health check
  • Add the Git SHA to version information for the CLI and container images
  • Use NVML interfaces from go-nvlib to query devices
  • Refactor plugin creation from resources
  • Add a CUDA-based resource manager that can be used to expose integrated devices on Tegra-based systems
  • Bump GFD subchart to version 0.7.0-rc.1
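Embedding a Git SHA in version output is commonly done in Go by setting a package variable at link time with `-ldflags "-X main.gitCommit=<sha>"`, optionally falling back to the `vcs.revision` setting that Go 1.18+ records in the build info. The sketch below shows that general technique; the variable `gitCommit` and the placeholder `version` constant are hypothetical and not taken from the plugin's build.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// gitCommit is meant to be set at link time, e.g.
//
//	go build -ldflags "-X main.gitCommit=$(git rev-parse --short HEAD)"
//
// The variable name is hypothetical; it is empty when not set.
var gitCommit string

// version is a placeholder release string for this sketch.
const version = "v0.0.0-example"

// commit returns the injected SHA, falling back to the VCS revision that
// Go 1.18+ records in the build info when building inside a Git checkout.
func commit() string {
	if gitCommit != "" {
		return gitCommit
	}
	if info, ok := debug.ReadBuildInfo(); ok {
		for _, s := range info.Settings {
			if s.Key == "vcs.revision" {
				return s.Value
			}
		}
	}
	return "unknown"
}

// versionString combines the release version and the commit.
func versionString() string {
	return fmt.Sprintf("%s (commit: %s)", version, commit())
}

func main() {
	fmt.Println(versionString())
}
```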

Note:

The container image nvcr.io/nvidia/k8s-device-plugin:v0.13.0-ubi8 contains the following high-severity CVEs:

  • CVE-2022-42898 - Vulnerability found in os package type (rpm) - krb5-libs