Releases: NVIDIA/ais-k8s
v2.1.2
Additional changes
- Fixed a bug where PVCs were not properly cleaned up if a new operator was deployed with an old AIS cluster and never created a new statefulset
- Updated AIS test versions
v2.1.1
NOTE: If your deployment includes a GCP secret name but did NOT define the environment variable GOOGLE_APPLICATION_CREDENTIALS, this update will cause a target rollout as the default will be added.
Added/Changed
- Usage
  - Enable automatic restarts when specific config changes (via restart-hash annotation)
  - Allow updating pod security context
  - Update config options to latest from AIS, including rate limit options
  - Add default resource request for ephemeral storage volumes for AIS
- Optimizations
  - Remove kube-rbac-proxy container and expose metrics directly
  - Disable client pod cache to reduce memory usage
  - Use patch when updating pod specs in statefulsets
  - Update Go version to 1.24 and update all dependencies
  - Optimize rollout settings for proxy statefulset
  - Check proxy service endpoints instead of operator host DNS resolution
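The restart-hash item above can be illustrated with a minimal sketch. It assumes the general pattern of hashing only the restart-relevant config entries and storing the digest in a pod template annotation, so that a changed digest alters the template and triggers a rollout. The annotation key, the config keys, and the function name are hypothetical and are not taken from the operator's source.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// restartHashKey is a hypothetical annotation key; the operator's real key may differ.
const restartHashKey = "example.com/restart-hash"

// restartHash hashes only the config entries listed in restartKeys.
// Keys are sorted so the digest does not depend on input order.
func restartHash(cfg map[string]string, restartKeys []string) string {
	keys := append([]string(nil), restartKeys...)
	sort.Strings(keys)
	var b strings.Builder
	for _, k := range keys {
		fmt.Fprintf(&b, "%s=%s\n", k, cfg[k])
	}
	sum := sha256.Sum256([]byte(b.String()))
	return hex.EncodeToString(sum[:])
}

func main() {
	// Hypothetical config keys, used for illustration only.
	restartKeys := []string{"net.http.use_https"}
	current := map[string]string{"net.http.use_https": "false", "log.level": "3"}
	logOnly := map[string]string{"net.http.use_https": "false", "log.level": "4"}
	withTLS := map[string]string{"net.http.use_https": "true", "log.level": "3"}

	annotations := map[string]string{restartHashKey: restartHash(current, restartKeys)}

	// A change outside restartKeys leaves the digest, and so the pod template, untouched.
	fmt.Println(annotations[restartHashKey] == restartHash(logOnly, restartKeys)) // true
	// A change to a restart-relevant key produces a new digest, which would trigger a rollout.
	fmt.Println(annotations[restartHashKey] == restartHash(withTLS, restartKeys)) // false
}
```

Hashing a selected subset rather than the whole config keeps unrelated changes from restarting pods.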
Full Changelog: v2.0.1...v2.1.1
v2.0.1
- Fixed a bug where certain annotations could cause an infinite loop in reconciliation
v2.0.0
AIS Operator v2.0
THIS UPDATE WILL CAUSE A RESTART OF AIS CLUSTERS DEPLOYED WITH OPERATOR <v2.0
Added/Changed
- Environment variables and annotations provided in spec will now sync to AIS pods
- Fixed a bug where AIS global rebalance would not properly disable before target upgrades
- Changed the default discovery URL in config to use the proxy headless service instead of always using proxy-0
- Added logSidecarImage spec option to provide control over included sidecar.
  - By default we suggest aistorage/ais-logs:v1.0. This reads INFO logs from the AIS daemon's file output and redirects to stdout for k8s to read. If left empty in spec, no sidecar will be included.
Removed
- Removed default config generation based on image tag, now managed by init container only
- Removed deprecated spec options
  - EnablePromExporter
  - DisablePodAntiAffinity
  - TargetSpec.AllowSharedOrNoDisks
Full Changelog: v1.7.0...v2.0.0
v1.7.0
AIS Operator v1.7.0
- Fixed bug with shutdown that could cause a cluster to be stuck in "Shutting Down" state. Operator no longer makes a separate API call to specifically shut down AIS cluster before scaling down.
- Optimize rebalance condition to patch only when changed
- Removed several unused environment variables from the statefulset spec. Refactored construction of the set of ENV vars to use.
- Minor updates to tests, linting, proxy statefulset update status
- Updated all minor dependencies including AIS
- Deprecated EnablePromExporter option. On all recent AIS releases this is always enabled and the associated environment variable has been removed.
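The "patch only when changed" optimization for the rebalance condition can be sketched as follows. This is an illustration of the general idea, not the operator's code: the Condition type, setCondition, and fakePatcher are hypothetical stand-ins, and the fake patcher only counts calls that a real client would send to the API server.

```go
package main

import "fmt"

// Condition is a simplified, hypothetical stand-in for a Kubernetes status condition.
type Condition struct {
	Type   string
	Status string
	Reason string
}

// setCondition updates or appends cond and reports whether anything changed.
func setCondition(conds []Condition, cond Condition) ([]Condition, bool) {
	for i, c := range conds {
		if c.Type != cond.Type {
			continue
		}
		if c == cond {
			return conds, false // identical condition: nothing to write
		}
		conds[i] = cond
		return conds, true
	}
	return append(conds, cond), true
}

// fakePatcher counts status patches in place of a real API client.
type fakePatcher struct{ patches int }

func (p *fakePatcher) PatchStatus(conds []Condition) { p.patches++ }

func main() {
	patcher := &fakePatcher{}
	var conds []Condition
	updates := []Condition{
		{Type: "Rebalance", Status: "False", Reason: "Upgrading"},
		{Type: "Rebalance", Status: "False", Reason: "Upgrading"}, // repeat, skipped
		{Type: "Rebalance", Status: "True", Reason: "Ready"},
	}
	for _, u := range updates {
		var changed bool
		conds, changed = setCondition(conds, u)
		if changed {
			patcher.PatchStatus(conds)
		}
	}
	fmt.Println(patcher.patches) // 2
}
```

Skipping the write when the condition is unchanged avoids redundant API calls on every reconcile pass.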
Helm
- Updated CA duration and renewal option in TLS charts
- Added cloud cert secrets generation chart
- Added config for internal test cluster and internal deployment
- Added pod resource values option
Full Changelog: v1.6.1...v1.7.0
v1.6.1
See https://github.com/NVIDIA/ais-k8s/releases/tag/v1.6.0
AIS Operator v1.6.1
- Added reconciliation of target and proxy container resources spec
Full Changelog: v1.6.0...v1.6.1
v1.6.0
IMPORTANT: Please see the compatibility docs for information on deploying clusters with this new version. It requires a new aisinit container >= v3.25 to generate configs for AIS pods.
AIS Operator v1.6.0
- Added support for init container managed configs. See compatibility docs. This will improve compatibility between versions and help with upgrade paths.
- Operator will now reconcile the entire pod spec for aisnode when image changes
- Operator will now reconcile the entire init pod spec when init image changes
- Added resource management options to AIS spec
- Added MY_NODE env var to aisnode container
- Added support for deployments with distributed tracing
Full Changelog: v1.5.0...v1.6.0
v1.5.0
AIS Operator v1.5.0
- Updated to go 1.23 and latest dependencies
- Added support for custom annotations passed from spec to aisnode containers via Annotations spec option
- Added support for custom environment variables passed from spec to aisnode containers via Env spec option
- Fixed a bug where rebalance would not properly disable and re-enable for upgrades if it had been modified manually
- Removed the option for the operator manager to run external to the k8s cluster
- Internal logic refactoring of AIS API and AuthN clients
- Added Sync option to version config
- Changed net.http.UseHttps option to solely control whether aisnode expects to use HTTPS rather than relying on presence of TLS secrets or cert manager issuer
- Improved logging and requeue logic to make it easier to follow deployment progress and debug issues
Helm
- Moved operator repository to github pages. The operator will now use a constant repo and update chart versions along with each new version. See https://github.com/NVIDIA/ais-k8s/tree/main/helm#install-charts for instructions.
Full Changelog: v1.4.1...v1.5.0
v1.4.1
AIS Operator v1.4.1
- Fixed an issue where the operator would modify the rebalance config in the provided spec and not restore previous config after upgrades
- Cleaned up logging and handling of DNS resolution on proxy startup
Major release v1.4.0: https://github.com/NVIDIA/ais-k8s/releases/tag/v1.4.0
Full Changelog: v1.4.0...v1.4.1
v1.4.0
AIS Operator v1.4.0
- Improved state management to reconcile based on state rather than using blocking waits
- Disabled rebalance at the AIS level before cluster modifications -- scaling, rolling upgrades, cluster re-creation
- Added a watch on AIS spec configToUpdate for changes and keep those in sync with the cluster
- Added ability to reconcile statefulset status
- Updated default AIS config generation and improved compatibility through version changes
- Added new AIS states for the following:
  - Scaling
  - HostCleanup
  - Finalized
- Bug fixes
  - Fixed deep equal comparison with spec
  - Fixed cleanup jobs with proper status and termination
  - Improved wait behavior when waiting for AIS cluster readiness or decommissioning
- QOL improvements -- Cleaned up logging, Added unit testing
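To illustrate reconciling based on state rather than blocking waits, here is a minimal sketch. It assumes a reconcile function that inspects the current state, takes at most one step, and asks to be requeued instead of sleeping until pods are ready. Only the Scaling state name comes from the list above. The other states, the Cluster and Result types, and the requeue intervals are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// State is a simplified cluster lifecycle state.
type State string

const (
	StateCreated State = "Created" // hypothetical
	StateScaling State = "Scaling" // named in the release notes
	StateReady   State = "Ready"   // hypothetical
)

// Cluster is a hypothetical, trimmed-down view of the cluster status.
type Cluster struct {
	State     State
	Desired   int
	ReadyPods int
}

// Result tells the caller when to run reconcile again; zero means no requeue.
type Result struct {
	RequeueAfter time.Duration
}

// reconcile performs one non-blocking step based on the current state.
func reconcile(c *Cluster) Result {
	switch c.State {
	case StateCreated:
		c.State = StateScaling
		return Result{RequeueAfter: time.Second}
	case StateScaling:
		if c.ReadyPods < c.Desired {
			// Not ready yet: return immediately and check again later.
			return Result{RequeueAfter: 5 * time.Second}
		}
		c.State = StateReady
		return Result{}
	default:
		return Result{}
	}
}

func main() {
	c := &Cluster{State: StateCreated, Desired: 3}
	for c.State != StateReady {
		res := reconcile(c)
		fmt.Println(c.State, res.RequeueAfter)
		c.ReadyPods++ // simulate one more pod becoming ready between passes
	}
}
```

Because each pass returns quickly, the controller stays free to handle other events, and progress can be read from the recorded state.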
API Changes
- New options
  - cleanupMetadata -- Allows for cluster decommission while preserving cluster metadata for future deployments
  - tlsCertManagerIssuerName -- Specifies a cert-manager CSI issuer
Full Changelog: v1.3.0...v1.4.0