@@ -208,8 +208,8 @@ Beta (v1.22):
208208- Enable LogarithmicScaleDown feature gate by default
209209- Enable `sorting_deletion_age_ratio` metric
210210
211- Stable (v1.23):
212- - Remove LogarithmicScaleDown feature gate
211+ Stable (v1.31):
212+ - Lock LogarithmicScaleDown feature gate to true
213213- Make this behavior standard
214214
215215### Upgrade / Downgrade Strategy
@@ -230,9 +230,7 @@ behavior reduces the risk that it is an expectation from other components.
230230
231231### Feature Enablement and Rollback
232232
233- _ This section must be completed when targeting alpha to a release._
234-
235- * ** How can this feature be enabled / disabled in a live cluster?**
233+ ###### How can this feature be enabled / disabled in a live cluster?
236234 - [x] Feature gate (also fill in values in `kep.yaml`)
237235 - Feature gate name: LogarithmicScaleDown
238236 - Components depending on the feature gate: kube-controller-manager
@@ -243,53 +241,58 @@ _This section must be completed when targeting alpha to a release._
243241 - Will enabling / disabling the feature require downtime or reprovisioning
244242 of a node?
245243
246- * ** Does enabling the feature change any default behavior?**
244+ ###### Does enabling the feature change any default behavior?
247245 Yes, this changes the default assumption that the youngest pod in a replica set
248246 will always be the one evicted. However, it still groups pods by their age and picks
249247 from the youngest group.
250248
251- * ** Can the feature be disabled once it has been enabled (i.e. can we roll back
252- the enablement)?**
249+ ###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
253250 Yes. Existing workloads should see no change when disabling this feature.
254251
255- * ** What happens if we reenable the feature if it was previously rolled back?**
252+ ###### What happens if we reenable the feature if it was previously rolled back?
256253 Assumptions that the newest pod will be deleted first may break.
257254
258- * ** Are there any tests for feature enablement/disablement?**
255+ ###### Are there any tests for feature enablement/disablement?
259256 Tests for feature disablement shouldn't be necessary, as this is already an assumed
260257 (but not documented) controller behavior.
261258
262259### Rollout, Upgrade and Rollback Planning
263260
264- _ This section must be completed when targeting beta graduation to a release._
265-
266- * ** How can a rollout fail? Can it impact already running workloads?**
261+ ###### How can a rollout or rollback fail? Can it impact already running workloads?
267262 This should not affect running workloads, though there is the possibility that the logic
268263 panics, which would cause kube-controller-manager to crash.
269264
270- * ** What specific metrics should inform a rollback?**
265+ ###### What specific metrics should inform a rollback?
271266 Increased pod deletions could indicate runaway/hot-loop failures in the scaledown logic.
272267 Availability of applications may also be affected. Though the intent of this is to provide
273268 better availability through more distributed victim selection, in cases of desired binpacking
274269 pods may remain running on undesired nodes.
275270
276- * ** Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?**
277- This will be manually tested before the graduation to beta
271+ ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
272+ This is a purely in-memory change for the controller, so upgrade/downgrade doesn't really change anything.
278273
279- * ** Is the rollout accompanied by any deprecations and/or removals of features, APIs,
280- fields of API types, flags, etc.?**
274+ ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
281275 No
282276
283277### Monitoring Requirements
284278
285- _ This section must be completed when targeting beta graduation to a release._
286-
287- * ** How can an operator determine if the feature is in use by workloads?**
288- The scaledown behavior of all replicasets will be affected by this featuregate being
289- enabled, so somehow monitoring them will be necessary to determine it
290-
291- * ** What are the SLIs (Service Level Indicators) an operator can use to determine
292- the health of the service?**
279+ ###### How can an operator determine if the feature is in use by workloads?
280+ The feature is global, so it's always going to be used on any downscale.
281+
282+ ###### How can someone using this feature know that it is working for their instance?
283+ - [ ] Events
284+ - Event Reason:
285+ - [ ] API .status
286+ - Condition name:
287+ - Other field:
288+ - [x] Other (treat as last resort)
289+ - Details:
290+ For a ReplicaSet with two ready pods whose Pod Cost annotation is not set,
291+ if the logarithmic values of the pods' ready times are identical,
292+ the pod with the smaller UID is scaled down first, rather than
293+ the most recently ready one.
294+
295+ ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
293296 - [x] Metrics
294297 - Metric name: sorting_deletion_age_ratio
295298 - [Optional] Aggregation method:
@@ -302,71 +305,52 @@ algorithm falls back to age. (Pod age is the final criteria in the sorting algor
302305want to measure this ratio for deletions which don't use this feature, as those may validly fall
303306outside the desired range).
304307
305- * ** What are the reasonable SLOs (Service Level Objectives) for the above SLIs? **
308+ ###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
306309 There should be no values `>2` in the above metric when the Pod Cost annotation is unset
307310 (see https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/2255-pod-cost) and
308311 the pod's deletion was based on a timestamp comparison (rather than, for example, pod state).
309312
310- * ** Are there any missing metrics that would be useful to have to improve observability
311- of this feature?**
312- Describe the metrics themselves and the reasons why they weren't added (e.g., cost,
313- implementation difficulties, etc.).
313+ ###### Are there any missing metrics that would be useful to have to improve observability of this feature?
314+ No, we didn't find any other gaps that could be covered by metrics.
314315
315316### Dependencies
316317
317- _ This section must be completed when targeting beta graduation to a release._
318-
319- * ** Does this feature depend on any specific services running in the cluster?**
318+ ###### Does this feature depend on any specific services running in the cluster?
320319 No, it is part of the controller-manager
321320
322321### Scalability
323322
324- _ For alpha, this section is encouraged: reviewers should consider these questions
325- and attempt to answer them._
326-
327- _ For beta, this section is required: reviewers must answer these questions._
328-
329- _ For GA, this section is required: approvers should be able to confirm the
330- previous answers based on experience in the field._
331-
332- * ** Will enabling / using this feature result in any new API calls?**
323+ ###### Will enabling / using this feature result in any new API calls?
333324 No
334325
335- * ** Will enabling / using this feature result in introducing new API types?**
326+ ###### Will enabling / using this feature result in introducing new API types?
336327 No
337328
338- * ** Will enabling / using this feature result in any new calls to the cloud
339- provider?**
329+ ###### Will enabling / using this feature result in any new calls to the cloud provider?
340330 No
341331
342- * ** Will enabling / using this feature result in increasing size or count of
343- the existing API objects?**
332+ ###### Will enabling / using this feature result in increasing size or count of the existing API objects?
344333 No
345334
346- * ** Will enabling / using this feature result in increasing time taken by any
347- operations covered by [ existing SLIs/SLOs] ?**
335+ ###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
348336 No
349337
350- * ** Will enabling / using this feature result in non-negligible increase of
351- resource usage (CPU, RAM, disk, IO, ...) in any components?**
338+ ###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
352339 No, perhaps minimal increase in calculating the buckets for pod age
353340
354- ### Troubleshooting
355-
356- The Troubleshooting section currently serves the ` Playbook ` role. We may consider
357- splitting it into a dedicated ` Playbook ` document (potentially with some monitoring
358- details). For now, we leave it here.
341+ ###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
342+ No
359343
360- _ This section must be completed when targeting beta graduation to a release. _
344+ ### Troubleshooting
361345
362- * ** How does this feature react if the API server and/or etcd is unavailable?**
346+ ###### How does this feature react if the API server and/or etcd is unavailable?
363347 N/A - this is not a feature of running workloads. The main controller will not work and
364348 will be unable to scale up or down if the API server or etcd is unavailable.
365349
366- * ** What are other known failure modes?**
350+ ###### What are other known failure modes?
367351n/a
368352
369- * ** What steps should be taken if SLOs are not being met to determine the problem?**
353+ ###### What steps should be taken if SLOs are not being met to determine the problem?
370354n/a
371355
372356[supported limits]: https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md
376360
377361- 2021-01-06: Initial KEP submitted
378362- 2021-05-07: Updated KEP for graduation to beta
363+ - 2024-05-21: Updated KEP for graduation to GA
379364
380365## Drawbacks
381366