From ed48970de70d2667661ed8f4f03b2a1419c752fb Mon Sep 17 00:00:00 2001 From: thiyyakat Date: Mon, 22 Sep 2025 15:15:10 +0530 Subject: [PATCH 01/11] Add proposal for preservation of failed machines --- docs/proposals/failed-machine-preservation.md | 103 ++++++++++++++++++ 1 file changed, 103 insertions(+) create mode 100644 docs/proposals/failed-machine-preservation.md diff --git a/docs/proposals/failed-machine-preservation.md b/docs/proposals/failed-machine-preservation.md new file mode 100644 index 000000000..684e59826 --- /dev/null +++ b/docs/proposals/failed-machine-preservation.md @@ -0,0 +1,103 @@ +# Preservation of Failed Machines + + + +- [Preservation of Failed Machines](#preservation-of-failed-machines) + - [Objective](#objective) + - [Solution Design](#solution-design) + - [State Machine](#state-machine) + - [Use Cases](#use-cases) + + + + +## Objective + +Currently, the Machine Controller Manager(MCM) moves Machines with errors to the `Unknown` phase, and after the configured `machineHealthTimeout` seconds, to the `Failed` phase. +`Failed` machines are swiftly moved to the `Terminating` phase during which the node is drained and the `Machine` object is deleted. This rapid cleanup prevents SRE/operators/support from conducting an analysis on the VM and makes finding root cause of failure more difficult. + +This document proposes enhancing MCM, such that: +* VMs of `Failed` machines are retained temporarily for analysis +* There is a configurable limit to the number of `Failed` machines that can be preserved +* There is a configurable limit to the duration for which such machines are preserved +* Users can specify which healthy machines they would like to preserve in case of failure +* Users can request MCM to delete a preserved `Failed` machine, even before the timeout expires + +## Solution Design + +In order to achieve the objectives mentioned, the following are proposed: +1. Enhance `machineControllerManager` configuration in the `ShootSpec`, to specify the max number of failed machines to be preserved, +and the time duration for which these machines will be preserved. + ``` + machineControllerManager: + failedMachinePreserveMax: 2 + failedMachinePreserveTimeout: 3h + ``` + * Since gardener worker pool can correspond to `1..N` MachineDeployments depending on number of zones, `failedMachinePreserveMax` will be distributed across N machine deployments. + * `failedMachinePreserveMax` must be chosen such that it can be appropriately distributed across the MachineDeployments. +2. Allow user/operator to explicitly request for preservation of a machine if it moves to `Failed` phase with the use of an annotation : `node.machine.sapcloud.io/preserve-when-failed=true`. +When such an annotated machine transitions from `Unknown` to `Failed`, it is prevented from moving to `Terminating` phase until `failedMachinePreserveTimeout` expires. + * A user/operator can request MCM to stop preserving a preserved `Failed` machine by adding/modifying the annotation: `node.machine.sapcloud.io/preserve-when-failed=false`. + * For a machine thus annotated, MCM will move it to `Terminating` phase even if `failedMachinePreserveTimeout` has not expired. +3. If an un-annotated machine moves to `Failed` phase, and the `failedMachinePreserveMax` has not been reached, MCM will auto-preserve this machine. +4. 
MCM will be modified to introduce a new stage in the `Failed` phase: `machineutils.PreserveFailed`, and a failed machine that is preserved by MCM will be transitioned to this stage after moving to `Failed`. + * In this new stage, pods can be evicted and scheduled on other healthy machines, and the user/operator can wait for the corresponding VM to potentially recover. If the machine moves to `Running` phase on recovery, new pods can be scheduled on it. It is yet to be determined whether this feature will be required. + + +## State Machine + +The behaviour described above can be summarised using the state machine below: + +``` +(Running Machine) +├── [User adds `node.machine.sapcloud.io/preserve-when-failed=true`] → (Running + Requested) +└── [Machine fails + capacity available] → (PreserveFailed) + +(Running + Requested) +├── [Machine fails + capacity available] → (PreserveFailed) +├── [Machine fails + no capacity] → Failed → Terminating +└── [User removes `node.machine.sapcloud.io/preserve-when-failed=true`] → (Running) + +(PreserveFailed) +├── [User adds `node.machine.sapcloud.io/preserve-when-failed=false`] → Terminating +└── [failedMachinePreserveTimeout expires] → Terminating + +``` +In the above state machine, the phase `Running` also includes machines that are in the process of creation for which no errors have been encountered yet. +The transition of moving a machine from `PreserveFailed` to `Running` has not been shown since we haven't determined whether it is in scope for the current iteration of this feature. + +## Use Cases: + +### Use Case 1: Proactive Preservation Request +**Scenario:** Operator suspects a machine might fail and wants to ensure preservation for analysis. +#### Steps: +1. Operator annotates node with `node.machine.sapcloud.io/preserve-when-failed=true` +2. Machine fails later +3. MCM preserves the machine (if capacity allows) +4. Operator analyzes the failed VM +5. Operator releases the failed machine by setting `node.machine.sapcloud.io/preserve-when-failed=false` on the node object + +### Use Case 2: Automatic Preservation +**Scenario:** Machine fails unexpectedly, no prior annotation. +#### Steps: +1. Machine transitions to Failed state +2. MCM checks preservation capacity +3. If capacity available, machine moved to `PreserveFailed` phase by MCM +4. After timeout, machine is terminated by MCM + +### Use Case 3: Capacity Management +**Scenario:** Multiple machines fail when preservation capacity is full. +#### Steps: +1. Machines M1, M2 already preserved (capacity = 2) +2. Machine M3 fails with annotation `node.machine.sapcloud.io/preserve-when-failed=true` set +3. MCM cannot preserve M3 due to capacity limits +4. M3 moved from `Failed` to `Terminating` by MCM, following which it is deleted + +### Use Case 4: Early Release +**Scenario:** Operator has performed his analysis and no longer requires machine to be preserved + +#### Steps: +1. Machine M1 is in `PreserveFailed` phase +2. Operator adds: `node.machine.sapcloud.io/preserve-when-failed=false` to node. +3. MCM transitions M1 to `Terminating` +4. Capacity becomes available for preserving future `Failed` machines. 
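For illustration, the preservation request and early release described above could be performed with plain node annotations (the node name is a placeholder):

```bash
# Request preservation of this machine's VM in case it moves to the Failed phase
kubectl annotate node <node-name> node.machine.sapcloud.io/preserve-when-failed=true

# Release a preserved Failed machine before failedMachinePreserveTimeout expires
kubectl annotate node <node-name> --overwrite node.machine.sapcloud.io/preserve-when-failed=false
```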
From 50e6fd15f3a0742ad45054c8e94165c3a84e8a4b Mon Sep 17 00:00:00 2001 From: thiyyakat Date: Tue, 23 Sep 2025 09:22:55 +0530 Subject: [PATCH 02/11] Add limitations --- docs/proposals/failed-machine-preservation.md | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/docs/proposals/failed-machine-preservation.md b/docs/proposals/failed-machine-preservation.md index 684e59826..5f1798439 100644 --- a/docs/proposals/failed-machine-preservation.md +++ b/docs/proposals/failed-machine-preservation.md @@ -42,6 +42,7 @@ When such an annotated machine transitions from `Unknown` to `Failed`, it is pre 3. If an un-annotated machine moves to `Failed` phase, and the `failedMachinePreserveMax` has not been reached, MCM will auto-preserve this machine. 4. MCM will be modified to introduce a new stage in the `Failed` phase: `machineutils.PreserveFailed`, and a failed machine that is preserved by MCM will be transitioned to this stage after moving to `Failed`. * In this new stage, pods can be evicted and scheduled on other healthy machines, and the user/operator can wait for the corresponding VM to potentially recover. If the machine moves to `Running` phase on recovery, new pods can be scheduled on it. It is yet to be determined whether this feature will be required. +5. Machines of a MachineDeployment in `PreserveFailed` stage will also be counted towards the replica count and the enforcement of maximum machines allowed for the MachineDeployment. ## State Machine @@ -101,3 +102,9 @@ The transition of moving a machine from `PreserveFailed` to `Running` has not be 2. Operator adds: `node.machine.sapcloud.io/preserve-when-failed=false` to node. 3. MCM transitions M1 to `Terminating` 4. Capacity becomes available for preserving future `Failed` machines. + +## Limitations + +1. During rolling updates we will NOT honor preserving Machines. The Machine will be replaced with a healthy one if it moves to Failed phase. +2. Since gardener worker pool can correspond to 1..N MachineDeployments depending on number of zones, we will need to distribute the `failedMachinePreserveMax` across N machine deployments. +So, even if there are no failed machines preserved in other zones, the max per zone would still be enforced. Hence, the value of `failedMachinePreserveMax` should be chosen appropriately. From 454422c7dc8f0f433114e29c191b748a1aaf68ea Mon Sep 17 00:00:00 2001 From: thiyyakat Date: Tue, 23 Sep 2025 16:02:38 +0530 Subject: [PATCH 03/11] Address review comments --- docs/proposals/failed-machine-preservation.md | 85 ++++++++++--------- 1 file changed, 47 insertions(+), 38 deletions(-) diff --git a/docs/proposals/failed-machine-preservation.md b/docs/proposals/failed-machine-preservation.md index 5f1798439..30e6788dd 100644 --- a/docs/proposals/failed-machine-preservation.md +++ b/docs/proposals/failed-machine-preservation.md @@ -4,16 +4,15 @@ - [Preservation of Failed Machines](#preservation-of-failed-machines) - [Objective](#objective) - - [Solution Design](#solution-design) + - [Proposal](#proposal) - [State Machine](#state-machine) - [Use Cases](#use-cases) - ## Objective -Currently, the Machine Controller Manager(MCM) moves Machines with errors to the `Unknown` phase, and after the configured `machineHealthTimeout` seconds, to the `Failed` phase. +Currently, the Machine Controller Manager(MCM) moves Machines with errors to the `Unknown` phase, and after the configured `machineHealthTimeout`, to the `Failed` phase. 
`Failed` machines are swiftly moved to the `Terminating` phase during which the node is drained and the `Machine` object is deleted. This rapid cleanup prevents SRE/operators/support from conducting an analysis on the VM and makes finding root cause of failure more difficult. This document proposes enhancing MCM, such that: @@ -21,9 +20,9 @@ This document proposes enhancing MCM, such that: * There is a configurable limit to the number of `Failed` machines that can be preserved * There is a configurable limit to the duration for which such machines are preserved * Users can specify which healthy machines they would like to preserve in case of failure -* Users can request MCM to delete a preserved `Failed` machine, even before the timeout expires +* Users can request MCM to release a preserved `Failed` machine, even before the timeout expires, so that MCM can transition the machine to `Terminating` phase and trigger its deletion. -## Solution Design +## Proposal In order to achieve the objectives mentioned, the following are proposed: 1. Enhance `machineControllerManager` configuration in the `ShootSpec`, to specify the max number of failed machines to be preserved, @@ -35,64 +34,70 @@ and the time duration for which these machines will be preserved. ``` * Since gardener worker pool can correspond to `1..N` MachineDeployments depending on number of zones, `failedMachinePreserveMax` will be distributed across N machine deployments. * `failedMachinePreserveMax` must be chosen such that it can be appropriately distributed across the MachineDeployments. -2. Allow user/operator to explicitly request for preservation of a machine if it moves to `Failed` phase with the use of an annotation : `node.machine.sapcloud.io/preserve-when-failed=true`. -When such an annotated machine transitions from `Unknown` to `Failed`, it is prevented from moving to `Terminating` phase until `failedMachinePreserveTimeout` expires. - * A user/operator can request MCM to stop preserving a preserved `Failed` machine by adding/modifying the annotation: `node.machine.sapcloud.io/preserve-when-failed=false`. +2. Allow user/operator to explicitly request for preservation of a specific machine with the use of an annotation : `node.machine.sapcloud.io/preserve-when-failed=true`, such that, if it moves to `Failed` phase, the machine is preserved by MCM, provided there is capacity. +3. MCM will be modified to introduce a new stage in the `Failed` phase: `machineutils.PreserveFailed`, and a failed machine that is preserved by MCM will be transitioned to this stage after moving to `Failed`. +4. A machine in `PreserveFailed` stage automatically moves to `Terminating` phase once `failedMachinePreserveTimeout` expires. + * A user/operator can request MCM to stop preserving a machine in `PreservedFailed` stage using the annotation: `node.machine.sapcloud.io/preserve-when-failed=false`. * For a machine thus annotated, MCM will move it to `Terminating` phase even if `failedMachinePreserveTimeout` has not expired. -3. If an un-annotated machine moves to `Failed` phase, and the `failedMachinePreserveMax` has not been reached, MCM will auto-preserve this machine. -4. MCM will be modified to introduce a new stage in the `Failed` phase: `machineutils.PreserveFailed`, and a failed machine that is preserved by MCM will be transitioned to this stage after moving to `Failed`. - * In this new stage, pods can be evicted and scheduled on other healthy machines, and the user/operator can wait for the corresponding VM to potentially recover. 
If the machine moves to `Running` phase on recovery, new pods can be scheduled on it. It is yet to be determined whether this feature will be required. -5. Machines of a MachineDeployment in `PreserveFailed` stage will also be counted towards the replica count and the enforcement of maximum machines allowed for the MachineDeployment. +5. If an un-annotated machine moves to `Failed` phase, and the `failedMachinePreserveMax` has not been reached, MCM will auto-preserve this machine. +6. Machines of a MachineDeployment in `PreserveFailed` stage will also be counted towards the replica count and the enforcement of maximum machines allowed for the MachineDeployment. +7. At any point in time `machines requested for preservation + machines in PreservedFailed <= failedMachinePreserveMax`. If `machines requested for preservation + machines in PreservedFailed` is at or exceeds `failedMachinePreserveMax` on annotating a machine, the annotation will be deleted by MCM. ## State Machine The behaviour described above can be summarised using the state machine below: +```mermaid +--- +config: + layout: elk +--- +stateDiagram + direction TBP + state "PreserveFailed + (node drained)" as PreserveFailed + state "Requested + (node & machine annotated)" + as Requested + [*] --> Running + Running --> Requested:annotated with value=true && max not breached + Running --> Running:annotated, but max breached + Requested --> PreserveFailed:on failure + Running --> PreserveFailed:on failure && max not breached + PreserveFailed --> Terminating:after timeout + PreserveFailed --> Terminating:annotated with value=false + Running --> Failed : on failure && max breached + PreserveFailed --> Running : VM recovers + Failed --> Terminating + Terminating --> [*] ``` -(Running Machine) -├── [User adds `node.machine.sapcloud.io/preserve-when-failed=true`] → (Running + Requested) -└── [Machine fails + capacity available] → (PreserveFailed) -(Running + Requested) -├── [Machine fails + capacity available] → (PreserveFailed) -├── [Machine fails + no capacity] → Failed → Terminating -└── [User removes `node.machine.sapcloud.io/preserve-when-failed=true`] → (Running) - -(PreserveFailed) -├── [User adds `node.machine.sapcloud.io/preserve-when-failed=false`] → Terminating -└── [failedMachinePreserveTimeout expires] → Terminating - -``` In the above state machine, the phase `Running` also includes machines that are in the process of creation for which no errors have been encountered yet. -The transition of moving a machine from `PreserveFailed` to `Running` has not been shown since we haven't determined whether it is in scope for the current iteration of this feature. ## Use Cases: ### Use Case 1: Proactive Preservation Request **Scenario:** Operator suspects a machine might fail and wants to ensure preservation for analysis. #### Steps: -1. Operator annotates node with `node.machine.sapcloud.io/preserve-when-failed=true` +1. Operator annotates node with `node.machine.sapcloud.io/preserve-when-failed=true`, provided `failedMachinePreserveMax` is not violated 2. Machine fails later -3. MCM preserves the machine (if capacity allows) +3. MCM preserves the machine 4. Operator analyzes the failed VM -5. Operator releases the failed machine by setting `node.machine.sapcloud.io/preserve-when-failed=false` on the node object ### Use Case 2: Automatic Preservation **Scenario:** Machine fails unexpectedly, no prior annotation. #### Steps: -1. Machine transitions to Failed state -2. MCM checks preservation capacity -3. 
If capacity available, machine moved to `PreserveFailed` phase by MCM -4. After timeout, machine is terminated by MCM +1. Machine transitions to `Failed` phase +2. If `failedMachinePreserveMax` is not breached, machine moved to `PreserveFailed` phase by MCM +3. After `failedMachinePreserveTimeout`, machine is terminated by MCM ### Use Case 3: Capacity Management **Scenario:** Multiple machines fail when preservation capacity is full. #### Steps: -1. Machines M1, M2 already preserved (capacity = 2) -2. Machine M3 fails with annotation `node.machine.sapcloud.io/preserve-when-failed=true` set -3. MCM cannot preserve M3 due to capacity limits -4. M3 moved from `Failed` to `Terminating` by MCM, following which it is deleted +1. Machines M1, M2 already preserved (failedMachinePreserveMax = 2) +2. Operator wishes to preserve M3 in case of failure. He increases `failedMachinePreserveMax` to 3, and annotates M3 with `node.machine.sapcloud.io/preserve-when-failed=true`. +3. If M3 fails, machine moved to `PreserveFailed` phase by MCM. ### Use Case 4: Early Release **Scenario:** Operator has performed his analysis and no longer requires machine to be preserved @@ -100,9 +105,13 @@ The transition of moving a machine from `PreserveFailed` to `Running` has not be #### Steps: 1. Machine M1 is in `PreserveFailed` phase 2. Operator adds: `node.machine.sapcloud.io/preserve-when-failed=false` to node. -3. MCM transitions M1 to `Terminating` +3. MCM transitions M1 to `Terminating` even though `failedMachinePreserveTimeout` has not expired 4. Capacity becomes available for preserving future `Failed` machines. +## Open Point + +How will MCM provide the user with the option to drain a node when it is in `PreserveFailed` stage? + ## Limitations 1. During rolling updates we will NOT honor preserving Machines. The Machine will be replaced with a healthy one if it moves to Failed phase. From 961692a8b5ac30e2fafebf6adb48a58e8ebc10d5 Mon Sep 17 00:00:00 2001 From: thiyyakat Date: Tue, 23 Sep 2025 16:10:41 +0530 Subject: [PATCH 04/11] Change mermaid layout from elk to default for github support --- docs/proposals/failed-machine-preservation.md | 4 ---- 1 file changed, 4 deletions(-) diff --git a/docs/proposals/failed-machine-preservation.md b/docs/proposals/failed-machine-preservation.md index 30e6788dd..e1dde5f67 100644 --- a/docs/proposals/failed-machine-preservation.md +++ b/docs/proposals/failed-machine-preservation.md @@ -48,10 +48,6 @@ and the time duration for which these machines will be preserved. The behaviour described above can be summarised using the state machine below: ```mermaid ---- -config: - layout: elk ---- stateDiagram direction TBP state "PreserveFailed From fc1093484cfdecfa887a9894c0cfbd2142d85220 Mon Sep 17 00:00:00 2001 From: thiyyakat Date: Wed, 1 Oct 2025 15:35:55 +0530 Subject: [PATCH 05/11] Improve clarity --- docs/proposals/failed-machine-preservation.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/proposals/failed-machine-preservation.md b/docs/proposals/failed-machine-preservation.md index e1dde5f67..7296bf23e 100644 --- a/docs/proposals/failed-machine-preservation.md +++ b/docs/proposals/failed-machine-preservation.md @@ -34,13 +34,13 @@ and the time duration for which these machines will be preserved. ``` * Since gardener worker pool can correspond to `1..N` MachineDeployments depending on number of zones, `failedMachinePreserveMax` will be distributed across N machine deployments. 
* `failedMachinePreserveMax` must be chosen such that it can be appropriately distributed across the MachineDeployments. -2. Allow user/operator to explicitly request for preservation of a specific machine with the use of an annotation : `node.machine.sapcloud.io/preserve-when-failed=true`, such that, if it moves to `Failed` phase, the machine is preserved by MCM, provided there is capacity. +2. Allow user/operator to request for preservation of a specific machine with the use of an annotation : `node.machine.sapcloud.io/preserve-when-failed=true`, such that, if the machine moves to `Failed` phase, it is preserved by MCM, provided there is capacity. 3. MCM will be modified to introduce a new stage in the `Failed` phase: `machineutils.PreserveFailed`, and a failed machine that is preserved by MCM will be transitioned to this stage after moving to `Failed`. 4. A machine in `PreserveFailed` stage automatically moves to `Terminating` phase once `failedMachinePreserveTimeout` expires. * A user/operator can request MCM to stop preserving a machine in `PreservedFailed` stage using the annotation: `node.machine.sapcloud.io/preserve-when-failed=false`. * For a machine thus annotated, MCM will move it to `Terminating` phase even if `failedMachinePreserveTimeout` has not expired. -5. If an un-annotated machine moves to `Failed` phase, and the `failedMachinePreserveMax` has not been reached, MCM will auto-preserve this machine. -6. Machines of a MachineDeployment in `PreserveFailed` stage will also be counted towards the replica count and the enforcement of maximum machines allowed for the MachineDeployment. +5. If an un-annotated machine moves to `Failed` phase, and the number of preserved failed machines is less than `failedMachinePreserveMax`, MCM will auto-preserve this machine. +6. Machines of a MachineDeployment in `PreserveFailed` stage will also be counted towards the replica count and in the enforcement of maximum machines allowed for the MachineDeployment. 7. At any point in time `machines requested for preservation + machines in PreservedFailed <= failedMachinePreserveMax`. If `machines requested for preservation + machines in PreservedFailed` is at or exceeds `failedMachinePreserveMax` on annotating a machine, the annotation will be deleted by MCM. From 309527fd26d2df5d380b35f869310b82639aeca8 Mon Sep 17 00:00:00 2001 From: thiyyakat Date: Fri, 3 Oct 2025 14:51:44 +0530 Subject: [PATCH 06/11] Change proposal as per discussions --- docs/proposals/failed-machine-preservation.md | 113 ++++++++---------- 1 file changed, 51 insertions(+), 62 deletions(-) diff --git a/docs/proposals/failed-machine-preservation.md b/docs/proposals/failed-machine-preservation.md index 7296bf23e..990ad1657 100644 --- a/docs/proposals/failed-machine-preservation.md +++ b/docs/proposals/failed-machine-preservation.md @@ -5,7 +5,6 @@ - [Preservation of Failed Machines](#preservation-of-failed-machines) - [Objective](#objective) - [Proposal](#proposal) - - [State Machine](#state-machine) - [Use Cases](#use-cases) @@ -15,68 +14,62 @@ Currently, the Machine Controller Manager(MCM) moves Machines with errors to the `Unknown` phase, and after the configured `machineHealthTimeout`, to the `Failed` phase. `Failed` machines are swiftly moved to the `Terminating` phase during which the node is drained and the `Machine` object is deleted. This rapid cleanup prevents SRE/operators/support from conducting an analysis on the VM and makes finding root cause of failure more difficult. 
+Moreover, in cases where a node seems healthy but all the workload on it are facing issues, there is a need for operators to be able to cordon/drain the node and conduct their analysis without the cluster-autoscaler scaling down the node. + This document proposes enhancing MCM, such that: -* VMs of `Failed` machines are retained temporarily for analysis -* There is a configurable limit to the number of `Failed` machines that can be preserved +* VMs of machines are retained temporarily for analysis +* There is a configurable limit to the number of machines that can be preserved * There is a configurable limit to the duration for which such machines are preserved * Users can specify which healthy machines they would like to preserve in case of failure -* Users can request MCM to release a preserved `Failed` machine, even before the timeout expires, so that MCM can transition the machine to `Terminating` phase and trigger its deletion. +* Users can request MCM to release a preserved machine, even before the timeout expires, so that MCM can transition the machine to either `Running` or `Terminating` phase, as the case may be. ## Proposal In order to achieve the objectives mentioned, the following are proposed: -1. Enhance `machineControllerManager` configuration in the `ShootSpec`, to specify the max number of failed machines to be preserved, +1. Enhance `machineControllerManager` configuration in the `ShootSpec`, to specify the max number of machines to be preserved, and the time duration for which these machines will be preserved. ``` machineControllerManager: - failedMachinePreserveMax: 2 - failedMachinePreserveTimeout: 3h + machinePreserveMax: 1 + machinePreserveTimeout: 72h ``` - * Since gardener worker pool can correspond to `1..N` MachineDeployments depending on number of zones, `failedMachinePreserveMax` will be distributed across N machine deployments. - * `failedMachinePreserveMax` must be chosen such that it can be appropriately distributed across the MachineDeployments. -2. Allow user/operator to request for preservation of a specific machine with the use of an annotation : `node.machine.sapcloud.io/preserve-when-failed=true`, such that, if the machine moves to `Failed` phase, it is preserved by MCM, provided there is capacity. -3. MCM will be modified to introduce a new stage in the `Failed` phase: `machineutils.PreserveFailed`, and a failed machine that is preserved by MCM will be transitioned to this stage after moving to `Failed`. -4. A machine in `PreserveFailed` stage automatically moves to `Terminating` phase once `failedMachinePreserveTimeout` expires. - * A user/operator can request MCM to stop preserving a machine in `PreservedFailed` stage using the annotation: `node.machine.sapcloud.io/preserve-when-failed=false`. - * For a machine thus annotated, MCM will move it to `Terminating` phase even if `failedMachinePreserveTimeout` has not expired. -5. If an un-annotated machine moves to `Failed` phase, and the number of preserved failed machines is less than `failedMachinePreserveMax`, MCM will auto-preserve this machine. -6. Machines of a MachineDeployment in `PreserveFailed` stage will also be counted towards the replica count and in the enforcement of maximum machines allowed for the MachineDeployment. -7. At any point in time `machines requested for preservation + machines in PreservedFailed <= failedMachinePreserveMax`. 
If `machines requested for preservation + machines in PreservedFailed` is at or exceeds `failedMachinePreserveMax` on annotating a machine, the annotation will be deleted by MCM. - - -## State Machine - -The behaviour described above can be summarised using the state machine below: -```mermaid -stateDiagram - direction TBP - state "PreserveFailed - (node drained)" as PreserveFailed - state "Requested - (node & machine annotated)" - as Requested - [*] --> Running - Running --> Requested:annotated with value=true && max not breached - Running --> Running:annotated, but max breached - Requested --> PreserveFailed:on failure - Running --> PreserveFailed:on failure && max not breached - PreserveFailed --> Terminating:after timeout - PreserveFailed --> Terminating:annotated with value=false - Running --> Failed : on failure && max breached - PreserveFailed --> Running : VM recovers - Failed --> Terminating - Terminating --> [*] - -``` - -In the above state machine, the phase `Running` also includes machines that are in the process of creation for which no errors have been encountered yet. + * This configuration will be set per worker pool. + * Since gardener worker pool can correspond to `1..N` MachineDeployments depending on number of zones, `machinePreserveMax` will be distributed across N machine deployments. + * `machinePreserveMax` must be chosen such that it can be appropriately distributed across the MachineDeployments. + * Example: if `machinePreserveMax` is set to 2, and the worker pool has 2 zones, then the maximum number of machines that will be preserved per zone is 1. +2. MCM will be modified to include a new phase `Preserved` to indicate that the machine has been preserved by MCM. +3. Allow user/operator to request for preservation of a specific machine/node with the use of annotations : `node.machine.sapcloud.io/preserve=now` and `node.machine.sapcloud.io/preserve=when-failed`. +4. When annotation `node.machine.sapcloud.io/preserve=now` is added to a `Running` machine, the following will take place: + - `cluster-autoscaler.kubernetes.io/scale-down-disabled: "true"` is added to the node to prevent CA from scaling it down. + - `machine.CurrentStatus.PreserveExpiryTime` is updated by MCM as $machine.CurrentStatus.PreserveExpiryTime = currentTime+machinePreserveTimeout$ + - The machine stage is changed to `Preserved` + - After timeout, the `node.machine.sapcloud.io/preserve=now` and `cluster-autoscaler.kubernetes.io/scale-down-disabled: "true"` are deleted, the machine phase is changed to `Running` and the CA may delete the node. The `machine.CurrentStatus.PreserveExpiryTime` is set to `nil`. + - Number of machines explicitly annotated will count towards enforcing `machinePreserveMax`. On breach, the annotation will be rejected. +5. When annotation `node.machine.sapcloud.io/preserve=when-failed` is added to a `Running` machine and the machine goes to `Failed`, the following will take place: + - The machine phase is changed to `Preserved`. + - Pods (other than daemonset pods) are drained. + - `machine.CurrentStatus.PreserveExpiryTime` is updated by MCM as $machine.CurrentStatus.PreserveExpiryTime = currentTime+machinePreserveTimeout$ + - After timeout, the `node.machine.sapcloud.io/preserve=when-failed` is deleted. The phase is changed to `Terminating`. + - Number of machines explicitly annotated will count towards enforcing `machinePreserveMax`. On breach, the annotation will be rejected. +6. 
When an un-annotated machine goes to `Failed` phase and $count(machinesAnnotatedForPreservation)+count(AutoPreservedMachines) < machinePreserveMax$, MCM will auto-preserve this machine.

From: thiyyakat
Date: Fri, 3 Oct 2025 15:08:08 +0530
Subject: [PATCH 07/11] Fix limitations

---
 docs/proposals/failed-machine-preservation.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/proposals/failed-machine-preservation.md b/docs/proposals/failed-machine-preservation.md
index 990ad1657..509e08ea7 100644
--- a/docs/proposals/failed-machine-preservation.md
+++ b/docs/proposals/failed-machine-preservation.md
@@ -101,4 +101,4 @@

 1. During rolling updates we will NOT honor preserving Machines. The Machine will be replaced with a healthy one if it moves to Failed phase.
 2. Since gardener worker pool can correspond to 1..N MachineDeployments depending on number of zones, we will need to distribute the `machinePreserveMax` across N machine deployments.
-So, even if there are no failed machines preserved in other zones, the max per zone would still be enforced. Hence, the value of `failedMachinePreserveMax` should be chosen appropriately.
+So, even if there are no failed machines preserved in other zones, the max per zone would still be enforced. Hence, the value of `machinePreserveMax` should be chosen appropriately.

From aa2ae8f7bfa75f41f2c28a840aca654075d16d7d Mon Sep 17 00:00:00 2001
From: thiyyakat
Date: Fri, 3 Oct 2025 16:31:17 +0530
Subject: [PATCH 08/11] Add state diagrams

---
 docs/proposals/failed-machine-preservation.md | 50 +++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/docs/proposals/failed-machine-preservation.md b/docs/proposals/failed-machine-preservation.md
index 509e08ea7..ea616ed82 100644
--- a/docs/proposals/failed-machine-preservation.md
+++ b/docs/proposals/failed-machine-preservation.md
@@ -5,6 +5,7 @@
 - [Preservation of Failed Machines](#preservation-of-failed-machines)
   - [Objective](#objective)
   - [Proposal](#proposal)
+  - [State Diagrams](#state-diagrams)
   - [Use Cases](#use-cases)

@@ -64,6 +65,55 @@ and the time duration for which these machines will be preserved.
 9. Machines of a MachineDeployment in `Preserved` stage will also be counted towards the replica count and in the enforcement of maximum machines allowed for the MachineDeployment.
 10. At any point in time $count(machinesAnnotatedForPreservation)+count(PreservedMachines)<=machinePreserveMax$.

+## State Diagrams:
+
+1. State Diagram for when a `Running` machine or its node is annotated with `node.machine.sapcloud.io/preserve=now`:
+```mermaid
+stateDiagram-v2
+direction TB
+    state "Running" as R
+    state "Preserved" as P
+    [*]-->R
+    R --> P: annotated with value=now && max not breached
+    P --> R: annotated with value=false or timeout occurs
+```
+
+2. State Diagram for when a `Running` machine or its node is annotated with `node.machine.sapcloud.io/preserve=when-failed`:
+```mermaid
+stateDiagram-v2
+    state "Running" as R
+    state "Running + Requested" as RR
+    state "Failed
+    (node drained)" as F
+    state "Preserved
+    (node drained)" as P
+    state "Terminating" as T
+    [*]-->R
+    R --> RR: annotated with value=when-failed && max not breached
+    RR --> F: on failure
+    F --> P
+    P --> T: on timeout or value=false
+    P --> R: if node Healthy before timeout
+    T --> [*]
+```
+
+3. State Diagram for when an un-annotated `Running` machine fails:
+```mermaid
+stateDiagram-v2
+direction TB
+    state "Running" as R
+    state "Failed" as F
+    state "Preserved" as P
+    state "Terminating" as T
+    [*] --> R
+    R-->F: on failure
+    F --> P: if max not breached
+    F --> T: if max breached
+    P --> T: on timeout or value=false
+    P --> R : if node Healthy before timeout
+    T --> [*]
+```

From 9462118bad59fadfaecf446bd155094120b3e589 Mon Sep 17 00:00:00 2001
From: thiyyakat
Date: Fri, 3 Oct 2025 16:34:33 +0530
Subject: [PATCH 09/11] Rename file and proposal

---
 ...failed-machine-preservation.md => machine-preservation.md} | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
 rename docs/proposals/{failed-machine-preservation.md => machine-preservation.md} (98%)

diff --git a/docs/proposals/failed-machine-preservation.md b/docs/proposals/machine-preservation.md
similarity index 98%
rename from docs/proposals/failed-machine-preservation.md
rename to docs/proposals/machine-preservation.md
index ea616ed82..ffce5bfa4 100644
--- a/docs/proposals/failed-machine-preservation.md
+++ b/docs/proposals/machine-preservation.md
@@ -1,8 +1,8 @@
-# Preservation of Failed Machines
+# Preservation of Machines

-- [Preservation of Failed Machines](#preservation-of-failed-machines)
+- [Preservation of Machines](#preservation-of-machines)
   - [Objective](#objective)
   - [Proposal](#proposal)
   - [State Diagrams](#state-diagrams)
   - [Use Cases](#use-cases)

From 849a99dd0da422affb2944555766d03c76b8985b Mon Sep 17 00:00:00 2001
From: thiyyakat
Date: Wed, 8 Oct 2025 19:55:18 +0530
Subject: [PATCH 10/11] Update proposal to reflect changes decided in meeting

---
 docs/proposals/machine-preservation.md | 135 +++++++++++--------------
 1 file changed, 57 insertions(+), 78 deletions(-)

diff --git a/docs/proposals/machine-preservation.md b/docs/proposals/machine-preservation.md
index ffce5bfa4..d31306456 100644
--- a/docs/proposals/machine-preservation.md
+++ b/docs/proposals/machine-preservation.md
@@ -15,140 +15,119 @@

Currently, the Machine Controller Manager (MCM) moves Machines with errors to the `Unknown` phase, and after the configured `machineHealthTimeout`, to the `Failed` phase.
`Failed` machines are swiftly moved to the `Terminating` phase during which the node is drained and the `Machine` object is deleted. This rapid cleanup prevents SRE/operators/support from conducting an analysis on the VM and makes finding the root cause of failure more difficult.
This document proposes enhancing MCM, such that: * VMs of machines are retained temporarily for analysis -* There is a configurable limit to the number of machines that can be preserved -* There is a configurable limit to the duration for which such machines are preserved -* Users can specify which healthy machines they would like to preserve in case of failure +* There is a configurable limit to the number of machines that can be preserved automatically on failure (auto-preservation) +* There is a configurable limit to the duration for which machines are preserved +* Users can specify which healthy machines they would like to preserve in case of failure, or for diagnoses in current state (prevent scale down by CA) * Users can request MCM to release a preserved machine, even before the timeout expires, so that MCM can transition the machine to either `Running` or `Terminating` phase, as the case may be. +Related Issue: https://github.com/gardener/machine-controller-manager/issues/1008 + ## Proposal In order to achieve the objectives mentioned, the following are proposed: -1. Enhance `machineControllerManager` configuration in the `ShootSpec`, to specify the max number of machines to be preserved, +1. Enhance `machineControllerManager` configuration in the `ShootSpec`, to specify the max number of machines to be auto-preserved, and the time duration for which these machines will be preserved. ``` machineControllerManager: - machinePreserveMax: 1 + autoPreserveFailedMax: 0 machinePreserveTimeout: 72h ``` * This configuration will be set per worker pool. * Since gardener worker pool can correspond to `1..N` MachineDeployments depending on number of zones, `machinePreserveMax` will be distributed across N machine deployments. * `machinePreserveMax` must be chosen such that it can be appropriately distributed across the MachineDeployments. * Example: if `machinePreserveMax` is set to 2, and the worker pool has 2 zones, then the maximum number of machines that will be preserved per zone is 1. -2. MCM will be modified to include a new phase `Preserved` to indicate that the machine has been preserved by MCM. -3. Allow user/operator to request for preservation of a specific machine/node with the use of annotations : `node.machine.sapcloud.io/preserve=now` and `node.machine.sapcloud.io/preserve=when-failed`. -4. When annotation `node.machine.sapcloud.io/preserve=now` is added to a `Running` machine, the following will take place: +2. MCM will be modified to include a new sub-phase `Preserved` to indicate that the machine has been preserved by MCM. +3. Allow user/operator to request for preservation of a specific machine/node with the use of annotation : `node.machine.sapcloud.io/preserve=true`. +4. When annotation `node.machine.sapcloud.io/preserve=true` is added to a `Running` machine, the following will take place: - `cluster-autoscaler.kubernetes.io/scale-down-disabled: "true"` is added to the node to prevent CA from scaling it down. - `machine.CurrentStatus.PreserveExpiryTime` is updated by MCM as $machine.CurrentStatus.PreserveExpiryTime = currentTime+machinePreserveTimeout$ - - The machine stage is changed to `Preserved` - - After timeout, the `node.machine.sapcloud.io/preserve=now` and `cluster-autoscaler.kubernetes.io/scale-down-disabled: "true"` are deleted, the machine phase is changed to `Running` and the CA may delete the node. The `machine.CurrentStatus.PreserveExpiryTime` is set to `nil`. - - Number of machines explicitly annotated will count towards enforcing `machinePreserveMax`. 
On breach, the annotation will be rejected. -5. When annotation `node.machine.sapcloud.io/preserve=when-failed` is added to a `Running` machine and the machine goes to `Failed`, the following will take place: - - The machine phase is changed to `Preserved`. - - Pods (other than daemonset pods) are drained. - - `machine.CurrentStatus.PreserveExpiryTime` is updated by MCM as $machine.CurrentStatus.PreserveExpiryTime = currentTime+machinePreserveTimeout$ - - After timeout, the `node.machine.sapcloud.io/preserve=when-failed` is deleted. The phase is changed to `Terminating`. - - Number of machines explicitly annotated will count towards enforcing `machinePreserveMax`. On breach, the annotation will be rejected. -6. When an un-annotated machine goes to `Failed` phase and the $count(machinesAnnotatedForPreservation)+count(AutoPreservedMachines)R - R --> P: annotated with value=now && max not breached - P --> R: annotated with value=false or timeout occurs + R --> RP: annotated with preserve=true + RP --> R: annotated with preserve=false or timeout occurs ``` -2. State Diagram for when a `Running` machine or its node is annotated with `node.machine.sapcloud.io/preserve=when-failed`: -```mermaid -stateDiagram-v2 - state "Running" as R - state "Running + Requested" as RR - state "Failed" as F - state "Preserved - (node drained)" as P - state "Terminating" as T - [*]-->R - R --> RR: annotated with value=when-failed && max not breached - RR --> F: on failure - F --> P - P --> T: on timeout or value=false - P --> R: if node Healthy before timeout - T --> [*] -``` - -3. State Diagram for when an un-annotated `Running` machine fails: +2. State Diagram for when an un-annotated `Running` machine fails (Auto-preservation): ```mermaid stateDiagram-v2 direction TBP state "Running" as R - state "Failed" as F - state "Preserved" as P + state "Failed + (node drained)" as F + state "Failed:Preserved" as FP state "Terminating" as T [*] --> R R-->F: on failure - F --> P: if max not breached - F --> T: if max breached - P --> T: on timeout or value=false - P --> R : if node Healthy before timeout + F --> FP: if autoPreserveFailedMax not breached + F --> T: if autoPreserveFailedMax breached + FP --> T: on timeout or value=false + FP --> R : if node Healthy before timeout T --> [*] ``` - ## Use Cases: ### Use Case 1: Proactive Preservation Request **Scenario:** Operator suspects a machine might fail and wants to ensure preservation for analysis. #### Steps: -1. Operator annotates node with `node.machine.sapcloud.io/preserve=when-failed`, provided `machinePreserveMax` is not violated -2. Machine fails later -3. MCM preserves the machine -4. Operator analyzes the failed VM +1. Operator annotates node with `node.machine.sapcloud.io/preserve=true` +2. MCM preserves the machine, and prevents CA from scaling it down +3. Operator analyzes the VM -### Use Case 2: Automatic Preservation +### Use Case 2: Auto-Preservation **Scenario:** Machine fails unexpectedly, no prior annotation. #### Steps: 1. Machine transitions to `Failed` phase -2. If `machinePreserveMax` is not breached, machine moved to `Preserved` phase by MCM -3. After `machinePreserveTimeout`, machine is terminated by MCM - -### Use Case 3: Preservation Request for Analysing Running Machine -**Scenario:** Workload on machine failing. Operator wishes to diagnose. -#### Steps: -1. Operator annotates node with `node.machine.sapcloud.io/preserve=now`, provided `machinePreserveMax` is not violated -2. MCM preserves machine and prevents CA from scaling it down -3. 
Operator analyzes the machine +2. Machine is drained +3. If `autoPreserveFailedMax` is not breached, machine moved to `Failed:Preserved` phase by MCM +4. After `machinePreserveTimeout`, machine is terminated by MCM -### Use Case 4: Early Release +### Use Case 3: Early Release **Scenario:** Operator has performed his analysis and no longer requires machine to be preserved #### Steps: -1. Machine is in `Preserved` phase +1. Machine is in `Running:Preserved` or `Failed:Preserved` phase 2. Operator adds: `node.machine.sapcloud.io/preserve=false` to node. -3. MCM transitions machine to `Running` or `Terminating`, depending on which phase it was in before moving to `Preserved`, even though `machinePreserveTimeout` has not expired -4. Capacity becomes available for preserving future annotated machines or for auto-preservation of `Failed` machines. +3. MCM transitions machine to `Running` or `Terminating`, for `Running:Preserved` or `Failed:Preserved` respectively, even though `machinePreserveTimeout` has not expired +4. If machine was in `Failed:Preserved`, capacity becomes available for auto-preservation. -## Limitations +## Points to Note 1. During rolling updates we will NOT honor preserving Machines. The Machine will be replaced with a healthy one if it moves to Failed phase. -2. Since gardener worker pool can correspond to 1..N MachineDeployments depending on number of zones, we will need to distribute the `machinePreserveMax` across N machine deployments. -So, even if there are no failed machines preserved in other zones, the max per zone would still be enforced. Hence, the value of `machinePreserveMax` should be chosen appropriately. +2. Hibernation policy would override machine preservation. +3. If Machine and Node annotation values differ for a particular annotation key (including `node.machine.sapcloud.io/preserve=true`), the Node annotation value will override the Machine annotation value. +4. If `autoPreserveFailedMax` is reduced in the Shoot Spec, older machines are moved to `Terminating` phase before newer ones. +5. In case of a scale down of an MCD's replica count, `Preserved` machines will be the last to be scaled down. Replica count will always be honoured. +6. If the value for annotation key `cluster-autoscaler.kubernetes.io/scale-down-disabled` for a machine in `Running:Preserved` is changed to `false` by a user, the value will be overwritten to `true` by MCM. +7. On increase/decrease of timeout- new value will only apply to machines that go into `Preserved` phase after the change. Operators can always edit `machine.CurrentStatus.PreserveExpiryTime` to prolong the expiry time of existing `Preserved` machines. + - can specify timeout +8. [Modify CA FAQ](https://github.com/gardener/autoscaler/blob/master/cluster-autoscaler/FAQ.md#how-can-i-prevent-cluster-autoscaler-from-scaling-down-a-particular-node) once feature is developed to use `node.machine.sapcloud.io/preserve=true` instead of the `cluster-autoscaler.kubernetes.io/scale-down-disabled=true` currently suggested. 
This would:
- harmonise machine flow
- shield from CA's internals
- make it generic and no longer CA specific
- allow a timeout to be specified
\ No newline at end of file

From 227b3cd9d221bf51e2ab47aafaa82b93d0e62be1 Mon Sep 17 00:00:00 2001
From: thiyyakat
Date: Thu, 9 Oct 2025 11:33:40 +0530
Subject: [PATCH 11/11] Modify proposal to support use case for `preserve=when-failed`

---
 docs/proposals/machine-preservation.md | 76 ++++++++++++++++++--------
 1 file changed, 52 insertions(+), 24 deletions(-)

diff --git a/docs/proposals/machine-preservation.md b/docs/proposals/machine-preservation.md
index d31306456..945cad115 100644
--- a/docs/proposals/machine-preservation.md
+++ b/docs/proposals/machine-preservation.md
@@ -41,38 +41,59 @@ and the time duration for which these machines will be preserved.
2. MCM will be modified to include a new sub-phase `Preserved` to indicate that the machine has been preserved by MCM.
3. Allow user/operator to request preservation of a specific machine/node with the use of the annotations `node.machine.sapcloud.io/preserve=now` and `node.machine.sapcloud.io/preserve=when-failed`.
4. When the annotation `node.machine.sapcloud.io/preserve=now` is added to a `Running` machine, the following will take place:
    - `cluster-autoscaler.kubernetes.io/scale-down-disabled: "true"` is added to the node to prevent CA from scaling it down.
    - `machine.CurrentStatus.PreserveExpiryTime` is updated by MCM as $machine.CurrentStatus.PreserveExpiryTime = currentTime+machinePreserveTimeout$
    - The machine's phase is changed to `Running:Preserved`
    - After timeout, the `node.machine.sapcloud.io/preserve=now` and `cluster-autoscaler.kubernetes.io/scale-down-disabled: "true"` annotations are deleted, the machine phase is changed to `Running` and the CA may delete the node. The `machine.CurrentStatus.PreserveExpiryTime` is set to `nil`.
5. When the annotation `node.machine.sapcloud.io/preserve=when-failed` is added to a `Running` machine and the machine goes to `Failed`, the following will take place:
    - The machine is drained of pods except for DaemonSet pods.
    - The machine phase is changed to `Failed:Preserved`.
    - `machine.CurrentStatus.PreserveExpiryTime` is updated by MCM as $machine.CurrentStatus.PreserveExpiryTime = currentTime+machinePreserveTimeout$
    - After timeout, the `node.machine.sapcloud.io/preserve=when-failed` annotation is deleted. The phase is changed to `Terminating`.
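    For illustration, such a preservation request could be made with a plain node annotation (the node name is a placeholder):

    ```bash
    kubectl annotate node <node-name> node.machine.sapcloud.io/preserve=when-failed
    ```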
6. When an un-annotated machine goes to `Failed` phase and `autoPreserveFailedMax` is not breached:
    - Pods (other than DaemonSet pods) are drained.
    - The machine's phase is changed to `Failed:Preserved`.
    - `machine.CurrentStatus.PreserveExpiryTime` is updated by MCM as $machine.CurrentStatus.PreserveExpiryTime = currentTime+machinePreserveTimeout$
    - After timeout, the phase is changed to `Terminating`.
    - The number of machines in the `Failed:Preserved` phase counts towards enforcing `autoPreserveFailedMax`.
7. If a failed machine is currently in `Failed:Preserved` and its VM/node is found to be healthy before the timeout, the machine will be moved to `Running`.
8. A user/operator can request MCM to stop preserving a machine/node in `Running:Preserved` or `Failed:Preserved` phase using the annotation `node.machine.sapcloud.io/preserve=false`.
    * MCM will move a machine thus annotated either to the `Running` phase or to `Terminating`, depending on the phase of the machine before it was preserved.
9. Machines of a MachineDeployment in the `Preserved` sub-phase will also be counted towards the replica count and in the enforcement of the maximum machines allowed for the MachineDeployment.
10. MCM will be modified to perform the drain in the `Failed` phase rather than in `Terminating`.

## State Diagrams:

1. State Diagram for when a `Running` machine or its node is annotated with `node.machine.sapcloud.io/preserve=now`:
```mermaid
stateDiagram-v2
direction TB
    state "Running" as R
    state "Running:Preserved" as RP
    [*]-->R
    R --> RP: annotated with preserve=now
    RP --> R: annotated with preserve=false or timeout occurs
```
2. State Diagram for when a `Running` machine or its node is annotated with `node.machine.sapcloud.io/preserve=when-failed`:
```mermaid
stateDiagram-v2
    state "Running" as R
    state "Running + Requested" as RR
    state "Failed
    (node drained)" as F
    state "Failed:Preserved" as P
    state "Terminating" as T
    [*]-->R
    R --> RR: annotated with preserve=when-failed
    RR --> F: on failure
    F --> P
    P --> T: on timeout or preserve=false
    P --> R: if node Healthy before timeout
    T --> [*]
```
3. State Diagram for when an un-annotated `Running` machine fails (Auto-preservation):
```mermaid
stateDiagram-v2
direction TB
    state "Running" as R
    state "Failed
    (node drained)" as F
    state "Failed:Preserved" as FP
    state "Terminating" as T
    [*] --> R
    R-->F: on failure
    F --> FP: if autoPreserveFailedMax not breached
    F --> T: if autoPreserveFailedMax breached
    FP --> T: on timeout or value=false
    FP --> R : if node Healthy before timeout
    T --> [*]
```

## Use Cases:

### Use Case 1: Preservation Request for Analysing Running Machine
**Scenario:** Workloads on a machine are failing. The operator wishes to diagnose the machine in its current state.
#### Steps:
1. Operator annotates the node with `node.machine.sapcloud.io/preserve=now`
2. MCM preserves the machine and prevents CA from scaling it down
3. Operator analyzes the VM
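As a sketch, the annotation in step 1 could be applied, and later released, with (the node name is a placeholder):

```bash
# Preserve the machine in its current state and block CA scale-down
kubectl annotate node <node-name> node.machine.sapcloud.io/preserve=now

# Release the machine before machinePreserveTimeout expires
kubectl annotate node <node-name> --overwrite node.machine.sapcloud.io/preserve=false
```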
### Use Case 2: Proactive Preservation Request
**Scenario:** Operator suspects a machine might fail and wants to ensure preservation for analysis.
#### Steps:
1. Operator annotates the node with `node.machine.sapcloud.io/preserve=when-failed`
2. Machine fails later
3. MCM preserves the machine
4. Operator analyzes the VM

### Use Case 3: Auto-Preservation
**Scenario:** Machine fails unexpectedly, with no prior annotation.
#### Steps:
1. Machine transitions to the `Failed` phase
2. Machine is drained
3. If `autoPreserveFailedMax` is not breached, the machine is moved to the `Failed:Preserved` phase by MCM
4. After `machinePreserveTimeout`, the machine is terminated by MCM

### Use Case 4: Early Release
**Scenario:** Operator has completed their analysis and no longer requires the machine to be preserved.
#### Steps:
1. Machine is in the `Running:Preserved` or `Failed:Preserved` phase
2. Operator adds `node.machine.sapcloud.io/preserve=false` to the node.
3. MCM transitions the machine to `Running` or `Terminating`, for `Running:Preserved` or `Failed:Preserved` respectively, even though `machinePreserveTimeout` has not expired
4. If the machine was in `Failed:Preserved`, capacity becomes available for auto-preservation.

## Points to Note

1. During rolling updates MCM will NOT honor preserving Machines. The Machine will be replaced with a healthy one if it moves to the `Failed` phase.
2. Hibernation policy will override machine preservation.
3. If Machine and Node annotation values differ for a particular annotation key, the Node annotation value will override the Machine annotation value.
4. If `autoPreserveFailedMax` is reduced in the Shoot Spec, older machines are moved to the `Terminating` phase before newer ones.
5. In case of a scale-down of a MachineDeployment's replica count, `Preserved` machines will be the last to be scaled down. The replica count will always be honoured.
6. If the value for the annotation key `cluster-autoscaler.kubernetes.io/scale-down-disabled` for a machine in `Running:Preserved` is changed to `false` by a user, the value will be overwritten to `true` by MCM.
7. On an increase/decrease of the timeout, the new value will only apply to machines that go into the `Preserved` phase after the change. Operators can always edit `machine.CurrentStatus.PreserveExpiryTime` to prolong the expiry time of existing `Preserved` machines.
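    As a purely illustrative sketch (it assumes the proposed field would be exposed via the Machine's status subresource as `currentStatus.preserveExpiryTime`, which is not part of the current MCM API), extending the expiry could look like:

    ```bash
    kubectl patch machine <machine-name> -n <namespace> --subresource=status --type=merge \
      -p '{"status":{"currentStatus":{"preserveExpiryTime":"2025-12-31T00:00:00Z"}}}'
    ```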
8. [Modify the CA FAQ](https://github.com/gardener/autoscaler/blob/master/cluster-autoscaler/FAQ.md#how-can-i-prevent-cluster-autoscaler-from-scaling-down-a-particular-node) once the feature is developed, to suggest `node.machine.sapcloud.io/preserve=now` instead of the `cluster-autoscaler.kubernetes.io/scale-down-disabled=true` annotation currently suggested. This would:
   - harmonise the machine flow
   - shield users from CA's internals
   - make it generic and no longer CA-specific
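For reference, a minimal sketch of how the configuration proposed above could surface in a Shoot manifest; the per-worker-pool placement follows Gardener's existing `machineControllerManager` settings block, and the two new fields are the ones proposed in this document:

```yaml
apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
spec:
  provider:
    workers:
      - name: worker-pool-1
        machineControllerManager:
          autoPreserveFailedMax: 1    # proposed: max machines auto-preserved on failure, per pool
          machinePreserveTimeout: 72h # proposed: how long a preserved machine is retained
```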