From 4d1a76d26c9458e0376cebdce93a9f22a4f09b7f Mon Sep 17 00:00:00 2001 From: Michal Pryc Date: Wed, 3 Apr 2024 15:18:05 +0200 Subject: [PATCH] Design documentation for the NonAdminBackup Status and Conditions Two proposals for updating and keeping in sync NonAdminBackup Status and Condition fields. The Status also consists of Velero Backup Name within NAB object. Signed-off-by: Michal Pryc --- design/nab_status_update_ver_1.md | 86 +++++++++++++++++++++++++++++ design/nab_status_update_ver_2.md | 90 +++++++++++++++++++++++++++++++ 2 files changed, 176 insertions(+) create mode 100644 design/nab_status_update_ver_1.md create mode 100644 design/nab_status_update_ver_2.md diff --git a/design/nab_status_update_ver_1.md b/design/nab_status_update_ver_1.md new file mode 100644 index 0000000..ea5fe94 --- /dev/null +++ b/design/nab_status_update_ver_1.md @@ -0,0 +1,86 @@ +# Developer Workflow: NonAdminBackup Phase and Conditions within NonAdminBackup Status + +## Version + +This is `ver_1` of the design document. Simplifier approach is described in the [ver_2]. + +## Overview + +This document outlines the design around updating NonAdminBackup objects Phase and Conditions within Status. + +### Phase + +A NonAdminBackup's `status` field has a `phase` field, which is updated by NAC controller. + +The `phase` is a simple one high-level summary of the lifecycle of an NonAdminBackup. + +It is always a one well defined value, that is intended to be a comprehensive state of a NonAdminBackup. + +Those are are the possible values for phase: + +| **Value** | **Description** | +|-----------|--------------------------------| +| New | *NonAdminBackup resource was accepted by the OpenShift cluster, but it has not yet been processed by the NonAdminController* | +| BackingOff | Velero *Backup* object was not created due to NonAdminBackup error (configuration or similar) | +| Created | Velero *Backup* was created, but it has not yet fully ran or started | +| Running | Velero *Backup* is currently running | +| Succeeded | Velero *Backup* was performed with Success state | +| Failed | Velero *Backup* did not succeed | + +### Conditions + +The `conditions` is also a part of the NonAdminBackup's `status` field. One NAB object may have multiple conditions. It is more granular knowledge of the NonAdminBackup object and represents the array of the conditions through which the NonAdminBackup has or has not passed. Each `NonAdminCondition` has one of the following `type`: + +| **Condition** | **Description** | +|-----------|--------------------------------| +| BackupAccepted | The Backup object was accepted by the reconcile loop, but the Velero Backup may have not yet been created | +| BackupQueued | The Velero Backup was created succesfully and did not return any known errors. It's in the queue for Backup. At this stage errors may still occur during backup procedure. | +| BackupCompleted | The Velero Backup succeeded. | + +The `condition` data is also accomapied with the following: + +| **Field name** | **Description** | +|-----------|--------------------------------| +| type | The `Type` of the condition | +| status | represents the state of individual condition. The resulting `phase` value should report `Success` only when all of the conditions are met and the backup succeded. One of `True`, `False` or `Unknown`. | +| lastProbeTime | Timestamp of when the NonAdminBackup condition was last probed. | +| lastTransitionTime | Timestamp for when the NonAdminBackup last transitioned from one status to another. | +| reason | Machine-readable, UpperCamelCase text indicating the reason for the condition's last transition. | +| message | Human-readable message indicating details about the last status transition. | + +### BackupStatus + +`BackupStatus` which is also part of the `NonAdminBackupStatus` object is a `BackupStatus` that is taken directly from the Velero Backup Status and copied over. + +## Phase Update scenarios + + *** Questions *** + + - BackupQueued stays true at the end, + should we remove that from conditions? + I don't think we should set it to false. + +```mermaid +graph +AA[Phase: New] -->A +A{BackupAccepted: Unknown\n BackupQueued: False\n BackupCompleted: False} -- NAC processes --> B[Non Admin Backup Accepted] +A -- NAC hits an error --> E[Phase: BackingOff] +B -- Create Velero Backup --> C[Phase: Created] +B -.-> BA{BackupAccepted: True\nBackupQueued: False\nBackupCompleted: False} +C -- Backup runs --> D[Phase: Running] +C -.-> BS{BackupAccepted: True\nBackupQueued: True\nBackupCompleted: False} +D -- Backup succeeds --> F[Phase: Succeeded] +F -.-> FF{BackupAccepted: True\nBackupQueued: True\nBackupCompleted: True} + +D -- Backup fails --> G[Phase: Failed] +E -.-> EE{BackupAccepted: False\nBackupQueued: False\nBackupCompleted: False} + +classDef conditions fill:#333,stroke:#ccc,stroke-width:2px; +class A,BA,BS,BC,EE,FF conditions; +classDef phases fill:#777,stroke:#ccc,stroke-width:2px; +class AA,C,D,E,G,F phases; + +``` + + +[ver_2]: ./nab_status_update_ver_2.md \ No newline at end of file diff --git a/design/nab_status_update_ver_2.md b/design/nab_status_update_ver_2.md new file mode 100644 index 0000000..604745e --- /dev/null +++ b/design/nab_status_update_ver_2.md @@ -0,0 +1,90 @@ +# Developer Workflow: NonAdminBackup Phase and Conditions within NonAdminBackup Status + +## Version + +This is `ver_2` of the design document. More complex approach with more states is described in the [ver_1]. + +## Overview + +This document outlines the design around updating NonAdminBackup objects Phase and Conditions within Status. + +### Phase + +A NonAdminBackup's `status` field has a `phase` field, which is updated by NAC controller. + +The `phase` is a simple one high-level summary of the lifecycle of an NonAdminBackup. + +It is always a one well defined value, that is intended to be a comprehensive state of a NonAdminBackup. + +Those are are the possible values for phase: + +| **Value** | **Description** | +|-----------|--------------------------------| +| New | *NonAdminBackup resource was accepted by the OpenShift cluster, but it has not yet been processed by the NonAdminController* | +| BackingOff | Velero *Backup* object was not created due to NonAdminBackup error (configuration or similar) | +| Created | Velero *Backup* was created. The Phase will not have additional informations about the | + +### Conditions + +The `conditions` is also a part of the NonAdminBackup's `status` field. One NAB object may have multiple conditions. It is more granular knowledge of the NonAdminBackup object and represents the array of the conditions through which the NonAdminBackup has or has not passed. Each `NonAdminCondition` has one of the following `type`: + +| **Condition** | **Description** | +|-----------|--------------------------------| +| BackupAccepted | The Backup object was accepted by the reconcile loop, but the Velero Backup may have not yet been created | +| BackupQueued | The Velero Backup was created succesfully and did not return any known errors. It's in the queue for Backup. At this stage errors may still occur during backup procedure. | + +The `condition` data is also accomapied with the following: + +| **Field name** | **Description** | +|-----------|--------------------------------| +| type | The `Type` of the condition | +| status | represents the state of individual condition. The resulting `phase` value should report `Success` only when all of the conditions are met and the backup succeded. One of `True`, `False` or `Unknown`. | +| lastProbeTime | Timestamp of when the NonAdminBackup condition was last probed. | +| lastTransitionTime | Timestamp for when the NonAdminBackup last transitioned from one status to another. | +| reason | Machine-readable, UpperCamelCase text indicating the reason for the condition's last transition. | +| message | Human-readable message indicating details about the last status transition. | + +### BackupStatus + +`BackupStatus` which is also part of the `NonAdminBackupStatus` object is a `BackupStatus` that is taken directly from the Velero Backup Status and copied over. + +### OadpVeleroBackupName +The `OadpVeleroBackupName` field is a component of the `NonAdminBackupStatus` object. It represents the name of the `VeleroBackup` object, which encompasses the namespace. This `OadpVeleroBackupName` serves as a reference to the Backup responsible for executing the backup task. + +The format of the `OadpVeleroBackupName` allows to interact with that Backup using `oc` or `velero` commands as follows: + +```shell +# . + +$ oc describe . + +$ velero backup describe . +``` + + +## Phase Update scenarios + + *** Questions *** + + - BackupQueued stays true at the end, + should we remove that from conditions? + I don't think we should set it to false. + +```mermaid +graph +AA[Phase: New] -->A +A{BackupAccepted: Unknown\n BackupQueued: False\n} -- NAC processes --> B[Non Admin Backup Accepted] +A -- NAC hits an error --> E[Phase: BackingOff] +B -- Create Velero Backup --> C[Phase: Created] +B -.-> BA{BackupAccepted: True\nBackupQueued: False\n} +C -.-> BS{BackupAccepted: True\nBackupQueued: True\n} +E -.-> EE{BackupAccepted: False\nBackupQueued: False\n} + +classDef conditions fill:#333,stroke:#ccc,stroke-width:2px; +class A,BA,BS,EE conditions; +classDef phases fill:#777,stroke:#ccc,stroke-width:2px; +class AA,C,E phases; + +``` + +[ver_1]: ./nab_status_update_ver_1.md \ No newline at end of file