Add non-admin controller design #18

Merged
Binary file added docs/design/Backup-Workflow-Details.jpg
133 changes: 133 additions & 0 deletions docs/design/Non_Admin_Controller_design.md
@@ -0,0 +1,133 @@
# Non-Admin Backup/Restore Design

## Background
OADP (OpenShift API for Data Protection) Operator currently requires cluster admin access to perform Backup and Restore operations on applications deployed on the OpenShift platform. This design aims to enable namespace owners, also known as non-admin users, to perform Backup and Restore operations on their own application namespaces.

## Goals
- Enable non-admin backup operation
- Enable non-admin restore operation

## Non-Goals
- Performance improvements of Backup and Restore Operations
- Parallel Backup and Restore Operations

## Use-Cases

### Backup Operation
- As a non-admin user/namespace owner with administrative privileges for a particular namespace, the user should be able to:
- Create a Backup/Schedule of the namespace
- Update the Backup/Schedule spec of the namespace

> **Review comment (Contributor):** This one confuses me: what will happen? For example, if I update it after Velero has started (or finished) the backup process, nothing will happen, right?

> **Review comment:** I'm not even sure what happens in Velero if you update a backup after it starts. Not sure we need to make any promises here.

> **Review comment:** (For backups. Schedules can be updated, but see above as to whether we need schedules for the first iteration.)

> **Review comment (Member):** I guess if you update the backup spec, the NAC could create DeleteBackupRequests and mark the NAB as pending; then, once the old backup is gone, the NAB would be processed with a new backup of the same name.

> **Review comment:** In any case, updating the backup spec makes sense, as it does exactly what a normal Velero (admin) user modifying a backup spec would do. If a modification makes sense in Velero, it can be done here.

- View the status of the Backup/Schedule created for the particular namespace
- Delete the Backup/Schedule of the namespace

### Restore Operation
- As a non-admin user/namespace owner with administrative privileges for a particular namespace, the user should be able to:
- Create a Restore of the namespace
- Update the Restore spec of the namespace
- View the status of the Restore created for the particular namespace
- Delete the Restore of the namespace


## Installation

- The Non-Admin Controller (NAC) will be installed via OADP Operator.
- The Data Protection Application (DPA) CR will include a root-level spec flag called `enableNonAdmin`.
- If the `enableNonAdmin` flag is set to `true`, the OADP Operator will install the NAC in the OADP Operator's install namespace. The default value of `enableNonAdmin` is `false`.
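
A minimal sketch of a DPA that opts in to the NAC follows. It assumes that `enableNonAdmin` sits directly under `spec`, as proposed above, and that OADP is installed in `openshift-adp`. The name `dpa-sample` is hypothetical, and the remaining DPA fields are omitted.

```yaml
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: dpa-sample          # hypothetical DPA name
  namespace: openshift-adp  # assumed OADP Operator install namespace
spec:
  # Proposed root-level flag; defaults to false when omitted.
  enableNonAdmin: true
  # Other DPA fields (configuration, backupLocations, ...) omitted.
```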

> **Review comment:** We need some validation to ensure that only one DPA in the cluster sets this to true. Otherwise, if there are 2 OADP instances, both enabling non-admin, we will have race conditions around which Velero instance handles each NonAdminBackup.

> **Review comment (Member):**
> > if there are 2 OADP instances, both enabling non-admin
>
> Both instances in a supported scenario would be in separate OADP namespaces; I don't see a possible overlap here.

> **Review comment (Member):**
> > by default it watches all Namespaces
>
> nvm. Agree with sseago.

> **Review comment:** Right. The scenario is this:
>
> - OADP installed in openshift-adp
> - OADP installed in openshift-adp2
> - user creates NonAdminBackup in mysql-persistent
>
> If both OADPs have non-admin enabled, it's a race condition as to which one grabs and labels/annotates it first (and creates the Velero CR), and in some cases both may end up attempting to modify it at the same time. If only one of them has non-admin enabled, that OADP will manage it, and the other OADP will stay out of the way.

> **Review comment (Member):** So I think the non-admin use case will come up only in environments where multiple users share a cluster. Therefore, we need to ensure that multiple non-admin controllers per Velero are supported, which means we need a way to limit the scope of each NAC so they don't overlap.


## Pre-requisites
- **OADP installed**: OADP must be installed and configured to use the Non-Admin Controller.
- **Non-Admin Controller configured**: The Data Protection Application (DPA) instance must configure the Non-Admin Controller to watch the user namespace(s); by default, it watches all namespaces.
- **RBAC privileges for the user**: The user must have the appropriate RBAC privileges to create a NonAdminBackup object within the namespace where the backup will be taken. An example of such a ClusterRole, which can be granted to the user with a `RoleBinding`:
```yaml
# permissions for end users to edit nonadminbackups.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/name: clusterrole
    app.kubernetes.io/instance: nonadminbackup-editor-role
    app.kubernetes.io/component: rbac
    app.kubernetes.io/created-by: oadp-nac
    app.kubernetes.io/part-of: oadp-nac
    app.kubernetes.io/managed-by: kustomize
  name: nonadminbackup-editor-role
rules:
- apiGroups:
  - nac.oadp.openshift.io
  resources:
  - nonadminbackups
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - nac.oadp.openshift.io
  resources:
  - nonadminbackups/status
  verbs:
  - get
```
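
As a sketch of the binding step, the RoleBinding below grants the ClusterRole above to a single user. The user name `alice`, the binding name, and the namespace `user-namespace` are hypothetical placeholders. Because a RoleBinding (unlike a ClusterRoleBinding) is namespaced, the referenced ClusterRole's permissions apply only inside `user-namespace`.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: nonadminbackup-editor-alice   # hypothetical binding name
  namespace: user-namespace           # the namespace the user may back up
subjects:
- kind: User
  apiGroup: rbac.authorization.k8s.io
  name: alice                         # hypothetical user
roleRef:
  kind: ClusterRole
  apiGroup: rbac.authorization.k8s.io
  name: nonadminbackup-editor-role
```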

## High-Level design

### Components
- OADP Operator: OADP is the OpenShift API for Data Protection operator. This open source operator sets up and installs Velero on the OpenShift platform, allowing users to back up and restore applications.
- Controllers: The Non-Admin Controller will bundle the following controllers:
- Non-Admin Backup (NAB) Controller: The responsibilities of the NAB controller are:
- Validate whether the non-admin user has appropriate administrative namespace access

> **Review comment:** nit: s/namepsace/namespace/

- Validate whether the non-admin user has appropriate access to create/view/update/delete the Non-Admin Backup CR
- Listen to requests pertaining to Non-Admin Backup CRD across all the namespaces

> **Review comment (Contributor):** @shubham-pampattiwar please update this so that more than one OADP/NAC combination can be installed on the same cluster. Requests will have to be namespace-filtered. @mpryc @mateusoliveira43 I suppose the namespaces for the NAC will be configured in the DPA?

> **Review comment (Contributor):** Still not discussed, I believe. But that is one option, yes.

- Process requests pertaining to Non-Admin Backup CRD across all the namespaces
- Update the Non-Admin Backup CR status with the status/events from the Velero Backup CR
- Cascade any actions performed on the Non-Admin Backup CR to the corresponding Velero Backup CR
- Non-Admin Restore (NAR) Controller
- CRDs: The following CRDs will be provided to Non-Admin users:
- Non-Admin Backup (NAB) CRD: This CRD will encapsulate the whole Velero Backup CRD, plus some additional spec fields needed for the non-admin feature.
- Non-Admin Restore (NAR) CRD

**Note:** Currently, this design assumes that configuring the BackupStorageLocation (BSL) is the responsibility of the cluster admin, not of the non-admin/namespace admin. Hence, this design does not introduce non-admin BSL controllers or CRDs.

> **Review comment:** To clarify, this means users will not be managing their own BSLs/buckets. They will have to know the name of the configured BSL (given by the admin), and the NAC will assume that any user who knows the name of a BSL is permitted to back up to that BSL. Is this correct?

> **Review comment:** Also, any NAC user who does not specify/know the BSL name will be considered approved to use the default BSL.

> **Review comment (Member):** I think we will have a way for the admin to scope who can use which BSL; the NAB/NAR controllers will validate accordingly before processing the CRs.

> **Review comment:** Will that require a CR, or will it be managed via annotations/labels/etc. on the BSL itself?

> **Review comment (Member):** Up for discussion.


### Implementation details
- Backup Workflow
- **Non-Admin user creates/updates a Non-Admin backup CR:**
1. **User creates or updates the NonAdminBackup object**: The user creates or updates a NonAdminBackup custom resource object in the namespace on which the backup will run within the Kubernetes cluster. The `NonAdminBackup` schema has a `backupSpec` field, which is the same as the spec of the `Backup` CR from the `velero.io/v1` apiVersion.

```yaml
apiVersion: nac.oadp.openshift.io/v1alpha1
kind: NonAdminBackup
metadata:
  name: example
  namespace: user-namespace
spec:
  backupSpec: {}
```
- **NAB controller reconciles this NAB CR:** The NonAdminBackup controller continuously reconciles the NonAdminBackup object's desired state with the actual state in the cluster.
- **NAB controller validates the NAB CR and then creates a corresponding Velero Backup CR:** When the NonAdminBackup controller detects a new or modified NonAdminBackup object, it creates or updates a corresponding Velero Backup object within the OADP namespace, using the information provided in the `backupSpec` field of the NonAdminBackup object. The resulting Backup object is named `nab-<namespace>-<hash>`, where `<namespace>` is the NonAdminBackup namespace and `<hash>` is computed from the original NonAdminBackup name. The resulting Backup object is labeled and annotated with the following additional metadata:

```yaml
metadata:
  annotations:
    openshift.io/oadp-nab-origin-name: <NonAdminBackup name>
    openshift.io/oadp-nab-origin-namespace: <NonAdminBackup Namespace>
    openshift.io/oadp-nab-origin-uuid: <NonAdminBackup UUID>
  labels:
    app.kubernetes.io/managed-by: <OADP NonAdminController id>
    openshift.io/oadp: 'True'
```
- **Velero runs Backup**: Velero executes the backup operation based on the configuration specified in the Velero Backup object. Velero updates the status of the Velero Backup object to reflect the outcome of the backup process.
- **Reconcile loop updates NonAdminBackup object Status**: Upon detecting changes in the status of the Velero Backup object, the NonAdminBackup controller's reconciliation loop updates the Status field of the corresponding NonAdminBackup object with the updated status from the Velero Backup object.

![NAB-Backup Workflow Diagram](nab-backup-workflow.jpg)
![NAB-Controller Backup Details Diagram](Backup-Workflow-Details.jpg)
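
The backup workflow above can be traced end to end with a sketch of a populated NonAdminBackup and the Velero Backup the NAB controller would derive from it. This assumes that OADP runs in `openshift-adp` and that the admin has configured a BSL named `default`. The `backupSpec` contents are ordinary `velero.io/v1` Backup spec fields chosen for illustration. `<hash>`, `<NonAdminBackup UUID>`, and `<OADP NonAdminController id>` remain placeholders because this design does not fix their values.

```yaml
# NonAdminBackup created by the namespace owner (illustrative values).
apiVersion: nac.oadp.openshift.io/v1alpha1
kind: NonAdminBackup
metadata:
  name: example
  namespace: user-namespace
spec:
  backupSpec:
    # Standard velero.io/v1 BackupSpec fields.
    includedNamespaces:
    - user-namespace
    storageLocation: default   # assumed name of an admin-configured BSL
    ttl: 720h0m0s
---
# Velero Backup the NAB controller would create in the OADP namespace.
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: nab-user-namespace-<hash>   # <hash> computed from the NAB name "example"
  namespace: openshift-adp          # assumed OADP install namespace
  annotations:
    openshift.io/oadp-nab-origin-name: example
    openshift.io/oadp-nab-origin-namespace: user-namespace
    openshift.io/oadp-nab-origin-uuid: <NonAdminBackup UUID>
  labels:
    app.kubernetes.io/managed-by: <OADP NonAdminController id>
    openshift.io/oadp: 'True'
spec:
  # Taken from the NonAdminBackup's backupSpec.
  includedNamespaces:
  - user-namespace
  storageLocation: default
  ttl: 720h0m0s
```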
- Restore Workflow


## Open Questions and Known Limitations
- Velero command and pod logs
- Multiple instances of the OADP Operator

> **Review comment (Contributor):** Please add:
>
> - status regarding the queue of backups pending or running that may be blocking the non-admin backup
> - e.g. "There are currently 5 OADP backups queued."

> **Review comment:** Multiple instances of the OADP Operator can exist, but we must make sure that no more than one of them enables non-admin. If a second DPA sets `enableNonAdmin`, that should trigger a validation error.

Binary file added docs/design/nab-backup-workflow.jpg