Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add non-admin controller design #18

Merged

Conversation

shubham-pampattiwar
Copy link
Member

@shubham-pampattiwar shubham-pampattiwar commented Mar 13, 2024

Copy link
Collaborator

@mpryc mpryc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shubham-pampattiwar I also was writing similar doc, pushed to my branch (without PR) but it's different as I am using mermaid to create diagrams.

Let's discuss on our sync which way to go. I like the clarity of your diagram a bit better, mine is more detailed, but maybe too much?

https://github.com/mpryc/oadp-non-admin/blob/docs_design/docs/design/NonAdminBackup.md

### Backup Operation
- As a non-admin user/namespace owner with administrative priviledges for a particular namespace, the user should be able to:
- Create a Backup/Schedule of the namespace
- Update the Backup/Schedule spec of the namespace
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one I am confused of what will happen. Example: if I update after Velero started (or finished) the backup process, nothing will happen, right?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not even sure what happens to Velero if you update a backup after it starts. Not sure we need to make any promises here.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(for backups -- for schedules, those can be updated, but see above as to whether we need schedules for the first iteration)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess if you update backup spec, nac could create deleteBackupRequests, mark NAB as pending, then once old backup is gone, NAB will be processing with new backup of the same name.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case, updating backup spec makes sense as it just does exactly what a normal velero (admin) user modifying backup spec would do. If there's any modification that makes sense in velero, then it could be done here.


## Open Questions and Know Limitations
- Velero command and pod logs
- Multiple instance of OADP Operator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add:

  • status regarding the queue of backups pending or running that may be blocking the non-admin backup.
  • e.g. There are currently 5 OADP backups queued.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multiple instances of OADP operator can exist, but we must make sure that no more than one of them enable non-admin. If a second DPA adds enableNonAdmin, that should trigger a validation error.

docs/design/Non_Admin_Controller_design.md Outdated Show resolved Hide resolved
docs/design/Non_Admin_Controller_design.md Outdated Show resolved Hide resolved
### Backup Operation
- As a non-admin user/namespace owner with administrative priviledges for a particular namespace, the user should be able to:
- Create a Backup/Schedule of the namespace
- Update the Backup/Schedule spec of the namespace
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not even sure what happens to Velero if you update a backup after it starts. Not sure we need to make any promises here.

### Backup Operation
- As a non-admin user/namespace owner with administrative priviledges for a particular namespace, the user should be able to:
- Create a Backup/Schedule of the namespace
- Update the Backup/Schedule spec of the namespace
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(for backups -- for schedules, those can be updated, but see above as to whether we need schedules for the first iteration)


- The Non-Admin Controller (NAC) will be installed via OADP Operator.
- The Data Protection Application (DPA) CR will consist of a root level spec flag called `enableNonAdmin`
- If the `enableNonAdmin` flag is set to `true`, the OADP Operator will install the NAC in OADP Operator's install namespace, the default value for `enableNonAdmin` will be `false`
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need some validation to ensure that only one DPA in the cluster sets this to true. Otherwise, if there are 2 OADP instances, both enabling non-admin, we will have race conditions around which velero instance handles each NonAdminBackup.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if there are 2 OADP instances, both enabling non-admin

Both instances in a supported scenario would be in separate oadp namespaces, I don't see a possible overlap here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by default it watches all Namespaces

nvm. agree with sseago.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. scenario is this:

  • OADP installed in openshift-adp
  • OADP installed in openshift-adp2
  • user creates NonAdminBackup in mysql-persistent

If both OADPs have non-admin enabled, it's a race condition as to which one grabs and labels/annotates it first (and creates Velero CR), and possibly in some cases you end up with both attempting to modify it at the same time. If only one of them has non-admin enabled, then that OADP will manage it, and the other OADP will stay out of the way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think non admin use case will prop up only in env where multiple users share a cluster. Therefore I think we need to ensure that multiple non admin controllers per velero is supported, so we need a way to limit the scope of each nac so it doesn't overlap.

docs/design/Non_Admin_Controller_design.md Outdated Show resolved Hide resolved
- Non-Admin Backup (NAB) CRD: This iCRD will encapsulate the whole Velero Backup CRD and some additional spec felds that will be needed for non-admin feature.
- Non-Admin Restore (NAR) CRD

**Note:** Currently, this design considers that the responsibility of the BackupStorageLocation configuration is that of the cluster admin and not non-admin/namespace admin. Hence, no introduction of non-admin BSL controllers and CRDs.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, this means users will not be managing their own BSLs/buckets. They will have to know the name of the configured BSL (given by admin), and the NAC will assume that any user who knows the name of a BSL is permitted to back up to that BSL. Is this correct?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, any NAC user who does not specify/know the BSL name will be considered approved to use the default BSL.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we will have a way for admin to scope who can use which bsl, the nab/nar controller will validate accordingly before processing the CRs.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will that require a CR, or will it be managed via annotations/labels/etc on the BSL itself?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

up for discussion.

docs/design/Non_Admin_Controller_design.md Outdated Show resolved Hide resolved
docs/design/Non_Admin_Controller_design.md Outdated Show resolved Hide resolved

## Open Questions and Know Limitations
- Velero command and pod logs
- Multiple instance of OADP Operator
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multiple instances of OADP operator can exist, but we must make sure that no more than one of them enable non-admin. If a second DPA adds enableNonAdmin, that should trigger a validation error.

Copy link
Member

@kaovilai kaovilai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If admin creates dpa on another cluster, and use the bsl containing NAC backups, will NAC backups.velero.io synced back to cluster regenerate backup.nac.oadp.openshift.io?

@sseago
Copy link

sseago commented Mar 21, 2024

If admin creates dpa on another cluster, and use the bsl containing NAC backups, will NAC backups.velero.io synced back to cluster regenerate backup.nac.oadp.openshift.io?

This is a good question. Might be required for disaster recovery of non-admin backups (i.e. cluster is corrupted, rebuilt from scratch, add BSL to new OADP instance, user backups have returned. This would require a non-admin backup sync controller to watch in-cluster velero Backups, and if one is found with NAC labels/annotations without a corresponding NAC backup, it would regenerate (creating the namespace if necessary).

- Non-Admin Backup (NAB) Controller: The responsibilities of the NAB controller are:
- Validate whether the non-admin user has appropriate administrative namepsace access
- Validate Wehther the non-admin user has appropriate access to create/view/update /delete Non-Admin Backup CR
- Listen to requests pertaining to Non-Admin Backup CRD across all the namespaces
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shubham-pampattiwar please update this such that more than one OADP/NAC combo can be installed on the same cluster. Requests will have to be namespace filtered. @mpryc @mateusoliveira43 I suppose the namespaces for the NAC will be configured in the DPA?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still not discussed, I believe. But one option, yes

…ntroller, Queuing mechanism, Validating Webhooks, NAC watch NS filtering and control over spec exposed to non-admin users
@openshift-ci openshift-ci bot added the approved label Apr 9, 2024
@shubham-pampattiwar
Copy link
Member Author

This latest commit addresses the following:

  • Restore workflow
  • Non-Admin Backup Sync controller
  • NABSL controller and CRD
  • Queuing Status mechanism
  • Watch NS filtering via list or labelselector
  • Update Pre-conditions
  • Make sure that multiple instances do not overlap for watch NS list
  • Control the spec exposed (on-going discussion)
  • Validating Webhooks
  • removes schedules and backup spec updates op
  • minor fixes and nit fixes

@shubham-pampattiwar shubham-pampattiwar marked this pull request as ready for review August 28, 2024 05:31
@shubham-pampattiwar
Copy link
Member Author

Need to add a note regarding tech preview: #47 (comment)

@shubham-pampattiwar shubham-pampattiwar self-assigned this Aug 28, 2024
# Non-Admin Backup/Restore Design

## Background
[OADP (Openshift API for Data Protection)](https://github.com/openshift/oadp-operator) Operator currently requires cluster admin access for performing Backup and Restore operations of applications deployed on the OpenShift platform.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here "has" would be better than "requires"?

suppose someone tries to use OADP in non admin way (without NAC). Installs it in a non admin namespace. the non admin user can only create things in that namespace, but it can tell OADP to backup/restore anything (in any namespace) in the cluster, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think either is fine, currently we advertise OADP functionalities as cluster admin tasks so that way requires makes sense.

- View the status of the Restore created for the particular namespace
- Delete the Restore of the namespace

### BackupStorageLocation (BSL) Configuration Operation (optional use case)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we mention the use case of that BSL in the other operations such as Backup? How this will be used?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added the use-case

- The Data Protection Application (DPA) CR will consist of a root level struct called `nonAdmin` under the features. This struct will have a boolean to `enable` or `disable` the non-admin feature.
- If the `features.nonAdmin.enable` flag is set to `true`, the OADP Operator will install the NAC in OADP Operator's install namespace, by default this feature will be disabled.
- **Multiple OADP:NAC installations:** Due to performance limitations of single threaded nature of Velero controller, it might be desirable in some environments to have more than one instance of OADP:NAC installed.
- We need to ensure that one OADP operator instance is mapped to only one NAC instance
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't clear to me. One instance of NAC for one OADP operator, and above we have the multiple OADP:NAC installations, so does it mean we can only scale the NAC by scaling OADP?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OADP Operator is NS scoped operator, so you can have multiple OADP Operator installations in the cluster. But each of the OADP Operator will have one NAC controller instance if NAC is enabled. So we need to ensure that if there are multiple OADP/NAC installations in one particular cluster then no 2 NAC controllers should serve once single non-admin application/user namespace.

- If the `features.nonAdmin.enable` flag is set to `true`, the OADP Operator will install the NAC in OADP Operator's install namespace, by default this feature will be disabled.
- **Multiple OADP:NAC installations:** Due to performance limitations of single threaded nature of Velero controller, it might be desirable in some environments to have more than one instance of OADP:NAC installed.
- We need to ensure that one OADP operator instance is mapped to only one NAC instance
- Also, the namespaces catered by each NAC instance should be separate and not merge with each other
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also not clear - NAC will create namespace per instance? Or it's OADP namespace to which the NAC controller(s) get installed, or if I think what it is:

The non-admin namespaces on which NAC controllers will operate....

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes your understanding that The non-admin namespaces on which NAC controllers will operate.... is correct

## Pre-requisites
- **OADP installed**: OADP Operator must be installed and configured to use non-admin controller
- **Non Admin Controller configured**: Data Protection Application (DPA) instance must configure Non Admin Controller to watch user namespace(s), by default it watches all Namespaces
- **RBAC privileges for the user**: User must have the appropriate RBAC privileges to create Non Admin objects within the Namespace where Backup will be taken. An example of such ClusterRole, which may be added to the user with `RoleBinding`:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the user has admin rights to a particular namespace, the RBAC is not required right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah makes sense, added a note.

docs/design/Non_Admin_Controller_design.md Show resolved Hide resolved
docs/design/Non_Admin_Controller_design.md Show resolved Hide resolved
docs/design/Non_Admin_Controller_design.md Show resolved Hide resolved
Copy link
Collaborator

@mpryc mpryc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

It's really nice doc. If requires further updates we can address those in additional PR's as NAC design updates.

@openshift-ci openshift-ci bot added the lgtm label Oct 9, 2024
Copy link

openshift-ci bot commented Oct 9, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mateusoliveira43, mpryc, shubham-pampattiwar

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [mateusoliveira43,mpryc,shubham-pampattiwar]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mateusoliveira43
Copy link
Contributor

/override "Continuous Integration / oadp-compatibility-check (pull_request)"

Copy link

openshift-ci bot commented Oct 9, 2024

@mateusoliveira43: /override requires failed status contexts, check run or a prowjob name to operate on.
The following unknown contexts/checkruns were given:

  • Continuous Integration / oadp-compatibility-check (pull_request)

Only the following failed contexts/checkruns were expected:

  • ci/prow/images
  • license/snyk ( Hybrid Platforms - OpenShift API for Data Protection)
  • oadp-compatibility-check
  • pull-ci-migtools-oadp-non-admin-master-images
  • security/snyk ( Hybrid Platforms - OpenShift API for Data Protection)
  • tide

If you are trying to override a checkrun that has a space in it, you must put a double quote on the context.

In response to this:

/override "Continuous Integration / oadp-compatibility-check (pull_request)"

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@mateusoliveira43
Copy link
Contributor

/override oadp-compatibility-check

Copy link

openshift-ci bot commented Oct 9, 2024

@mateusoliveira43: Overrode contexts on behalf of mateusoliveira43: oadp-compatibility-check

In response to this:

/override oadp-compatibility-check

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-merge-bot openshift-merge-bot bot merged commit f99b25c into migtools:master Oct 9, 2024
8 of 9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Merged / Ready for build
Development

Successfully merging this pull request may close these issues.

Design: OADP non-admin use cases Add OADP non-admin controller design
7 participants