KEP-5018: move to beta in 1.34 #5327

Open
wants to merge 3 commits into
base: master

ritazh
Member

@ritazh ritazh commented May 21, 2025

  • One-line PR description: Update KEP to prepare for beta in 1.34

/wg device-management
/assign @liggitt for sig auth
/assign @pohly
/assign @soltysh for PRR

Signed-off-by: Rita Zhang <[email protected]>
@k8s-ci-robot k8s-ci-robot added the wg/device-management Categorizes an issue or PR as relevant to WG Device Management. label May 21, 2025
@k8s-ci-robot
Contributor

@ritazh: GitHub didn't allow me to assign the following users: for, sig, auth, PRR.

Note that only kubernetes members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

@k8s-ci-robot k8s-ci-robot added the kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory label May 21, 2025
@k8s-ci-robot k8s-ci-robot added the sig/auth Categorizes an issue or PR as relevant to SIG Auth. label May 21, 2025
@k8s-ci-robot k8s-ci-robot requested a review from micahhausler May 21, 2025 15:49
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 21, 2025
Member

@liggitt liggitt left a comment

update lgtm, just had a couple questions

@@ -466,7 +466,7 @@ ResourceClaimTemplate and ResourceClaim for admin access
- Gather feedback
- Additional tests are in Testgrid and linked in KEP
- Implementations in the kubernetes-sigs/dra-example-driver
- Implementations in the kubernetes-sigs/dra-example-driver: https://github.com/kubernetes-sigs/dra-example-driver/issues/97 and the NVIDIA dra driver: https://github.com/NVIDIA/k8s-dra-driver-gpu/issues/337
Member

do those issues mean we will show those repos labeling namespaces as admin access and using devices as admin access before promoting the gate to beta?

Member Author

Yes, "Implementations in the kubernetes-sigs/dra-example-driver" was part of the original beta criteria, and I think we should be able to add an example there. I'm less certain about the exact timeline of the NVIDIA one. I could remove that one for now and add it back after it's done. wdyt?

Member

let's not take dependencies on consumers we're not sure will be ready as a beta graduation criteria ... one example use seems sufficient

@@ -541,7 +541,12 @@ rollout. Similarly, consider large clusters and how enablement/disablement
will rollout across nodes.
-->

Will be considered for beta.
- kube-controller-manager: If the kube-controller-manager fails to create `ResourceClaim` objects from `ResourceClaimTemplate` due to misconfigurations or permission issues relating to `adminAccess`, then the associated Pods will remain in a pending state and won't be scheduled.
- kube-scheduler: Bugs in the scheduler might lead to Pods not being scheduled even when resources are available, or to Pods being scheduled that shouldn't be because of unmet `adminAccess` requirements. If the `DRAAdminAccess` feature gate isn't enabled or is misconfigured, the scheduler might not recognize ResourceClaim requirements, leading to scheduling failures.
Member

is this thinking of something more than generic scheduler backoff behavior when it encounters failed API requests?

Member Author

No, this should be part of the generic scheduler backoff behavior.

Member

ok, maybe clarify that... otherwise this line sounds scarier or more specific to this feature than it actually is

@@ -596,7 +603,9 @@ checking if there are objects with field X set) may be a last resort. Avoid
logs or events for this purpose.
-->

Will be considered for beta.
".status.allocation.devices.results[*].adminaccess" will be set to true for a claim using adminAccess when needed by a pod.
Member

Suggested change
".status.allocation.devices.results[*].adminaccess" will be set to true for a claim using adminAccess when needed by a pod.
".status.allocation.devices.results[*].adminAccess" will be set to true for a claim using adminAccess when needed by a pod.
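
For illustration, the field path in the suggestion above could be checked along these lines. This is a minimal sketch, not part of the KEP: the claim document, the driver name `gpu.example.com`, and the helper `admin_access_results` are hypothetical, and the input is assumed to look like the JSON returned for a ResourceClaim by the API server.

```python
import json


def admin_access_results(claim: dict) -> list:
    """Return the allocation results of a ResourceClaim that have adminAccess set to true."""
    status = claim.get("status") or {}
    allocation = status.get("allocation") or {}
    devices = allocation.get("devices") or {}
    results = devices.get("results") or []
    # adminAccess is optional; a missing field is treated as false.
    return [r for r in results if r.get("adminAccess") is True]


# Hypothetical claim, shaped like `kubectl get resourceclaim <name> -o json` output.
claim = json.loads("""
{
  "kind": "ResourceClaim",
  "metadata": {"name": "gpu-monitor", "namespace": "gpu-admin"},
  "status": {"allocation": {"devices": {"results": [
    {"request": "monitor", "driver": "gpu.example.com", "pool": "node-a",
     "device": "gpu-0", "adminAccess": true},
    {"request": "compute", "driver": "gpu.example.com", "pool": "node-a",
     "device": "gpu-1"}
  ]}}}
}
""")

for result in admin_access_results(claim):
    print(result["request"], result["device"])  # prints: monitor gpu-0
```

Note that the lookup uses the camel-case key `adminAccess`, which is the spelling the suggested change corrects to.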

@@ -705,7 +717,8 @@ and creating new ones, as well as about cluster-level services (e.g. DNS):
- Impact of its degraded performance or high-error rates on the feature:
-->

Will be considered for beta.
- The DynamicResourceAllocation feature gate must be enabled to create ResourceClaim, ResourceClaimTemplate. More details at [KEP-4381 - DRA Structured Parameters](https://github.com/kubernetes/enhancements/issues/4381)
- A third-party DRA driver is required for how the driver should interpret the AdminAcess field to get acess to device specific resources without allocating them.
Member

Suggested change
- A third-party DRA driver is required for how the driver should interpret the AdminAcess field to get acess to device specific resources without allocating them.
- A third-party DRA driver is required for how the driver should interpret the AdminAccess field to get access to device specific resources without allocating them.

Signed-off-by: Rita Zhang <[email protected]>
@liggitt
Member

liggitt commented May 21, 2025

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 21, 2025
@enj enj added this to SIG Auth May 22, 2025
@enj enj moved this to Needs Triage in SIG Auth May 22, 2025
Contributor

@soltysh soltysh left a comment

Mostly the integration links are missing, otherwise it's good to go.

@@ -466,7 +466,7 @@ ResourceClaimTemplate and ResourceClaim for admin access
- Gather feedback
Contributor

Missing bits higher in the doc:

  1. make sure to check the appropriate boxes in the Release Signoff Checklist
  2. In the Integration tests section, please make sure to link tests according to the template, especially the newly added ones that are called out there, since the PRs submitted during alpha did add new tests.

Contributor

These comments still hold.

and ResourceClaim.

- Mitigations: When ResourceClaims or ResourceClaimTemplates with the `AdminAccess`
field don't get created, debugging should focus on the namespace labels. The kube-controller-manager logs should have more information.
Contributor

Suggested change
field don't get created, debugging should focus on the namespace labels. The kube-controller-manager logs should have more information.
field doesn't get created, debugging should focus on the namespace labels. The kube-controller-manager logs should have more information.
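
The mitigation above could be turned into a first-pass triage along these lines. This is a rough sketch under stated assumptions, not the controller's actual logic: the helpers `requests_admin_access` and `diagnose` and the sample manifest are hypothetical, the label key is passed in by the caller, and the location of `adminAccess` inside a device request is assumed to be either on the request itself or under `exactly`, depending on the API version.

```python
def requests_admin_access(manifest: dict) -> bool:
    """True if any device request in a ResourceClaim or ResourceClaimTemplate sets adminAccess."""
    spec = manifest.get("spec") or {}
    # A ResourceClaimTemplate nests the claim spec one level deeper (spec.spec).
    if manifest.get("kind") == "ResourceClaimTemplate":
        spec = spec.get("spec") or {}
    requests = (spec.get("devices") or {}).get("requests") or []
    for req in requests:
        # Assumption: the field sits on the request or under "exactly"; check both.
        exact = req.get("exactly") or {}
        if req.get("adminAccess") is True or exact.get("adminAccess") is True:
            return True
    return False


def diagnose(manifest: dict, namespace_labels: dict, label_key: str) -> str:
    """Suggest where to look when an admin-access claim or template is not created."""
    if not requests_admin_access(manifest):
        return "no adminAccess request; namespace label is not the cause"
    if namespace_labels.get(label_key) == "true":
        return "namespace is labeled; check kube-controller-manager logs"
    return f"namespace is missing label {label_key}=true"


# Hypothetical template that asks for admin access in an unlabeled namespace.
template = {
    "kind": "ResourceClaimTemplate",
    "spec": {"spec": {"devices": {"requests": [
        {"name": "monitor", "adminAccess": True},
    ]}}},
}
print(diagnose(template, {}, "resource.k8s.io/admin-access"))
```

The label value is compared against the exact string `"true"`, matching how the label is quoted later in this review.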

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. labels May 22, 2025
Signed-off-by: Rita Zhang <[email protected]>
@liggitt
Member

liggitt commented May 22, 2025

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 22, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: liggitt, ritazh
Once this PR has been reviewed and has the lgtm label, please ask for approval from soltysh. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@liggitt
Member

liggitt commented May 22, 2025

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 22, 2025
Note: This label has been updated from `resource.k8s.io/admin-access` while the feature was in alpha in v1.33.
Contributor

This is not quite accurate, because 1.33 is still using `resource.k8s.io/admin-access: "true"`, so maybe

Suggested change
Note: This label has been updated from `resource.k8s.io/admin-access` while the feature was in alpha in v1.33.
Note: This label has been updated from `resource.k8s.io/admin-access` before the beta promotion.

or

Suggested change
Note: This label has been updated from `resource.k8s.io/admin-access` while the feature was in alpha in v1.33.
Note: This label has been updated from `resource.k8s.io/admin-access` while the feature was in alpha.

Ideally I'd say open a PR doing so asap.
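
Because the label key changes between releases, a cluster admin might want to find namespaces that still carry only the old key. The sketch below illustrates one way to do that. The old key comes from the comment above; the new key `resource.kubernetes.io/admin-access` is an assumption, since this thread does not state it, and the helper name and namespace data are hypothetical.

```python
OLD_KEY = "resource.k8s.io/admin-access"         # key used by v1.33, per the review comment
NEW_KEY = "resource.kubernetes.io/admin-access"  # assumption: not stated in this thread


def namespaces_to_relabel(namespaces: dict, old_key: str = OLD_KEY,
                          new_key: str = NEW_KEY) -> list:
    """Return names of namespaces that have the old label set to "true" but not the new one."""
    return sorted(
        name
        for name, labels in namespaces.items()
        if labels.get(old_key) == "true" and labels.get(new_key) != "true"
    )


# Hypothetical mapping of namespace name to its labels.
namespaces = {
    "gpu-admin": {OLD_KEY: "true"},
    "monitoring": {OLD_KEY: "true", NEW_KEY: "true"},
    "default": {},
}
print(namespaces_to_relabel(namespaces))  # prints: ['gpu-admin']
```

Both keys are parameters, so the same check works if the final label name differs from the one assumed here.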

@@ -466,7 +466,7 @@ ResourceClaimTemplate and ResourceClaim for admin access
- Gather feedback
Contributor

These comments still hold.

@pohly pohly moved this from 🆕 New to 👀 In review in Dynamic Resource Allocation May 25, 2025
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/auth Categorizes an issue or PR as relevant to SIG Auth. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. wg/device-management Categorizes an issue or PR as relevant to WG Device Management.
Projects
Status: 👀 In review
Status: Needs Triage
5 participants