kep-3695-beta update #5346
Conversation
Signed-off-by: Swati Gupta <[email protected]>
Welcome @guptaNswati!
Hi @guptaNswati. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Co-authored-by: Kevin Klues <[email protected]>
/ok-to-test
Signed-off-by: Swati Gupta <[email protected]>
@soltysh @SergeyKanzhelev what else is needed to get this merged? I see it's set to
Please update to the latest template according to #3695 (comment). My understanding is that @soltysh hasn't started reviews as the metadata was not correct. Now it should be correct; we need to work on the document.
It is using the latest template. See
Sorry, I am confused if you are referring to something else.
You need to check that the KEP layout of this PR matches the layout in the latest KEP template available. When folks update the KEP template, some sections may move/be created/merged/removed, some new PRR questions can arise, and so forth.
As @ffromani mentioned, some template sections might have moved or been added. I am relying on this comment: #3695 (comment), which says that the KEP README is not following the latest template.
Got it, thanks both. I see I updated the implementation history, but that is within the format.
There are multiple questions left unanswered, which are a requirement for beta promotion.
@@ -274,8 +274,9 @@ These cases will be added in the existing e2e tests:
Earlier in the doc:
- Please make sure to check appropriate boxes in the `## Release Signoff Checklist`.
- Missing links in the integration tests section, see template, and in the e2e section as well, see template. Either of the two is required for beta promotion, and it looks like you had a requirement for e2e during alpha, so I expect those to be completed.
- Please make sure to check appropriate boxes in the `## Release Signoff Checklist`.
This was addressed - thank you.
This one still holds. We need links for integration and e2e based on the template in the appropriate section. I believe e2es were added in kubernetes/kubernetes#116846 so you should be able to quickly fill those in. Not sure if there are others.
This still holds, I see Francesco mentioned several tests that were added, can we make sure they are explicitly linked in this document?
- [ ] No major bugs reported in the previous cycle.
- [] Gather feedback from consumers of the DRA feature.
- Integration with the NVIDIA DCGM exporter.
- [x] No major bugs reported in the previous cycle.
I believe this holds.
- [ ] No major bugs reported in the previous cycle.
- [] Gather feedback from consumers of the DRA feature.
- Integration with the NVIDIA DCGM exporter.
- [x] No major bugs reported in the previous cycle.

#### GA
Further in the document:
- In `###### Are there any tests for feature enablement/disablement?` can you please link to those tests?
- In `###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?` can you explain why that question is not applicable?
- In `###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?` N/A is not correct. You're using explicit metrics `pod_resources_endpoint_requests_total`, `pod_resources_endpoint_requests_list` and `pod_resources_endpoint_requests_get`. What is the SLO for those? (See the sketch after this list.)
- In `###### Will enabling / using this feature result in increasing size or count of the existing API objects?` No is not correct. You're expanding the return type of the `Get` call with additional information coming from DRA, so I'd expect that information to be expressed in that answer.
- In `###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?` your answer is not one I'd expect. What you're writing there is probably more suited to the Troubleshooting or Risks and Mitigations sections. In your particular case I'm expecting a rough estimation of whether CPU/RAM/disk/IO will be affected by this feature. Based on my understanding you're likely to increase memory consumption to hold the DRA information, no?
- You're missing the `###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?` question from the template.
- In `###### How does this feature react if the API server and/or etcd is unavailable?` I'm expecting a concrete answer.
- In `###### What are other known failure modes?` can you please use the proposed template? I.e. Detection -> Mitigations -> Diagnostics, etc.
- In `###### What steps should be taken if SLOs are not being met to determine the problem?` you're missing an answer; N/A is not acceptable.
- What about Drawbacks and Alternatives considered? I'm missing both sections; have we not considered any?
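To make the SLO question concrete, here is a minimal sketch (not KEP text) of how a consumer could turn those request counters into an error-ratio check. It assumes the kubelet also exposes matching error counters (e.g. `pod_resources_endpoint_errors_list` and `pod_resources_endpoint_errors_get`); the helper name and the 1% threshold are made up for illustration.

```go
package main

import "fmt"

// withinErrorRatioSLO is a hypothetical helper: given counter values scraped
// from the kubelet /metrics endpoint (e.g. pod_resources_endpoint_requests_list
// and pod_resources_endpoint_errors_list), it evaluates a simple availability
// SLO such as "no more than 1% of List/Get calls return an error".
func withinErrorRatioSLO(requests, errors, maxErrorRatio float64) bool {
	if requests == 0 {
		return true // no traffic, nothing to violate
	}
	return errors/requests <= maxErrorRatio
}

func main() {
	// Example values, as if scraped from the kubelet metrics endpoint.
	fmt.Println(withinErrorRatioSLO(10000, 3, 0.01))   // true: 0.03% error ratio
	fmt.Println(withinErrorRatioSLO(10000, 500, 0.01)) // false: 5% error ratio
}
```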
@soltysh Most of these have similar answers in this KEP, and this KEP is an extension of it: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2043-pod-resource-concrete-assigments
Are these blocking updates?
And for this particular KEP, nothing has changed in the API from Alpha to Beta.
@ffromani where can I find the e2e tests? The test PR is this: kubernetes/kubernetes#116846
I do see some tests here: https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/directory/pull-kubernetes-node-kubelet-serial-containerd/1932604229172596736
but that does not seem to be the right link.
We need to review which lane the podresources API (not pod-level resources) tests run on, which has changed multiple times over the months. Let's find and fix the configuration if needed.
Looking at the config in k/test-infra, the tests should indeed run in the pull-kubernetes-node-kubelet-serial-containerd lane, but it's high time we add an explicit lane and/or call out the tests explicitly in some lane. We can do something similar to the somewhat confusing pull-kubernetes-node-kubelet-serial-podresources lane.
@soltysh Most of these have similar answers in this KEP, and this KEP is an extension of it: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2043-pod-resource-concrete-assigments
Are these blocking updates?

Yes, those are blocking changes. It doesn't matter that the answers are similar, we need them in this document. If they are matching, it's fine to just copy them over.
Signed-off-by: Swati Gupta <[email protected]>
sig-node review: provisional LGTM. Two outstanding items from my POV.
- (nonblocking, in scope for Beta, "just" extra work) It turns out the CI config is obsolete, and the podresources API tests run incidentally. We should add them explicitly (like we do, for example, for cpumanager) and we should have a presubmit lane (like we do, for example, for pod-level resources). I think it is unfair to account all of this work to this KEP, but some of it makes sense to be done in the context of the beta graduation. Already begun, PRs linked. We may need to fix or add some e2e tests, but I think this is totally fair in the context of the beta graduation.
- (blocking) Clarify the `Get` semantics as discussed. Because of how the API works, there is an unavoidable TOCTOU issue. There is no atomicity or consistency guarantee between `List` or `Get`, nor a concept of a session (which I would push back on, btw). So it is possible, albeit probably unlikely, that `Get` is called with stale information (e.g. the pod becomes terminated between calls). We need to clarify and document the error paths.

Once the above are addressed, LGTM.
All of my previous comments still hold and are blocking for merging this KEP.
@@ -274,8 +274,9 @@ These cases will be added in the existing e2e tests:
- Please make sure to check appropriate boxes in the `## Release Signoff Checklist`.
This was addressed - thank you.
This one still holds. We need links for integration and e2e based on the template in the appropriate section. I believe e2es were added in kubernetes/kubernetes#116846 so you should be able to quickly fill those in. Not sure if there are others.
@@ -333,7 +334,7 @@ The API becomes available again. The API is stateless, so no recovery is needed,

###### Are there any tests for feature enablement/disablement?

e2e test will demonstrate that when the feature gate is disabled, the API returns the appropriate error code.
e2e test will demonstrate that when the feature gate is disabled, the API returns the appropriate error code. (https://github.com/kubernetes/kubernetes/pull/116846)
The linked PR isn't testing feature enablement/disablement, or am I misreading it? The closest place where you test this feature gate is https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/apis/podresources/server_v1_test.go, but there you only turn it on; I don't see the requested on/off test.
We have an on/off test scattered across the existing tests: https://github.com/kubernetes/kubernetes/blob/v1.34.0-alpha.1/test/e2e_node/podresources_test.go#L977 and https://github.com/kubernetes/kubernetes/blob/v1.34.0-alpha.1/test/e2e_node/podresources_test.go#L1066
We can use a PR to make the tests more explicit, and some changes are needed if the FG goes to default-on: the FG status should be set explicitly.
Great, can we make sure this is listed here?
- [ ] No major bugs reported in the previous cycle.
- [] Gather feedback from consumers of the DRA feature.
- Integration with the NVIDIA DCGM exporter.
- [x] No major bugs reported in the previous cycle.

#### GA
@soltysh Most of these have similar answers in this KEP, and this KEP is an extension of it: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2043-pod-resource-concrete-assigments
Are these blocking updates?

Yes, those are blocking changes. It doesn't matter that the answers are similar, we need them in this document. If they are matching, it's fine to just copy them over.
thanks, we need one more update please
To fix a long-standing desync where `List()` would return allocations for terminated pods because it iterated over the full kubelet pod list instead of the resource-manager’s active pods, PR #132028 adds a new `activePodsOnly` boolean flag to the v1 `List` RPC.

- **Default behavior** (`activePodsOnly=false`): backward-compatible, returns all pods as before.
- **With `activePodsOnly=true`**: filters results to only those pods currently tracked as “active” by the resource managers.
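For context only (this is not proposed KEP text): a minimal client-side sketch of how the flag described above might be used, assuming the generated Go field on the v1 `ListPodResourcesRequest` is named `ActivePodsOnly`; the exact field name and wiring belong to PR #132028 and may differ.

```go
package main

import (
	"context"
	"fmt"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	podresourcesv1 "k8s.io/kubelet/pkg/apis/podresources/v1"
)

func main() {
	// The kubelet serves the PodResources API on a local unix socket.
	conn, err := grpc.Dial("unix:///var/lib/kubelet/pod-resources/kubelet.sock",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	client := podresourcesv1.NewPodResourcesListerClient(conn)
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// ActivePodsOnly is assumed to be the field added by PR #132028:
	// false (the default) keeps today's behavior, true filters the response
	// to pods currently tracked as active by the resource managers.
	resp, err := client.List(ctx, &podresourcesv1.ListPodResourcesRequest{
		ActivePodsOnly: true,
	})
	if err != nil {
		panic(err)
	}
	for _, p := range resp.GetPodResources() {
		fmt.Printf("%s/%s\n", p.GetNamespace(), p.GetName())
	}
}
```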
I would omit this part, it is not relevant for this KEP. I will submit a separate update to both the KEP and the docs later on.
Okay. I was also not sure if this is the right place, but I still wanted to add a bit of context since Get() is based on List().
Once this and the comment right below are addressed (basically once this section is rephrased), LGTM.
Note that even if the current most common usage pattern is `List` + a series of calls to `Get`, the two calls are independent from each other. It is perfectly legal for a client to call `Get` with a namespace+name pair obtained in a different way than from the output of `List`.
**Race and inconsistency between `List()` and `Get()`:**
A pod’s phase may change between a successful `List()` and a subsequent `Get()`. Because there is no session or atomic snapshot between a `List()` and a later `Get()`, it’s possible (though uncommon) for a pod to transition to a non-Running phase in between calls. Clients must therefore be prepared for “stale” or “missing” data and handle the documented error paths when `Get()` is invoked on a pod that is no longer in the Running phase.
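To make the expected client behavior concrete, here is a minimal sketch (assuming the v1 podresources gRPC client from `k8s.io/kubelet/pkg/apis/podresources/v1`, and assuming `Get()` surfaces the stale-pod case as a gRPC `NotFound`, as proposed further down in this thread) of a consumer tolerating the `List()`-then-`Get()` race; the function name and error mapping are illustrative, not the final spec.

```go
package podresourcesclient

import (
	"context"
	"log"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
	podresourcesv1 "k8s.io/kubelet/pkg/apis/podresources/v1"
)

// refreshPod re-queries a single pod previously seen in List() and tolerates
// the window in which the pod may have terminated in the meantime.
// It returns (resources, found, err): found=false means the pod is gone or no
// longer Running, which callers should treat as normal, not as a failure.
func refreshPod(ctx context.Context, client podresourcesv1.PodResourcesListerClient, ns, name string) (*podresourcesv1.PodResources, bool, error) {
	resp, err := client.Get(ctx, &podresourcesv1.GetPodResourcesRequest{
		PodNamespace: ns,
		PodName:      name,
	})
	if err != nil {
		// The pod went away (or stopped Running) between List() and Get():
		// skip it rather than treating this as a hard failure.
		if status.Code(err) == codes.NotFound {
			log.Printf("pod %s/%s disappeared between List and Get, skipping", ns, name)
			return nil, false, nil
		}
		return nil, false, err // any other error is a real problem
	}
	return resp.GetPodResources(), true, nil
}
```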
We need to fully specify the behavior of `Get` in this case, or if, because of a mistake or a race or any other issue, it is fed with inconsistent data.
Currently the code does this:
pod, exist := p.podsProvider.GetPodByName(req.PodNamespace, req.PodName)
if !exist {
metrics.PodResourcesEndpointErrorsGetCount.WithLabelValues("v1").Inc()
return nil, fmt.Errorf("pod %s in namespace %s not found", req.PodName, req.PodNamespace)
}
It seems to me the most logical extension is to take into account the phase of the pod and either
- return empty data and swallow the error, or
- return an error like "pod not found".
The API in general (we determined during the fixing and review process of 132028) wants to return data pertaining to running pods, which thus actively hold resources.
Therefore, my take is that `Get` should also error out in the unlikely but possible case it is asked about non-running pods.
I'm open to the other approach or to any other sufficiently motivated approach.
@ffromani
How about this:
`Get()` returns a valid `GetPodResourcesResponse` with its current `PodResources` when the pod is in the `Running` state. If, due to some race condition or other inconsistency, the specified pod has transitioned to terminated or its `Status.Phase` is not `Running` (possible, though uncommon), the call must return a `NotFound` error.
Do you think adding an explicit check on the pod's `Status.Phase` can help? Something like:
pod, exist := p.podsProvider.GetPodByName(req.PodNamespace, req.PodName)
if !exist || pod.Status.Phase != v1.PodRunning {
metrics.PodResourcesEndpointErrorsGetCount.WithLabelValues("v1").Inc()
return nil, status.Errorf(codes.NotFound,
"pod %s in namespace %s not found or not running",
req.PodName, req.PodNamespace)
}
Or we may be overthinking this. As you pointed out, it's possible that Get() mostly (not 100%) returns an error when asked about a non-running pod. I think just calling it out in the spec should be enough. Digging a bit deeper into the code, Get() should not report a removed pod except in a rare race condition: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/pod/pod_manager.go#L198
Your fix in List() to filterOutInactivePods() makes sense: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kubelet_pods.go#L1065
If you still think it's better to improve Get(), we can add a similar check like the one I proposed above.
I surely don't want to overcomplicate. The problem here is the insufficient validation and lack of checking, albeit in a corner case. TL;DR: we need to decide what `Get` does when the phase is not `Running`. A one-liner is sufficient for me, but it has to be there :)
E.g.:
- "Get() never fails if the kubelet knows about the pod" (but note this likely misreports resources of terminated pods)
- "Get() returns an error if the pod is known to the kubelet, but is terminated"
That's it.
@soltysh PTAL, I tried to answer these: #5346 (comment). If these are satisfactory, then the only pending items:
Signed-off-by: Swati Gupta <[email protected]>
Signed-off-by: Swati Gupta <[email protected]>
a804a04 to ae728ef
/lgtm
sig-node review: my comment about `Get` was addressed. We may need to review again in the next cycle (beta2? GA?), but the current changes are sufficient.
My 2c about e2e tests (k/k/test/e2e_node/podresources_test.go):
- My PR kubernetes/kubernetes#132028 does a bit of cleanup in the e2e tests, so it may be wise to wait for it to merge (it's a fix we need anyway) and base the work on top of it.
- Fixing the test-infra lane to better handle the e2e tests is already ongoing (kubernetes/kubernetes#132345) and can proceed in parallel.
- My understanding is that doing all of the above is fair in the same cycle in which we attempt to graduate to beta.
There are still answers missing inside the document itself, but since they were provided as comments, I'm willing to conditionally approve as is. Please make sure to either add the missing links before merging this PR, or shortly after.
/approve
the PRR section
- [ ] No major bugs reported in the previous cycle.
- [] Gather feedback from consumers of the DRA feature.
- Integration with the NVIDIA DCGM exporter.
- [x] No major bugs reported in the previous cycle.
Thank you, this clarifies this a bit more.
@@ -333,7 +334,7 @@ The API becomes available again. The API is stateless, so no recovery is needed,

###### Are there any tests for feature enablement/disablement?

e2e test will demonstrate that when the feature gate is disabled, the API returns the appropriate error code.
e2e test will demonstrate that when the feature gate is disabled, the API returns the appropriate error code. (https://github.com/kubernetes/kubernetes/pull/116846)
Great, can we make sure this is listed here?
@@ -274,8 +274,9 @@ These cases will be added in the existing e2e tests:
This still holds, I see Francesco mentioned several tests that were added, can we make sure they are explicitly linked in this document?
Thanks @soltysh. Given the holiday in the US today, I don't think @guptaNswati will have time to make the update before tomorrow. I will work with her early next week to ensure all of the information in the comments gets incorporated. @mrunalp, @haircommander, @SergeyKanzhelev can we please get a sig-node approval to move this forward? Thanks!
I think they are all based in the US. I'd support (FWIW) an exception if the KEP slips just because there is no time left for an approval review because of the holidays.
Approving based on @klueska's review.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: guptaNswati, mrunalp, soltysh
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
One-line PR description: Update KEP to prepare for beta in 1.34
Issue link: DRA: Extend PodResources to include resources from Dynamic Resource Allocation #3695
/wg device-management
/assign @moshe010
/assign @klueska
/assign @johnbelamaric for PRR