KEP-4753: Expose ownerReferences via downward API #4754
Conversation
/cc @thockin
Force-pushed from 6ae4faf to b57c505.
## Motivation

Deployments and ReplicaSets can orphan and adopt pods dynamically. Any additional information we provide can change over time. On day 1 a pod named "foobar-deadbeef93-xyz76" is owned by ReplicaSet "foobar-deadbeef93", which is owned by Deployment "foobar", but after a node restart the pod name could change to "foobar-deadbeef93-abc12", leaving the object created by "foobar-deadbeef93-xyz76" orphaned and triggering unwanted behavior. Enabling the pod to pass its ownerReferences to the new object it manages will prevent objects from being orphaned by pods that are owned by a higher-level object.
I'm afraid this motivation doesn't make much sense to me. I think you've got some context in your head that is not in the text.
Why does a pod need to know its own ownerRefs? You seem to be implying that the pod will create some other object (e.g. a ConfigMap) and, rather than parent that object itself, it wants to create the ConfigMap and set its ownerRef to the pod's ownerRef? Why?
It seems plausible to me that a pod would be able to know its own ownerRefs, but that use-case eludes me.
Help me understand what a pod will actually DO with this information?
how about now?
ping @thockin
Force-pushed from b57c505 to bad8d3d.
Force-pushed from bad8d3d to c904f3b.
## Motivation

DaemonSets and ReplicaSets (via Deployments) can dynamically orphan and adopt pods. Consider a scenario where, on day 1, a pod named foobar-deadbeef93-xyz76 is owned by the ReplicaSet foobar-deadbeef93, which in turn is managed by the Deployment foobar. After a node restart, the pod might be replaced by foobar-deadbeef93-abc12, orphaning the original pod foobar-deadbeef93-xyz76 and potentially causing unwanted behavior. This issue arises when pods create resources like ConfigMaps or CustomResources and these resources are not correctly reassigned to the new pod, leading to orphaned objects. Ensuring that pods can pass their ownerReferences to new objects they manage would prevent this issue, as the ownership hierarchy would be preserved even when pods are recreated.
> After a node restart, the pod might be replaced by foobar-deadbeef93-abc12, orphaning the original pod foobar-deadbeef93-xyz76 and potentially causing unwanted behavior

Can this really happen? Is this not a bug in the ReplicaSet?
This example is confusing because DaemonSet and Deployment are very different - a pod's parent is never a Deployment. I would retool this to JUST talk about DaemonSet.
If I am a daemonset pod "foobar-x" on node "x", and I create a ConfigMap that is associated with node "x", then it would be incorrect to set the ownerRef to "foobar-x" - I should set it to the daemonset "foobar". If you set it to the pod, then you can race with garbage-collection.
For a ReplicaSet it's bad because Deployments can replace ReplicaSets, so the proper parent should be the Deployment, which is 2 levels up and is NOT covered by this KEP, right?
You are right; the motivation that inspired me to propose this KEP is based on a DaemonSet use case. I have updated the text.
The text now reflects this conversation, pointing only to DaemonSet.
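To make the DaemonSet scenario above concrete, here is a minimal, hypothetical sketch of what such a pod could do once its ownerReferences are available locally. The file path `/etc/podinfo/ownerReferences`, the JSON-list format, and the `NODE_NAME`/`POD_NAMESPACE` environment variables are assumptions for illustration only; none of them are settled by this KEP.

```go
// Hypothetical sketch of the DaemonSet use case discussed above: the pod reads
// its own ownerReferences from a downward API file (path and JSON-list format
// are assumptions) and attaches them to a ConfigMap it creates, so the
// ConfigMap is owned by the DaemonSet rather than by the pod itself.
package main

import (
	"context"
	"encoding/json"
	"os"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// Read the downward API file projected into the pod (assumed path).
	raw, err := os.ReadFile("/etc/podinfo/ownerReferences")
	if err != nil {
		panic(err)
	}

	// Assumed representation: a JSON list of metav1.OwnerReference objects.
	var owners []metav1.OwnerReference
	if len(raw) > 0 {
		if err := json.Unmarshal(raw, &owners); err != nil {
			panic(err)
		}
	}

	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Create a ConfigMap whose owner is the DaemonSet (the pod's owner),
	// not the pod, avoiding the garbage-collection race described above.
	// NODE_NAME and POD_NAMESPACE are assumed to be injected separately.
	cm := &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{
			Name:            "node-local-config",
			OwnerReferences: owners,
		},
		Data: map[string]string{"node": os.Getenv("NODE_NAME")},
	}
	_, err = client.CoreV1().ConfigMaps(os.Getenv("POD_NAMESPACE")).
		Create(context.TODO(), cm, metav1.CreateOptions{})
	if err != nil {
		panic(err)
	}
}
```

The key point is that the ConfigMap ends up owned by the DaemonSet rather than by the short-lived pod, so it survives pod recreation and is garbage-collected only when the DaemonSet itself is deleted.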
uid: 6be1683f-da9c-4f68-9440-82376231cfa6

If `ownerReferences` is empty or not set, the file will be empty.
What happens if `ownerReferences` is updated? Is it reflected in the Pod, or does the Pod keep using the original value?
If this were to happen, isn't it the DaemonSet's responsibility to refresh all the owned pods? During that update cycle the injected file would be updated.
I know very little about the sig-apps side of it. But are there any mutable fields on ReplicaSet or DaemonSet that would not change the pod spec? Changes to those will not be reflected in the downward API object.
Whenever we have any mutable value that we lock in time, we like to add something in status to indicate which version of that value we have written down.
Maybe I misunderstood @aojea's question.
Also, behavior on Container restart would be interesting: whether we lock the value at the time of Pod creation or re-populate it on every Container restart.
e2e testing will consist of the following:
- Create a Deployment and/or DaemonSet
- Verify if the pod created by the Deployment/DaemonSet has the ownerReferences of the Deployment/DaemonSet via ENV VAR
Is this correct? You said before that env variables are out of scope:

> It is out of the scope of this KEP to add the `ownerReferences` to the pod's environment variables.
My bad, fixed.
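A rough sketch of the setup half of such an e2e case, assuming the proposed behavior is exposed under the hypothetical field path `metadata.ownerReferences` and projected as a downward API volume file; the field path, file name, and image are placeholders, not decided API:

```go
// Hypothetical e2e-style setup: create a DaemonSet whose pods project the
// proposed ownerReferences downward API field into a file; the test would
// then read that file from a pod and compare it with the DaemonSet's identity.
// The field path "metadata.ownerReferences" is the KEP proposal, not an
// existing Kubernetes API.
package e2e

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func ownerRefsDaemonSet() *appsv1.DaemonSet {
	labels := map[string]string{"app": "ownerrefs-e2e"}
	return &appsv1.DaemonSet{
		ObjectMeta: metav1.ObjectMeta{Name: "ownerrefs-e2e"},
		Spec: appsv1.DaemonSetSpec{
			Selector: &metav1.LabelSelector{MatchLabels: labels},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: labels},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{{
						Name:    "busybox",
						Image:   "busybox",
						Command: []string{"sh", "-c", "cat /etc/podinfo/ownerReferences && sleep 3600"},
						VolumeMounts: []corev1.VolumeMount{{
							Name:      "podinfo",
							MountPath: "/etc/podinfo",
						}},
					}},
					Volumes: []corev1.Volume{{
						Name: "podinfo",
						VolumeSource: corev1.VolumeSource{
							DownwardAPI: &corev1.DownwardAPIVolumeSource{
								Items: []corev1.DownwardAPIVolumeFile{{
									Path: "ownerReferences",
									// Proposed by this KEP; does not exist today.
									FieldRef: &corev1.ObjectFieldSelector{
										FieldPath: "metadata.ownerReferences",
									},
								}},
							},
						},
					}},
				},
			},
		},
	}
}
```

The verification half of the test (reading the projected file from a running pod and comparing its contents with the DaemonSet's name and UID) is omitted here.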
- Feature gate name: `DownwardAPIOwnerReferences`
- Components depending on the feature gate: `kubelet`
Why can this not be a new field in kubelet configuration? If the feature (implementation) turns out to be inappropriate, users can simply update the kubelet configuration and restart kubelet. However, if it is a feature gate, users don't know which components are affected. We don't have documentation elsewhere stating that this gate only affects kubelet. The best bet for them would be to restart all kube-xxx components, because this gate (as with others) will show up in `kube-apiserver --help` anyway once landed.
We have witnessed a lot of "abuse" of feature gates. Can we think twice about this?
A feature gate is a mechanism to introduce new changes while mitigating the risk of breaking existing clusters; gates are not meant to be used as flags. Once they graduate they disappear and the functionality cannot be disabled... Are you implying this KEP should add the functionality as opt-in or opt-out via configuration?
I think I know what feature gates were designed for... My question is: do we want all new changes to be introduced with a feature gate? All changes could break existing clusters. I raised this concern because quite a few new feature gates introduced lately are about trivial changes, e.g. new optimizations to the watch cache, adding new fields to the status of a resource, adding a new entry to a resource annotation...
Maybe this one is not so trivial; maybe it actually warrants a new feature gate that takes several releases to become GA. My point is about the user-facing changes. For most cluster admins, feature gates are currently applied system-wide, across all components. That is a HEAVY burden. I hope our team will consider this factor.
> do we want all new changes to be introduced with a feature gate?

If you think your change could break a cluster, then YES.
If your change has any aspect of API, then YES.

> new optimizations to the watch cache

Could absolutely break a cluster!

> adding new fields to the status of a resource

Any API field we add goes through a gate so that we can safely do a rollback without breaking or dropping information.

> adding new entry to resource annotation

If this produces a value which version x-1 of Kubernetes would reject, then it needs a gate. Always think about this case:
User upgrades cluster from x -> y.
User uses your new API field.
Uh oh! Something bad happened.
User rolls cluster back to x.
Your API field is still stored in etcd, and may have some lingering external effect - what will happen? What if the user tries to update the object - will it fail validation? Or will the new field just disappear?
Do we agree that this change is OK to be done via a feature gate? Can we mark this comment resolved?
Force-pushed from c904f3b to f51a0a1.
## Summary

Today when a pod wants to pass its `ownerReferences` to a new object it manages (a ConfigMap, a Secret, etc.), it needs to call the API server to get its own ownerReferences and then pass them to the new object.
The pod then needs GET access on all the intermediate resources, which it may not have, and giving it that access is a security risk as it can access other resources it should not have access to.
The pod only needs GET access on pods, not anything else?
What this line tries to say is that TODAY, for a pod to know its owner, it needs to do a GET call to the API server, getting all the info that perhaps we don't want the pod to know. And only GET, because today the pod can get its namespace and name via the downward API, so with those two fields you can do a targeted GET.
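For comparison, a sketch of today's workaround as described above, assuming `POD_NAME` and `POD_NAMESPACE` are injected through the existing downward API fields and the pod's service account has RBAC permission to get pods:

```go
// Today's workaround: the pod must query the API server for its own object
// just to learn its ownerReferences. POD_NAME and POD_NAMESPACE are assumed
// to be injected via the existing downward API env var support.
package main

import (
	"context"
	"fmt"
	"os"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Targeted GET on the pod itself; this requires RBAC "get" on pods.
	pod, err := client.CoreV1().Pods(os.Getenv("POD_NAMESPACE")).
		Get(context.TODO(), os.Getenv("POD_NAME"), metav1.GetOptions{})
	if err != nil {
		panic(err)
	}

	// The ownerReferences the KEP would instead expose via the downward API.
	for _, ref := range pod.OwnerReferences {
		fmt.Printf("%s/%s (%s)\n", ref.Kind, ref.Name, ref.UID)
	}
}
```

It is this API round-trip and the accompanying RBAC grant that the proposed downward API field would remove.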
For example, a CustomResource created by a pod managed by a DaemonSet may be unexpectedly garbage collected if the pod is deleted. This can disrupt system behavior, as the necessary resource is lost. By allowing the pod to inherit and pass down ownerReferences, the CustomResource would remain correctly managed by the DaemonSet, avoiding such disruptions.
There's a cause-effect implied here that I don't understand. I think you mean to say that now you have enough information to set the owner of the CR to the DS itself?
I have rephrased this line, does it convey the intention now?
### Risks and Mitigations

- Security Risk: Some environments may not want to expose the ownerReferences of a pod to the pod itself.
- Mitigation: This feature could be added as a feature gate, disabled by default.
This is not a mitigation to this KEP. If there is some cluster wherein the admin DOES NOT want pods to be able to know their owners, the mitigation would be to install an admission check (webhook or Validating Admission Policy) to reject pods that try to use this.
I have edited the Mitigation entry, thanks @thockin
Logs from the pod will show the ownerReferences of the pod:

```bash
$ kubectl logs kubernetes-downwardapi-volume-example
```
This is under-specified.
I don't know if we should dump the WHOLE object, but it's probably fine - can you think of any reason to hide any of these?
YAML is a terrible file format for parsing - I think it would be better to spec this as something like:
"""
The file representation of this fieldref is a JSON list of objects, each of which is a serialized `k8s.io/apimachinery/pkg/apis/meta/v1.OwnerReference`.
"""
Open question: Should we instead produce a metav1.List? e.g.
{
"kind": "OwnerReference",
"apiVersion": "meta/v1",
"items": [
{ .... }, { ... }
]
}
IF there was ever a meta/v2, we would need to allow people to request that - does the output need to be self-documenting? I think probably not.
> I don't know if we should dump the WHOLE object, but it's probably fine - can you think of any reason to hide any of these?

I think if we go with your JSON recommendation and only pass a list, that is more than OK; all we need is the ownerRef to pass to objects that the Pod will create.
So to your open question, I like the `metav1.List` idea.
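To illustrate the difference between the two candidate formats discussed here, a small hypothetical sketch of the consumer side; both layouts, the wrapper struct, and the file path are assumptions, since the KEP has not settled on a representation:

```go
// Sketch of how a consumer might parse the two candidate file formats
// discussed above. Both formats and the file path are open questions in
// this thread, not decided behavior.
package main

import (
	"encoding/json"
	"fmt"
	"os"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Option 1: a bare JSON list of metav1.OwnerReference objects.
func parseBareList(raw []byte) ([]metav1.OwnerReference, error) {
	var refs []metav1.OwnerReference
	err := json.Unmarshal(raw, &refs)
	return refs, err
}

// Option 2: a metav1.List-style wrapper, self-describing via kind/apiVersion.
type ownerReferenceList struct {
	Kind       string                  `json:"kind"`
	APIVersion string                  `json:"apiVersion"`
	Items      []metav1.OwnerReference `json:"items"`
}

func parseWrappedList(raw []byte) ([]metav1.OwnerReference, error) {
	var list ownerReferenceList
	err := json.Unmarshal(raw, &list)
	return list.Items, err
}

func main() {
	// Assumed projection path; per the KEP, an empty file means no ownerReferences.
	raw, err := os.ReadFile("/etc/podinfo/ownerReferences")
	if err != nil || len(raw) == 0 {
		return
	}
	if refs, err := parseBareList(raw); err == nil {
		fmt.Println("owners (bare list):", refs)
	}
	if refs, err := parseWrappedList(raw); err == nil {
		fmt.Println("owners (metav1.List-style):", refs)
	}
}
```

The bare list is simpler to consume; the wrapper buys self-description at the cost of a second layer that consumers must unwrap.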
If `ownerReferences` is empty or not set, the file will be empty.

It is out of the scope of this KEP to add the `ownerReferences` to the pod's environment variables.
Why? It's always a smell when we do one but not the other. This value could be specced the same as the file - a JSON list.
With JSON it is possible, agreed.
###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

No, the missing Downward API field will be perceived as a missing Volume, preventing the pod from starting.
Actually, wouldn't it be seen as an unknown field ref?
You are right, fixed.
Force-pushed from df53a9f to 6ed9571.
Signed-off-by: Carlos Eduardo Arango Gutierrez <[email protected]>
Force-pushed from 6ed9571 to 4a19e53.
We tried to reach you to figure out if you were going to work on this for 1.32 (#4753 (comment)). We did not hear from you, so we pulled this KEP from consideration for 1.32. We did not allocate a sig-node approver for this, so it may be difficult to get sig-node approvers in the next two days.
Understood, thanks for the heads up.
Let's make a point to get together and discuss this in real-time EARLY in the next cycle.
@thockin there is no reason why we can't review now and get it ready for the next cycle.
For example, a CustomResource created by a pod managed by a DaemonSet may be unexpectedly garbage collected if the pod is deleted. This can disrupt system behavior, as the CustomResource is garbage collected. By allowing the pod to inherit and pass down ownerReferences, the CustomResource would remain correctly managed by the DaemonSet (the owner), avoiding disruptions.

### Goals
Do we only need one level of ownerReference? Would it ever be needed to know the owner of the owner of the owner? I cannot tell from the motivation section whether one level will always be enough.
Well, I don't have a strong use case for that. If you have a use case maybe we can discuss it and integrate it.
One option is to just understand how it could be extended to the owner of an owner in the future, to make sure we are not making it unnatural to extend this way later.
One-line PR description: Adding a new KEP
Issue link: KEP-4753: Expose `ownerReferences` via `valueFrom` and downward API #4753