
KEP-4753: Expose ownerReferences via downward API #4754

Open
wants to merge 1 commit into base: master

Conversation

ArangoGutierrez
Contributor

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jul 9, 2024
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jul 9, 2024
@ArangoGutierrez
Contributor Author

/cc @thockin

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ArangoGutierrez
Once this PR has been reviewed and has the lgtm label, please assign mrunalp, wojtek-t for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ArangoGutierrez ArangoGutierrez force-pushed the KEP-4753 branch 3 times, most recently from 6ae4faf to b57c505 on July 9, 2024 17:11

## Motivation

Deployments and ReplicaSets can orphan and adopt pods dynamically. Any additional information we provide can change over time. On day 1 a pod named "foobar-deadbeef93-xyz76" is owned by ReplicaSet "foobar-deadbeef93", which is owned by Deployment "foobar", but after a node restart the pod name could change to "foobar-deadbeef93-abc12", leaving the object created by "foobar-deadbeef93-xyz76" orphaned and triggering unwanted behavior. Enabling the pod to pass its ownerReferences to the new objects it manages will prevent objects from being orphaned by pods that are themselves owned by a higher-level object.
Member

I'm afraid this motivation doesn't make much sense to me. I think you've got some context in your head that is not in the text.

Why does a pod need to know its own ownerRefs? You seem to be implying that the pod will create some other object (e.g. a ConfigMap) and, rather than parent that object itself, it wants to create the ConfigMap and set its ownerRef to the pod's ownerRef? Why?

It seems plausible to me that a pod would be able to know its own ownerRefs, but that use-case eludes me.

Help me understand what a pod will actually DO with this information?

Contributor Author

how about now?

Contributor Author

ping @thockin


## Motivation

DaemonSets and ReplicaSets (via Deployments) can dynamically orphan and adopt pods. Consider a scenario where, on day 1, a pod named foobar-deadbeef93-xyz76 is owned by the ReplicaSet foobar-deadbeef93, which in turn is managed by the Deployment foobar. After a node restart, the pod might be replaced by foobar-deadbeef93-abc12, orphaning the original pod foobar-deadbeef93-xyz76 and potentially causing unwanted behavior. This issue arises when pods create resources like ConfigMaps or CustomResources and these resources are not correctly reassigned to the new pod, leading to orphaned objects. Ensuring that pods can pass their ownerReferences to new objects they manage would prevent this issue, as the ownership hierarchy would be preserved even when pods are recreated.
Member

After a node restart, the pod might be replaced by foobar-deadbeef93-abc12, orphaning the original pod foobar-deadbeef93-xyz76 and potentially causing unwanted behavior

Can this really happen? Is this not a bug in the ReplicaSet?

Member

This example is confusing because DaemonSet and Deployment are very different - a pod's parent is never a Deployment. I would retool this to JUST talk about DaemonSet

If I am a daemonset pod "foobar-x" on node "x", and I create a ConfigMap that is associated with node "x", then it would be incorrect to set the ownerRef to "foobar-x" - I should set it to the daemonset "foobar". If you set it to the pod, then you can race with garbage-collection.

For a ReplicaSet it's bad because Deployments can replace ReplicaSets, so the proper parent should be the Deployment, which is 2 levels up and is NOT covered by this KEP, right?
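
To make the pattern being described concrete (this is illustrative, not part of the KEP text): a minimal client-go sketch of a DaemonSet pod creating a ConfigMap and parenting it to the DaemonSet rather than to itself. The DaemonSet name/UID, ConfigMap name, and namespace are hypothetical placeholders; in practice they would come from whatever mechanism (such as the one proposed here) tells the pod who its owner is.

```go
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// In-cluster config; the pod needs RBAC permission to create ConfigMaps in its namespace.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Hypothetical values: in the scenario above they would come from the
	// DaemonSet owner reference exposed to the pod, not be hard-coded.
	dsName := "foobar"
	dsUID := types.UID("6be1683f-da9c-4f68-9440-82376231cfa6")

	cm := &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{
			Name: "node-x-config",
			// Parent the ConfigMap to the DaemonSet, not to this pod, so it
			// survives pod replacement and is only garbage collected with the DaemonSet.
			OwnerReferences: []metav1.OwnerReference{{
				APIVersion: "apps/v1",
				Kind:       "DaemonSet",
				Name:       dsName,
				UID:        dsUID,
			}},
		},
		Data: map[string]string{"node": "x"},
	}
	if _, err := client.CoreV1().ConfigMaps("default").Create(context.TODO(), cm, metav1.CreateOptions{}); err != nil {
		panic(err)
	}
}
```

Because the owner is the DaemonSet, the ConfigMap does not race with garbage collection when an individual pod is replaced.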

Contributor Author

You are right, and the motivation that inspired me to propose this KEP is based on a DaemonSet use case; I have updated the text.

Contributor Author

The text now reflects this conversation, pointing only to DaemonSet.

uid: 6be1683f-da9c-4f68-9440-82376231cfa6
```

If `ownerReferences` is empty or not set, the file will be empty.
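
For context on this excerpt, a sketch of how a pod might request such a file, written with the client-go types; the `metadata.ownerReferences` fieldPath is the behavior this KEP proposes and is rejected by current API servers, and the pod name, image, and mount path are illustrative assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "kubernetes-downwardapi-volume-example"},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{{
				Name:    "main",
				Image:   "registry.k8s.io/busybox",
				Command: []string{"sh", "-c", "cat /etc/podinfo/ownerReferences && sleep 3600"},
				VolumeMounts: []corev1.VolumeMount{{
					Name:      "podinfo",
					MountPath: "/etc/podinfo",
					ReadOnly:  true,
				}},
			}},
			Volumes: []corev1.Volume{{
				Name: "podinfo",
				VolumeSource: corev1.VolumeSource{
					DownwardAPI: &corev1.DownwardAPIVolumeSource{
						Items: []corev1.DownwardAPIVolumeFile{{
							Path: "ownerReferences",
							// Proposed by this KEP; not a valid fieldPath today.
							FieldRef: &corev1.ObjectFieldSelector{
								FieldPath: "metadata.ownerReferences",
							},
						}},
					},
				},
			}},
		},
	}

	// Print the manifest so the sketch is self-contained and runnable.
	out, _ := json.MarshalIndent(pod, "", "  ")
	fmt.Println(string(out))
}
```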
Member

What happens if ownerReferences is updated? Is it reflected in the Pod, or does the Pod keep using the original value?

Contributor Author

If this were to happen, isn't it the DaemonSet's responsibility to refresh all the owned pods? During this update cycle the injected file would be updated.

Member

I know very little about the SIG Apps side of it. But are there any mutable fields on ReplicaSet or DaemonSet that would not change the pod spec? Changes to those would not be reflected in the downward API object.

Whenever we have a mutable value that we lock in time, we like to add something in status to indicate which version of that value we have written down.

Maybe I misunderstood @aojea 's question.

Member

Also, behavior on container restart would be interesting: whether we lock the value at Pod creation time or re-populate it on every container restart.


e2e testing will consist of the following:
- Create a Deployment and/or DaemonSet
- Verify if the pod created by the Deployment/DaemonSet has the ownerReferences of the Deployment/DaemonSet via ENV VAR
Member

Is this correct? You said before that env variables are out of scope:

Is out of the scope of this KEP to add the ownerReferences to the pod's environment variables.

Contributor Author

My bad, fixed.

Comment on lines +375 to +419
- Feature gate name: `DownwardAPIOwnerReferences`
- Components depending on the feature gate: `kubelet`

Why cannot this be a new field in kubelet configuration?
If the feature (implementation) turns out not to be appropriate, users can simply update the kubelet configuration and restart kubelet. However, if it is a feature gate, users don't know which components are affected. We don't have documentation elsewhere stating that this gate only affects kubelet. The best bet for them would be to restart all kube-xxx components, because this gate (as with others) will show up in kube-apiserver --help anyway once landed.

We have witnessed a lot of "abuse" of feature gates. Can we think twice about this?

Member

A feature gate is a mechanism to introduce new changes while mitigating the risk of breaking existing clusters. Feature gates are not meant to be used as flags; once they graduate, they disappear and the functionality can no longer be disabled ... are you implying this KEP should add the functionality as opt-in or opt-out via configuration?


I think I know what feature gates were designed for ... My question is: do we want all new changes to be introduced with a feature gate? Any change could break existing clusters. I raised this concern because quite a few of the feature gates introduced lately are about trivial changes, e.g. new optimizations to the watch cache, adding new fields to the status of a resource, adding a new entry to a resource annotation ...

Maybe this one is not so trivial; maybe it actually warrants a new feature gate that takes several releases to become GA. My point is about the user-facing changes. For most cluster admins, feature gates are currently applied system-wide, across all components. That is a HEAVY burden. I hope our team will consider this factor.

Member

do we want all new changes to be introduced with a feature gate?

If you think your change could break a cluster, then YES.

If your change has any aspect of API, then YES.

new optimizations to the watch cache

Could absolutely break a cluster!

adding new fields to the status of a resource

Any API field we add goes through a gate so that we can safely do a rollback without breaking or dropping information.

adding new entry to resource annotation

If this produces a value which version x-1 of Kubernetes would reject, then it needs a gate. Always think about this case:

1. User upgrades cluster from x -> y.
2. User uses your new API field.
3. Uh oh! Something bad happened.
4. User rolls cluster back to x.

Your API field is still stored in etcd, and may have some lingering external effect - what will happen? What if the user tries to update the object - will it fail validation? Or will the new field just disappear?

Contributor Author

Do we agree that this change is OK to be done via a feature gate? Can we mark this comment resolved?

## Summary

Today, when a pod wants to pass its `ownerReferences` to a new object it manages (a ConfigMap, a Secret, etc.), it needs to call the API server to get its own ownerReferences and then pass them to the new object.
The pod then needs GET access on all the intermediate resources, which it may not have, and giving it that access is a security risk, as the pod can read other resources it should not have access to.
Member

The pod only needs GET access on pods, not anything else?

Contributor Author

What this line tries to say is that TODAY, for a pod to know its owner, it needs to do a GET call to the API server, getting all the info that perhaps we don't want the pod to know. And only GET, because today the pod can get its namespace and name via the downward API, so with those two fields you can do a targeted GET.
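
A sketch of that workaround as it exists today, assuming the namespace and name are injected through the existing downward API as the (hypothetical) env vars `POD_NAMESPACE` and `POD_NAME`, and that the pod's service account has been granted `get` on pods:

```go
package main

import (
	"context"
	"fmt"
	"os"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Assumed to be injected via the existing downward API
	// (fieldRef: metadata.namespace / metadata.name).
	ns := os.Getenv("POD_NAMESPACE")
	name := os.Getenv("POD_NAME")

	// The targeted GET this KEP wants to make unnecessary: it requires RBAC
	// "get" on pods and returns the whole object, not just the owner refs.
	pod, err := client.CoreV1().Pods(ns).Get(context.TODO(), name, metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	for _, ref := range pod.OwnerReferences {
		fmt.Printf("owned by %s/%s (uid %s)\n", ref.Kind, ref.Name, ref.UID)
	}
}
```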



For example, a CustomResource created by a pod managed by a DaemonSet may be unexpectedly garbage collected if the pod is deleted. This can disrupt system behavior, as the necessary resource is lost. By allowing the pod to inherit and pass down ownerReferences, the CustomResource would remain correctly managed by the DaemonSet, avoiding such disruptions.
Member

There's a cause-effect implied here that I don't understand. I think you mean to say that now you have enough information to set the owner of the CR to the DS itself?

Contributor Author

I have rephrased this line; does it convey the intention now?

### Risks and Mitigations

- Security Risk: Some environments may not want to expose the ownerReferences of a pod to the pod itself.
- Mitigation: This feature could be added as a feature gate, disabled by default.
Member

This is not a mitigation to this KEP. If there is some cluster wherein the admin DOES NOT want pods to be able to know their owners, the mitigation would be to install an admission check (webhook or Validating Admission Policy) to reject pods that try to use this.

Contributor Author

I have edited the Mitigation entry, thanks @thockin

Logs from the pod will show the ownerReferences of the pod:

```bash
$ kubectl logs kubernetes-downwardapi-volume-example
```
Member

This is under-specified.

I don't know if we should dump the WHOLE object, but it's probably fine - can you think of any reason to hide any of these?

YAML is a terrible file format for parsing - I think it would be better to spec this as something like:

"""
The file representation of this fieldref is a JSON list of objects, each of which is a serialized k8s.io/apimachinery/pkg/apis/meta/v1.OwnerReference.
"""

Open question: Should we instead produce a metav1.List? e.g.

{
    "kind": "OwnerReference",
    "apiVersion": "meta/v1",
    "items": [
        { .... }, { ... }
    ]
}

IF there was ever a meta/v2, we would need to allow people to request that - does the output need to be self-documenting? I think probably not.

Contributor Author

I don't know if we should dump the WHOLE object, but it's probably fine - can you think of any reason to hide any of these?

I think if we go with your JSON recommendation and only pass a list, that is more than OK; all we need is the ownerRef so we can pass it to objects that the Pod will create.

So to your open question, I like the metav1.List idea.
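
For illustration, a sketch of how a pod might consume the file under the plain-JSON-list option discussed above; the file path is an assumption, and whether a `metav1.List`-style wrapper is used instead is still an open question in this thread.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// Path and format are assumptions: this takes the "plain JSON list"
	// option discussed above, i.e. a serialized []metav1.OwnerReference.
	raw, err := os.ReadFile("/etc/podinfo/ownerReferences")
	if err != nil {
		panic(err)
	}
	if len(raw) == 0 {
		// Per the KEP excerpt, the file is empty when ownerReferences is unset.
		fmt.Println("no owner references")
		return
	}

	var refs []metav1.OwnerReference
	if err := json.Unmarshal(raw, &refs); err != nil {
		panic(err)
	}
	for _, r := range refs {
		fmt.Printf("%s %s/%s uid=%s controller=%v\n",
			r.APIVersion, r.Kind, r.Name, r.UID, r.Controller != nil && *r.Controller)
	}
}
```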


If `ownerReferences` is empty or not set, the file will be empty.

It is out of the scope of this KEP to add the `ownerReferences` to the pod's environment variables.
Member

Why? It's always a smell when we do one but not the other. This value could be specced the same as the file - a JSON list.

Contributor Author

With JSON it is possible, agreed.
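
For comparison, a sketch of what the env-var form being discussed might look like if it were ever brought into scope; the variable name and the `metadata.ownerReferences` fieldPath are assumptions tied to this KEP's proposal, not current API.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// ownerRefsEnv sketches the env-var variant discussed above: the same proposed
// metadata.ownerReferences fieldPath, delivered as a JSON list in an env var.
// The variable name OWNER_REFERENCES is hypothetical, and this form is
// currently out of scope for the KEP.
func ownerRefsEnv() corev1.EnvVar {
	return corev1.EnvVar{
		Name: "OWNER_REFERENCES",
		ValueFrom: &corev1.EnvVarSource{
			FieldRef: &corev1.ObjectFieldSelector{
				FieldPath: "metadata.ownerReferences",
			},
		},
	}
}

func main() {
	fmt.Printf("%+v\n", ownerRefsEnv())
}
```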



###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

No, the missing Downward API field will be perceived as a missing Volume preventing the pod from starting.
Member

Actually, wouldn't it be seen as an unknown field ref?

Contributor Author

You are right, fixed.

Signed-off-by: Carlos Eduardo Arango Gutierrez <[email protected]>
@ArangoGutierrez
Contributor Author

@thockin / @mrunalp Please take another look. I think we are close on this one.

@kannon92
Contributor

kannon92 commented Oct 8, 2024

@ArangoGutierrez

We tried to reach you to figure out if you were going to work on this for 1.32 (#4753 (comment)).

We did not hear from you so we pulled this KEP from consideration for 1.32. We did not allocate a sig-node approver for this so it may be difficult to get sig-node approvers in the next two days.

/cc @dchen1107 @SergeyKanzhelev @mrunalp @klueska

@ArangoGutierrez
Contributor Author

@ArangoGutierrez

We tried to reach you to figure out if you were going to work on this for 1.32 (#4753 (comment)).

We did not hear from you so we pulled this KEP from consideration for 1.32. We did not allocate a sig-node approver for this so it may be difficult to get sig-node approvers in the next two days.

/cc @dchen1107 @SergeyKanzhelev @mrunalp @klueska

Understood, thanks for the heads up.

@thockin
Member

thockin commented Oct 10, 2024

Let's make a point to get together and discuss this in real-time EARLY in the next cycle

@kannon92
Contributor

kannon92 commented Oct 10, 2024

Let's make a point to get together and discuss this in real-time EARLY in the next cycle

@thockin there is no reason why we can't review now and get it ready for the next cycle.


For example, a CustomResource created by a pod managed by a DaemonSet may be unexpectedly garbage collected if the pod is deleted, which can disrupt system behavior because a resource the workload still depends on is removed. By allowing the pod to inherit and pass down its ownerReferences, the CustomResource would remain correctly owned by the DaemonSet (the owner), avoiding such disruptions.

### Goals
Member

Do we only need one level of ownerReference? Would it ever be needed to know the owner of the owner of the owner? I cannot tell from the motivation section whether one level will always be enough.

Contributor Author

Well, I don't have a strong use case for that. If you have a use case maybe we can discuss it and integrate it.

Member

One option is to just understand how it could be extended to the owner of an owner in the future, to make sure we are not making it unnatural to extend that way later.
