Create NPEP-127 for external ingress endpoints #128

npinaeva · 2023-07-20T11:24:25Z

An attempt to collect use cases for external ingress endpoints.
Reviews and more examples are welcome.

Tracking issue #127

netlify · 2023-07-20T11:24:31Z

✅ Deploy Preview for kubernetes-sigs-network-policy-api ready!

Name	Link
🔨 Latest commit	`4e8beea`
🔍 Latest deploy log	https://app.netlify.com/sites/kubernetes-sigs-network-policy-api/deploys/652026ff29284800086db35b
😎 Deploy Preview	https://deploy-preview-128--kubernetes-sigs-network-policy-api.netlify.app

astoycos

I think we still have some thinking to do on this one. It seems like we're going after a pretty broad set of use cases, some of which may not be as important.

The default-deny user story resonates with me; however, for the others (concerning pods as the "selected entities", not Nodes) I generally feel like having a pod exposed to the "outside world" is a rare case. I.e., even with all the mentioned ingress APIs there's usually a cluster entity which routes, load-balances, or directs external traffic to destination pods, and we can already block traffic from that entity today.

Lastly, we need to think carefully about having an NPEP which references multi-cluster scenarios... as I think the scope could quickly explode.

npep/npep-127.md

astoycos · 2023-08-15T13:23:38Z

npep/npep-127.md

+ - zero trust policy: As a cluster administrator I want to set up default deny all ingress policy as a BANP.
+


Suggested change

- zero trust policy: As a cluster administrator I want to set up default deny all ingress policy as a BANP.

- zero trust policy: As a cluster administrator I want to set up default deny all ingress guardrails with a BANP.

I am trying to distinguish ANP vs BANP policies with the word "default", because BANP is just the default overridable policy, and I am not sure the word "guardrails" makes these semantics any more obvious, wdyt?

yeah, I don't think we actually ever used the word "guardrails" in a consistent way
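For readers less familiar with why "default" matters in this exchange, the evaluation order behind the ANP vs BANP distinction can be sketched roughly as follows. This is a simplified model, not the API's implementation: `evaluate` and its parameters are hypothetical names, and each policy tier is reduced to the action of its first matching rule.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    ALLOW = "Allow"
    DENY = "Deny"


def evaluate(
    anp_action: Optional[str],   # "Allow", "Deny", "Pass", or None if no ANP rule matched
    np_selects_pod: bool,        # does any namespaced NetworkPolicy select the pod?
    np_allows: bool,             # if so, does one of them allow this connection?
    banp_action: Optional[str],  # "Allow", "Deny", or None if no BANP rule matched
) -> Verdict:
    # 1. AdminNetworkPolicy: Allow and Deny are final, Pass (or no match) defers.
    if anp_action in ("Allow", "Deny"):
        return Verdict(anp_action)
    # 2. NetworkPolicy: once a pod is selected, the namespace owner's policy decides.
    if np_selects_pod:
        return Verdict.ALLOW if np_allows else Verdict.DENY
    # 3. BaselineAdminNetworkPolicy: only consulted when nothing above decided.
    if banp_action in ("Allow", "Deny"):
        return Verdict(banp_action)
    # 4. No policy matched at all: Kubernetes default is to allow.
    return Verdict.ALLOW
```

Under this ordering a BANP `Deny` only takes effect when no ANP rule and no NetworkPolicy decided the connection, which is what makes it an overridable default.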

astoycos · 2023-08-15T13:28:33Z

npep/npep-127.md

+get to the cluster workloads. This should apply not only to the connections coming from the other cluster workloads, but to
+all incoming connections.
+
+ - block well-known ports: As a cluster administrator I want to block all ingress traffic on specific ports.


Will/should this apply to well-known ports on Pods AND nodes?

If you are talking about subjects, then only pods, since (B)ANP only applies to non-host-network pods (what we call cluster workloads)?
From the peer side, this should apply to any ingress traffic: from pods, nodes, external endpoints, and anything else that can reach pods.
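A rough sketch of the scoping described in this reply. The blocked-port set and the peer labels are hypothetical; the point is only that the subject must be a non-host-network pod, while the kind of peer is deliberately ignored.

```python
# Hypothetical set of ports the administrator wants to block cluster-wide.
BLOCKED_PORTS = {22, 3389}


def ingress_allowed(subject_is_host_network: bool, dst_port: int, peer_kind: str) -> bool:
    """Return True if this sketch's port-blocking policy lets the connection through."""
    if subject_is_host_network:
        # (B)ANP subjects are non-host-network pods, so the policy does not apply here.
        return True
    # peer_kind ("pod", "node", "external", ...) is intentionally not consulted:
    # the block applies to ingress from any peer that can reach the pod.
    return dst_port not in BLOCKED_PORTS
```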

astoycos · 2023-08-15T13:40:28Z

npep/npep-127.md

+compromised, it will not be able to affect other worker nodes. To do so, I want to create a cluster-wide policy to
+deny ingress connections from some of the cluster nodes. I may also need to explicitly allow access from the control plane nodes.
+
+ - inter-cluster communication: As a cluster manager for multiple clusters I want to make sure ingress connections


Hrm so on this one I think it's not really important to isolate this user story to "ingress connections are only allowed from a pre-defined set of related clusters". More specifically, if we're planning on identifying foreign clusters by networkCIDR there's no way to distinguish between a "foreign cluster" and just a plain old "external client" right?

yeah right, I think inter-cluster part should go to the example
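To illustrate the point, a minimal sketch using Python's `ipaddress` module. The subnet and addresses are made-up documentation ranges and `matches_allowed_peer` is a hypothetical helper; a CIDR peer can only test membership, so a workload from a related cluster and an unrelated host in the same range look identical.

```python
import ipaddress

# Hypothetical subnet assumed to belong to a related ("hosted") cluster.
ALLOWED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def matches_allowed_peer(src_ip: str) -> bool:
    """Return True if src_ip falls inside any allowed CIDR."""
    addr = ipaddress.ip_address(src_ip)
    return any(addr in net for net in ALLOWED_NETWORKS)


# Both sources match: the CIDR says nothing about what kind of endpoint sent the traffic.
foreign_cluster_pod = "203.0.113.10"
plain_external_client = "203.0.113.200"
```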

astoycos · 2023-08-15T13:42:34Z

npep/npep-127.md

+have limited access to the management cluster workloads by selecting management cluster workloads I want to protect
+and a set of allowed subnets for the corresponding hosted clusters.
+
+- external services: As a cluster manager I want to make sure ingress connections from a pre-defined set of external endpoints


This is broad: "pre-defined set of external endpoints". What's defined as an external endpoint? Can it be ANY IP?

Also, this is the first time I see "can't be denied by the namespace owners". Can you add a note somewhere that all the other user stories are explicitly non-overridable by namespace owners?

not sure what is the best place to clarify this, but I refer to external endpoints in this NPEP as anything outside the cluster. So I would say any IP, unless you see any problem with that?

astoycos · 2023-08-15T13:49:51Z

npep/npep-127.md

+
+* Implement ingress traffic control from external destinations (outside the cluster)
+
+## Non-Goals


Determining the non-goals here is going to be really important... I think we need to narrow this NPEP down

That would be nice, but I just don't have any ideas on what to put here for now. I hoped that something would be outlined in the discussion here; if you have any ideas for non-goals, please lmk :)

k8s-ci-robot · 2023-08-16T11:22:30Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: npinaeva
Once this PR has been reviewed and has the lgtm label, please assign astoycos for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

npinaeva

thanks a lot for the review @astoycos !

npinaeva · 2023-08-16T09:52:32Z

npep/npep-127.md

+get to the cluster workloads. This should apply not only to the connections coming from the other cluster workloads, but to
+all incoming connections.
+
+ - block well-known ports: As a cluster administrator I want to block all ingress traffic on specific ports.


If you are talking about subjects, then only pods, since (B)ANP only applies to non-host-network pods (what we call cluster workloads)?
From the peer side, this should apply to any ingress traffic: from pods, nodes, external endpoints, and anything else that can reach pods.

npinaeva · 2023-08-16T11:04:05Z

npep/npep-127.md

+compromised, it will not be able to affect other worker nodes. To do so, I want to create a cluster-wide policy to
+deny ingress connections from some of the cluster nodes. I may also need to explicitly allow access from the control plane nodes.
+
+ - inter-cluster communication: As a cluster manager for multiple clusters I want to make sure ingress connections


yeah right, I think inter-cluster part should go to the example

npinaeva · 2023-08-16T11:21:41Z

npep/npep-127.md

+have limited access to the management cluster workloads by selecting management cluster workloads I want to protect
+and a set of allowed subnets for the corresponding hosted clusters.
+
+- external services: As a cluster manager I want to make sure ingress connections from a pre-defined set of external endpoints


not sure what is the best place to clarify this, but I refer to external endpoints in this NPEP as anything outside the cluster. So I would say any IP, unless you see any problem with that?

npep/npep-127.md

danwinship · 2023-09-06T17:47:51Z

npep/npep-127.md

+ - compromised node protection: As a cluster administrator I want to make sure a compromised node can't send traffic to the 
+cluster workloads scheduled on the other nodes.
+
+    Example: my cluster has a set of nodes with very sensitive workloads, I want to make sure that if a worker node A is
+compromised, it will not be able to affect other worker nodes. To do so, I want to create a cluster-wide policy to
+deny ingress connections from some of the cluster nodes. I may also need to explicitly allow access from the control plane nodes.


Hm... so are there pods on the compromised node? Can't the attacker forge connections from pod IPs rather than the node IP? Or is it assumed that that will be blocked as well?

This user story assumes that all pods from the same namespace are on the same node, and that inter-namespace communication is also denied. I omitted it since we already have the means to implement this part with namespace selectors (and potentially tenancy, if you label namespaces with nodes), but maybe it makes sense to add these details.

danwinship · 2023-09-06T17:53:19Z

npep/npep-127.md

+it is applied to the cluster workloads, and should provide the required level of security regardless of how/if a cluster workload
+is exposed to the outer world.
+
+### Use cases


So all of these are pretty vague about what kind of cluster ingress they're talking about, but there are a lot of different kinds, and some are harder to deal with than others, so I think it's important that our user stories explain what sorts of ingress activity it is that the admin cares about. (Even if it turns out that we declare some of these ingress types to be Non-Goals, it's good to be able to see clearly what sort of user stories we decided we can't solve right now.)

e.g.:

- direct delivery to hostNetwork pod
- pod HostPort
- direct delivery to pod-network pod IP (for network plugins that support that)
- service NodePort (with Cluster or Local externalTrafficPolicy)
- service ExternalIP (with Cluster or Local externalTrafficPolicy)
- service LoadBalancer IP (with Cluster or Local externalTrafficPolicy)
- direct delivery to service ClusterIP (for network plugins that support that)
- Ingress/Gateway (which generally resolves to one of the above, but exactly which one is implementation-dependent).

I considered the way traffic reaches the pod to be not important in this context (meaning it should work in all situations), but now I think it may be useful to differentiate, e.g. in case some plugins can't implement CIDR-based filtering for some options. Or how do you think we should use this list in this context?

IMHO let's just start really, really simple. Just keep simple, specific use cases that are not implementation-specific. For me, I want to be able to create policies for ingress (southbound traffic) for when pods are accessed by an external client through a load balancer with externalTrafficPolicy=Local. Here the srcIP is the client IP, and I want to say whether this connection should be served by my pod
BUT
as soon as I re-read what I wrote above ^ :P that defies my definition of "service" because ofc that pod is supposed to serve all clients right? :) because its exposed as a service to the outside world?
AGAIN
I might wanna only allow a subset of clients to talk to this service so adding policies in a way to protect my backend pods..
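A minimal sketch of why the externalTrafficPolicy matters for this use case. It assumes the commonly documented behavior that `Local` preserves the client source IP while `Cluster` may SNAT the traffic to a node IP; the `Connection` type, the addresses, and the allow-list are all hypothetical.

```python
import ipaddress
from dataclasses import dataclass

# Hypothetical subset of external clients allowed to reach the backend pods.
ALLOWED_CLIENTS = [ipaddress.ip_network("198.51.100.0/24")]


@dataclass
class Connection:
    client_ip: str
    ingress_node_ip: str
    external_traffic_policy: str  # "Local" or "Cluster"


def observed_source(conn: Connection) -> str:
    """Source IP the policy implementation would see at the backend pod (assumed model)."""
    if conn.external_traffic_policy == "Local":
        return conn.client_ip        # client IP preserved
    return conn.ingress_node_ip      # assumed SNAT to the node that received the traffic


def policy_allows(conn: Connection) -> bool:
    src = ipaddress.ip_address(observed_source(conn))
    return any(src in net for net in ALLOWED_CLIENTS)
```

In this model the same allowed client is rejected once the service uses `Cluster`, because the policy only ever sees the node address.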

direct delivery to hostNetwork pod
pod HostPort

^ these seem very rare as use cases on cloud envs, I can't think of any, BUT on bare metal they can be useful. Then again, the bastion or jump host (or whatever you put on the same subnet and make routable) is mostly an entity you are consciously choosing to allow.. why would we want to block that...

Even if it turns out that we declare some of these ingress types to be Non-Goals, it's good to be able to see clearly what sort of user stories we decided we can't solve right now.)

right ^ good idea so that when we get asked why the current cidr and node peers are only egress.. we can just look at the non-goals section of this NPEP.

I considered the way traffic reaches the pod to be not important in this context (meaning it should work in all situations)

I mean, ideally, yes, certainly. But in practice that is impossible, because the network policy implementation is normally completely separate from cloud LBs and Gateway mechanisms, and has no power over them or even any insight into how they work. Cluster ingress, as currently defined, is a very leaky abstraction.

We talked about this a little at KubeCon Paris and Shane mentioned https://github.com/kubernetes-sigs/gateway-api/blob/main/geps/gep-735/index.md which is a (Declined) GEP about doing policy-ish stuff in Gateway Routes. It is likely that for some user stories, providing a non-ANP policy API inside Gateway API would work better than trying to implement ANP policy against arbitrary Gateway mechanisms. There was talk about doing a "GEPNPEP"...

(This is somewhat related to the problem of pod-to-cloud-LB-to-whatever traffic discussed in #203, and one thing that could potentially help to solve both would be to come up with a "bigger" model of "Kubernetes Networking" that makes the cloud network and gateways more legible to the network policy implementation. In theory, this could be part of KNI, though it is not something currently being discussed.)

Signed-off-by: Nadia Pinaeva <[email protected]>

k8s-triage-robot · 2024-01-22T03:21:51Z

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

Mark this PR as fresh with /remove-lifecycle stale
Close this PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
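The triage rules quoted above amount to fixed thresholds on continuous inactivity. A small sketch (the function name is hypothetical, and it assumes nothing resets the clock along the way):

```python
def lifecycle_state(days_inactive: int) -> str:
    """Map days of uninterrupted inactivity to the lifecycle stage from the bot's rules."""
    if days_inactive >= 90 + 30 + 30:
        return "closed"   # 30d after lifecycle/rotten was applied
    if days_inactive >= 90 + 30:
        return "rotten"   # 30d after lifecycle/stale was applied
    if days_inactive >= 90:
        return "stale"    # 90d of inactivity
    return "active"
```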

k8s-triage-robot · 2024-02-21T03:42:34Z

The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

Mark this PR as fresh with /remove-lifecycle rotten
Close this PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

astoycos · 2024-02-21T14:01:47Z

/remove-lifecycle rotten
/remove-lifecycle stale

k8s-triage-robot · 2024-07-03T14:23:34Z

/lifecycle stale

k8s-triage-robot · 2024-08-02T14:27:38Z

/lifecycle rotten

k8s-triage-robot · 2024-09-01T15:26:28Z

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

Reopen this PR with /reopen
Mark this PR as fresh with /remove-lifecycle rotten
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-ci-robot · 2024-09-01T15:26:33Z

@k8s-triage-robot: Closed this PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

npinaeva · 2024-09-16T20:49:33Z

/reopen

k8s-ci-robot · 2024-09-16T20:49:38Z

@npinaeva: Failed to re-open PR: state cannot be changed. The npep-ingress branch was force-pushed or recreated.

k8s-ci-robot added the cncf-cla: yes label (indicates the PR's author has signed the CNCF CLA) Jul 20, 2023

k8s-ci-robot requested review from danwinship and Dyanngg July 20, 2023 11:24

k8s-ci-robot added the size/M label (denotes a PR that changes 30-99 lines, ignoring generated files) Jul 20, 2023

astoycos reviewed Aug 15, 2023

npinaeva force-pushed the npep-ingress branch from e427efc to 587980a Compare August 16, 2023 11:22

npinaeva commented Aug 16, 2023

danwinship reviewed Sep 6, 2023

Add enhancement proposal for external ingress endpoints.

4e8beea

Signed-off-by: Nadia Pinaeva <[email protected]>

npinaeva force-pushed the npep-ingress branch from 587980a to 4e8beea Compare October 6, 2023 15:25

k8s-ci-robot added the lifecycle/stale label (denotes an issue or PR that has remained open with no activity and has become stale) Jan 22, 2024

k8s-ci-robot added the lifecycle/rotten label (denotes an issue or PR that has aged beyond stale and will be auto-closed) and removed the lifecycle/stale label Feb 21, 2024

k8s-ci-robot removed the lifecycle/rotten label Feb 21, 2024

npinaeva mentioned this pull request Mar 25, 2024

NetworkPolicy: NodeSelector for Ingress/Egress rules kubernetes/kubernetes#51891

Closed

k8s-ci-robot added the lifecycle/stale label (denotes an issue or PR that has remained open with no activity and has become stale) Jul 3, 2024

k8s-ci-robot added the lifecycle/rotten label (denotes an issue or PR that has aged beyond stale and will be auto-closed) and removed the lifecycle/stale label Aug 2, 2024

k8s-ci-robot closed this Sep 1, 2024

npinaeva mentioned this pull request Sep 16, 2024

Create NPEP-127 for external ingress endpoints #249

Open
