NPEP: Iron out Cluster Egress Support API Design #144

tssurya · 2023-09-24T15:24:46Z

See POC: #143

A few things were discussed in the sig-network-policy-api meeting that happened on 10th October 2023:

Should we diverge on our ingress and egress peer types - since we are adding only egress support initially and FQDN is also an egress peer thing? As per apiserver team recommendations that seems to be the best way forward? CONSENSUS: Yes let's have AdminNetworkPolicyPeerIngress and AdminNetworkPolicyPeerEgress types -> this will be commit1 of POC PR: Implement Cluster Egress Traffic semantics (ANP&BANP NorthBound Support) - PART1 - Nodes #143.
Should we openly callout that Namespace and Pods peer types do NOT account for host-networked pods ? CONSENSUS: Yes, makes sense; let's do that. Opened Callout namespaces/pods peers do not include host-net pods #156
Should we have CIDRs and FQDNs inside the main API for ANP and BANP OR should we go with ExternalNetworks CRD like mentioned in the NPEP? Both have advantages and disadvantages and during the meeting no-one had strong preferences and nobody opposed either OR which is good. But we wanted to check back with @rahulkjoshi on what would make sense for FQDN egress peers. So for now going ahead with whatever is present in the NPEP and then we can narrow in on that.
What about node2pod healthchecks that NetworkPolicy API allows implicitly: https://kubernetes.io/docs/concepts/services-networking/network-policies/ (When a pod is isolated for ingress, the only allowed connections into the pod are those from the pod's node)? For ANP we don't want that, we want admins to be able to explicitly specify that relation. This can be the ability to express isSameNode relation for a pod dynamically in addition to giving out NodeSelectors but this is an ingress traffic user story. CONSENSUS: Since we will split the peers as ingress/egress as per (1), this API design detail can be covered as part of ingress NPEP. As of now we don't have user stories that ask of this for egress traffic (I want pods to be able to talk to only my node).
What about "internalDestinations?" -> NetPol defined "ipBlock" as a type of peer which can have podCIDRs/nodeCIDRs whatever in it, in ANP we are trying to explicitly define a peer as "outside the cluster" sort of like egressIPs where intra cluster traffic is not considered as egress. However some implementations cannot distinguish between internal and external... so how strong should this definition of "northbound destinations" be?

k8s-ci-robot · 2023-09-24T15:24:47Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

tssurya · 2023-09-24T15:26:28Z

/label api-review

netlify · 2023-10-07T13:12:35Z

✅ Deploy Preview for kubernetes-sigs-network-policy-api ready!

Name	Link
🔨 Latest commit	`5182ef2`
🔍 Latest deploy log	https://app.netlify.com/sites/kubernetes-sigs-network-policy-api/deploys/65ae3fb78bf3560008befa79
😎 Deploy Preview	https://deploy-preview-144--kubernetes-sigs-network-policy-api.netlify.app/npeps/npep-126-egress-traffic-control
📱 Preview on mobile	Toggle QR Code... Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

npep/npep-126-egress-traffic-control.md

danwinship · 2023-10-16T14:03:06Z

npep/npep-126-egress-traffic-control.md

+	// Support: Core
+	//
+	// +optional
+	ExternalNetworks *metav1.LabelSelector `json:"externalNetworks,omitempty"`


I missed the discussion of this in the WG meeting, but I suspect there is not actually a strong use case for having a label selector here rather than just referring to a specific ExternalNetworkSet by name.

correct no strong use cases afaik, its only an extra additional thing if like in my example below they want to group their networks into different networksets. I am ok to directly use metadata name here if that is desired.

npep/npep-126-egress-traffic-control.md

danwinship · 2023-10-16T14:09:02Z

npep/npep-126-egress-traffic-control.md

+	// to select a set of external networks
+	// +optional
+	// +kubebuilder:validation:MaxItems=100
+	Networks []string `json:"networks,omitempty" validate:"omitempty,dive,cidr"`


Is that "validate" section a real thing? (Is "omitempty" supposed to be duplicated there? Is "dive" a typo for something?)

good question, I shamelessly copied what calico does for validating the CIDR ranges: https://github.com/projectcalico/calico/blob/cbdf9cfb2b212948b03418b6b20e1f64327a9d07/api/pkg/apis/projectcalico/v3/globalnetworkset.go#L51 because it means we don't need to do the complicated CEL stuff: https://github.com/openshift/api/pull/1410/files#diff-2e98c189fdf696589c6d28124557b9cf96a329e82f790b4e61209351cc68648dR552 but perhaps that is not the right approach and we want to do the fancy CEL thing for v4 and v6 which means combining them into the same string might be a bit hard..

danwinship · 2023-10-16T14:12:42Z

npep/npep-126-egress-traffic-control.md

+// ExternalNetworkSetSpec defines the desired state of ExternalNetworkSet.
+type ExternalNetworkSetSpec struct {
+	// Networks is the list of NetworkCIDR (both v4 & v6) that can be used to define
+	// external destinations.


Somewhere you need to talk about what happens if someone puts internal CIDRs into an ExternalNetworkSet. (Keeping in mind that in some cases the ANP implementation will not be able to reliably distinguish internal from external CIDRs, so the answer can't just be "they will be ignored".)

yeah good point, I think the answer will be it will take effect :D so be careful what you ask for ?

or errr we should go with "implementation-defined?" :D

npep/npep-126-egress-traffic-control.md

npinaeva · 2023-10-16T14:35:48Z

npep/npep-126-egress-traffic-control.md

+}
+```
+
+An `externalNetworkSet` is a new object used to define a set of networks outside


Let's try to list pros and cons for having a separate object, here is what I can think of:
pros:

sharing the same ExternalNetworkSets between ANPs, that maybe useful when defining ANPs for different subjects

cons:

complicated management: additional watcher, integrity violation handlers, cleanup of unreferenced objects, etc

may worsen readability: to see what a given ANP does you need to query another object

Please add more :)

from the FQDN side, we will probably need another object that would only be used for egress (assuming CIDRs may be used for ingress rules too, but without FQDNs)

from the FQDN side, we will probably need another object that would only be used for egress (assuming CIDRs may be used for ingress rules too, but without FQDNs)

hmm so that is the thing I am looking for feedback from @rahulkjoshi about. I was thinking an "externalDestination" can be anything, the object ExternalNetworkSet currently has a networks field which is a list of CIDRs, however if we want to accommodate fqdn's as well into this object (like domains field or whatever) and then put a at a time only a single peer type can be allowed, I am ok to do that and rename this object accordingly.

One thing that concerns me about this is that I know there's some discussion about "Network" as a Kubernetes concept (with Pods have multiple NICs and multiple Networks). I worry that have an object called "ExternalNetwork" would be too confusing.

That's exacerbated by making EgressPeer.ExternalNetwork be a label-selector. Then the API doesn't automatically indicate what gets selected.

+1 to the cons Nadia suggested. I'm not sure they outweigh the re-use situation, but happy to discuss.

I don't think the cons outweigh the re-use situation but I also don't think we want to do "External" anymore so it might just be cidrs - still discussing on the details.

That's exacerbated by making EgressPeer.ExternalNetwork be a label-selector. Then the API doesn't automatically indicate what gets selected.

yes there was a suggestion to using metadata name as means to select.

Having said that there is really no strong push for doing a new object neither was there specific pushback, so all in all we can keep it simple for now and fold it into the main API. If we want more creative and expressive ways before we move to v1 or beta we can reevaluate this.

tamilmani1989 · 2023-11-13T18:57:57Z

npep/npep-126-egress-traffic-control.md

+    to:
+    - externalNetworks:
+        matchLabels:
+          network: internet


From above schema, I see ExternalNetworks as array of strings. How can it match labels here? Are these predefined labels need to be created by admin and group IPs?

@tamilmani1989 : Were you there at KubeCon NA Chicago and were you the one asking about having a new object?
So this part of the NPEP is still not entirely fixed. See #144 (comment) for details.

Example: <TBD> -> after we reach consensus on externalNetwork versus network -> anything below this is right now under progress. As of now we are thinking of simply having a string of CIDRs in the main API instead of new object and then using selectors to match on them because no one reached out with use cases for needing a new object. Do you have strong use case in which case we can pivot towards this.

nope it's not me at kubecon.

@tamilmani1989 : ack, IIUC so as far as your use case is concerned what you are looking for is a way to explicitly deny a set of cidrs without having to implicitly deny everything. I think that use case will be covered regardless in ANP for sure so that admins can define firewall type allow/deny rules.

@tssurya yes that's correct. Thanks for confirming.

it is very different from the previous one (or is there a need to share dynamic CIDRs with namespace owners too?).

Yeah, it's a different one. There's no need to share or delegate anything to a namespace owner.

who creates the CIRDs resource and who updates it

We can assume that cluster admin must create the CIDR group object at the same time as the ANP object referencing it. It could either be pre-populated with entries or left empty for external controller to populate

how is referencing/naming organized?

This could a CRD object reference (object name, assuming CIDR object cluster-wide ) or, alternatively, a label selector to be able to match multiple CIDR objects

This sounds a lot like FQDN rules...

Yep, very similar to FQDN rules, with the exception that the IP discovery is done not via DNS but via labels/tags in the cloud or e.g. BGP communities for on-prem/baremetal environments

Namespace admins cannot apply labels to their own namespaces.

In this case I mean pod labels, it is required to provide the same granularity as NetworkPolicy.

I see the two cases as orthogonal but complimentary.

Sure, they are both about delegating CIDRs management, but in the first case it's namespace owners delegating management to cluster admin, and in the second case it's cluster admin delegating management to a controller, that is managed by cluster admin (same persona). If want to join these use cases with the same API, we should discuss is it ok if all namespaces will be able to see/use all CIRD groups (e.g. use empty selector for CIRD groups)? It means if cluster admin wants to use CIDR group in ANP (like in BGP example), there is no way to prevent namespaces from using it (could it expose some sensitive information?).

When talking about controller-managed and admin-managed CIDR groups, let's see how CIDR group spec could look like (considering "We can assume that cluster admin must create the CIDR group object at the same time as the ANP object referencing it.")
e.g. for only admin-managed CIDR groups we could just have spec.CIDRs []string, but if we want controller to manage e.g. DNS names, it should be some spec.FQDN string + status.CIDRs []string. Then for some other controller populating BGP params it could be spec.Community string + status.CIDRs []string. So how can we define one spec here?
There are more options around organizing something similar, these questions are based on the mentioned answer from @networkop. Very curious to learn how your BGP example works.

Reflecting on how this is similar to FQDN selector we are working on, should we consider instead of peer.fqdn: <a.b.com> using peer.cirdGroup: matchLabels:{cidrGroup.kubernetes.io/fqdn: <a.b.com>}, or the opposite for other controller-managed cidrGroups: use peer.<CIDR group semantics> without cidrGroup reference, and only leave cidrGroups for manual management? Or maybe something in between? (Rapidly changing ips for FQDN is an interesting limitation for using "controller-managed CIDR group")

sorry for the late reply, I must have missed the notification. I can clarify the BGP use case a bit.
The use case I have in mind is the following:
As a cluster admin or a namespace owner, I want to be able to create network policies that match a dynamic set of external IPs (e.g. set of VMs or set of directly reachable Pods in another cluster). I may not be able to use FQDN rules for that due to TTL being too long or simply lack of DNS service discovery in an external system.

As a cluster admin, I would create CIDR group resource and a BGP controller that would manage it. The mapping between BGP communities and CIDR group resource names is a BGP controller configuration (e.g. annotation on the CIDR group resource). The speed of IP churn is bounded by the BGP advertisement interval and can be further reduced by the BGP controller implementation.

Apologies, I'm not an expert in k8s APIs but I've tried to mock up how a BGP-managed CIDR group object would look like:

apiVersion: policy.networking.k8s.io/v1alpha1 kind: CIDRGroup metadata: name: cluster-wide-cidr-cloud-1 labels: origin: cloud-1 annotations: "bgp.cidrmanager.k8s.io/is-managed": "true" "bgp.cidrmanager.k8s.io/32bit-community": "2147483647" spec: cidrs: - 192.0.2.0/24 - 203.0.113.0/24 - 198.51.100.0/24 status: conditions: - lastTransitionTime: "2022-12-29T14:53:50Z" status: "True" type: Reconciled --- apiVersion: policy.networking.k8s.io/v1alpha1 kind: AdminNetworkPolicy metadata: name: cluster-wide-allow-example spec: priority: 30 subject: namespaces: {} ingress: - action: Allow from: - cidrGroups: matchLabels: origin: cloud-1

@networkop : thank you so much for the details. We will cover the relevant use cases as part of #183

astoycos

Also please make sure to move this NPEP to Implementable here -> https://github.com/kubernetes-sigs/network-policy-api/blob/main/mkdocs.yml#L60

astoycos · 2023-12-07T17:57:47Z

/hold

After discussion in Sig-Network today I think we should have @thockin, @aojea and @kal from the core sig-network side put eyes on this as well before agreeing on the API design, basically they should have a look once the NPEP enters the "implementable" state which I will add to our docs as well.

tssurya · 2023-12-31T10:06:55Z

@danwinship @astoycos PTAL. This NPEP's changes are ready for merge.
Nodes PR is already merged: #143
CIDR PR is waiting for this one to merge: #185

We have started a different NPEP for the new CDIR object use cases so we could just merge this one first asap.

danwinship

I worry that we designed a "Cluster Egress" API (as the title states) and then at the last minute said "ok fine it's cluster-internal too" without fully considering the ramifications of that...

npeps/npep-126-egress-traffic-control.md

Signed-off-by: Surya Seetharaman <[email protected]>

tssurya · 2024-01-22T10:13:58Z

thanks @danwinship ! I think I have addressed all your comments, PTAL.

tssurya · 2024-01-22T10:18:27Z

I worry that we designed a "Cluster Egress" API (as the title states) and then at the last minute said "ok fine it's cluster-internal too" without fully considering the ramifications of that...

I have changed the terminologies to match what's reflected from previous discussions. I think title itself is ok because we do now will start allowing cluster egress support just that its not "exclusively ONLY cluster egress support" has an additional internal support too. OR more like its just a CIDR peer block :)

astoycos

LGTM

I agree with @danwinship somewhat but I think Surya has done a good job of being very explicit in the API where we do this (i.e the all important 0.0.0.0/32 case) I'm alright with this moving forward.

tssurya · 2024-01-26T16:38:19Z

I am happy with how this looks.
Removing hold. Let's merge this?

tssurya · 2024-01-26T16:38:42Z

/hold cancel

tssurya · 2024-01-26T16:39:03Z

/assign @danwinship

Dyanngg

LGTM overall

Dyanngg · 2024-01-29T19:24:32Z

npeps/npep-126-egress-traffic-control.md

+    action: "Allow"
+    to:
+    - networks:
+      - POD_CIDR


I'm assuming this is a place holder for actually CIDRs rather than some env vars?

yes just a placeholder for the the actual podCIDR, thanks @Dyanngg for reviewing.

astoycos

From my side this still looks good, I will also review the actual implementation PR to make sure the verbiage we encode there makes sense

astoycos · 2024-01-31T16:52:42Z

/lgtm
/approve

Congrats @tssurya!!

k8s-ci-robot · 2024-01-31T16:52:53Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: astoycos, Dyanngg, tssurya

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [Dyanngg,astoycos]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Sep 24, 2023

k8s-ci-robot requested review from astoycos and danwinship September 24, 2023 15:24

k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Sep 24, 2023

k8s-ci-robot added the api-review Categorizes an issue or PR as actively needing an API review. label Sep 24, 2023

tssurya mentioned this pull request Sep 24, 2023

[ENHANCEMENT] Add Northbound traffic support for ANP API #126

Closed

tssurya force-pushed the npep-126-implementable-stage branch from a0b7967 to 65de3f1 Compare October 7, 2023 13:12

k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 7, 2023

tssurya marked this pull request as ready for review October 7, 2023 13:13

k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 7, 2023

k8s-ci-robot requested a review from Dyanngg October 7, 2023 13:13

tssurya mentioned this pull request Oct 15, 2023

Implement Cluster Egress Traffic semantics (ANP&BANP NorthBound Support) - PART1 - Nodes #143

Merged

tssurya commented Oct 15, 2023

View reviewed changes

npep/npep-126-egress-traffic-control.md Outdated Show resolved Hide resolved

npep/npep-126-egress-traffic-control.md Outdated Show resolved Hide resolved

tssurya force-pushed the npep-126-implementable-stage branch from 65de3f1 to d6a0c93 Compare October 16, 2023 10:53

danwinship reviewed Oct 16, 2023

View reviewed changes

npinaeva reviewed Oct 16, 2023

View reviewed changes

tssurya force-pushed the npep-126-implementable-stage branch from d6a0c93 to 4c554b8 Compare October 17, 2023 11:42

tssurya changed the title ~~NPEP: Iron out Egress Support API Design~~ NPEP: Iron out Cluster Egress Support API Design Oct 19, 2023

tssurya force-pushed the npep-126-implementable-stage branch from 4c554b8 to fb2bfb4 Compare October 26, 2023 14:58

k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 30, 2023

tamilmani1989 reviewed Nov 13, 2023

View reviewed changes

astoycos requested changes Nov 28, 2023

View reviewed changes

tssurya mentioned this pull request Dec 5, 2023

[ENHANCEMENT] Use Cases for wanting Egress CIDR Peers as an external object #182

Open

k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 7, 2023

astoycos assigned thockin, khenidak and aojea Dec 7, 2023

tssurya force-pushed the npep-126-implementable-stage branch from fb2bfb4 to a12493e Compare December 30, 2023 22:36

k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 30, 2023

tssurya force-pushed the npep-126-implementable-stage branch from a12493e to a922d02 Compare December 31, 2023 10:02

tssurya force-pushed the npep-126-implementable-stage branch from a922d02 to 8c18711 Compare December 31, 2023 10:10

tssurya mentioned this pull request Jan 18, 2024

Implement inline CIDR block egress peer #185

Merged

danwinship reviewed Jan 18, 2024

View reviewed changes

npeps/npep-126-egress-traffic-control.md Outdated Show resolved Hide resolved

npeps/npep-126-egress-traffic-control.md Outdated Show resolved Hide resolved

npeps/npep-126-egress-traffic-control.md Show resolved Hide resolved

NPEP: Iron out Egress Support API Design

5182ef2

Signed-off-by: Surya Seetharaman <[email protected]>

tssurya force-pushed the npep-126-implementable-stage branch from 8c18711 to 5182ef2 Compare January 22, 2024 10:13

astoycos approved these changes Jan 23, 2024

View reviewed changes

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 23, 2024

k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 26, 2024

k8s-ci-robot assigned danwinship Jan 26, 2024

Dyanngg approved these changes Jan 29, 2024

View reviewed changes

astoycos approved these changes Jan 29, 2024

View reviewed changes

rahulkjoshi mentioned this pull request Jan 29, 2024

NPEP: FQDN Selector for Egress, User stories #134

Merged

k8s-ci-robot assigned astoycos Jan 31, 2024

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 31, 2024

k8s-ci-robot merged commit 8368454 into kubernetes-sigs:main Jan 31, 2024
8 checks passed

NPEP: Iron out Cluster Egress Support API Design #144

NPEP: Iron out Cluster Egress Support API Design #144

Conversation

tssurya commented Sep 24, 2023 • edited Loading

k8s-ci-robot commented Sep 24, 2023

tssurya commented Sep 24, 2023

netlify bot commented Oct 7, 2023 • edited Loading

✅ Deploy Preview for kubernetes-sigs-network-policy-api ready!

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tssurya Oct 17, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

networkop Dec 20, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

astoycos left a comment

Choose a reason for hiding this comment

astoycos commented Dec 7, 2023 • edited Loading

tssurya commented Dec 31, 2023

danwinship left a comment

Choose a reason for hiding this comment

tssurya commented Jan 22, 2024

tssurya commented Jan 22, 2024

astoycos left a comment

Choose a reason for hiding this comment

tssurya commented Jan 26, 2024

tssurya commented Jan 26, 2024

tssurya commented Jan 26, 2024

Dyanngg left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

astoycos left a comment • edited Loading

Choose a reason for hiding this comment

astoycos commented Jan 31, 2024

k8s-ci-robot commented Jan 31, 2024

tssurya commented Sep 24, 2023 •

edited

Loading

netlify bot commented Oct 7, 2023 •

edited

Loading

tssurya Oct 17, 2023 •

edited

Loading

networkop Dec 20, 2023 •

edited

Loading

astoycos commented Dec 7, 2023 •

edited

Loading

astoycos left a comment •

edited

Loading