Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NPEP: Iron out Cluster Egress Support API Design #144

Merged

Conversation

tssurya
Copy link
Contributor

@tssurya tssurya commented Sep 24, 2023

See POC: #143

A few things were discussed in the sig-network-policy-api meeting that happened on 10th October 2023:

  1. Should we diverge on our ingress and egress peer types - since we are adding only egress support initially and FQDN is also an egress peer thing? As per apiserver team recommendations that seems to be the best way forward? CONSENSUS: Yes let's have AdminNetworkPolicyPeerIngress and AdminNetworkPolicyPeerEgress types -> this will be commit1 of POC PR: Implement Cluster Egress Traffic semantics (ANP&BANP NorthBound Support) - PART1 - Nodes #143.
  2. Should we openly callout that Namespace and Pods peer types do NOT account for host-networked pods ? CONSENSUS: Yes, makes sense; let's do that. Opened Callout namespaces/pods peers do not include host-net pods #156
  3. Should we have CIDRs and FQDNs inside the main API for ANP and BANP OR should we go with ExternalNetworks CRD like mentioned in the NPEP? Both have advantages and disadvantages and during the meeting no-one had strong preferences and nobody opposed either OR which is good. But we wanted to check back with @rahulkjoshi on what would make sense for FQDN egress peers. So for now going ahead with whatever is present in the NPEP and then we can narrow in on that.
  4. What about node2pod healthchecks that NetworkPolicy API allows implicitly: https://kubernetes.io/docs/concepts/services-networking/network-policies/ (When a pod is isolated for ingress, the only allowed connections into the pod are those from the pod's node)? For ANP we don't want that, we want admins to be able to explicitly specify that relation. This can be the ability to express isSameNode relation for a pod dynamically in addition to giving out NodeSelectors but this is an ingress traffic user story. CONSENSUS: Since we will split the peers as ingress/egress as per (1), this API design detail can be covered as part of ingress NPEP. As of now we don't have user stories that ask of this for egress traffic (I want pods to be able to talk to only my node).
  5. What about "internalDestinations?" -> NetPol defined "ipBlock" as a type of peer which can have podCIDRs/nodeCIDRs whatever in it, in ANP we are trying to explicitly define a peer as "outside the cluster" sort of like egressIPs where intra cluster traffic is not considered as egress. However some implementations cannot distinguish between internal and external... so how strong should this definition of "northbound destinations" be?

@k8s-ci-robot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Sep 24, 2023
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Sep 24, 2023
@tssurya
Copy link
Contributor Author

tssurya commented Sep 24, 2023

/label api-review

@k8s-ci-robot k8s-ci-robot added the api-review Categorizes an issue or PR as actively needing an API review. label Sep 24, 2023
@tssurya tssurya force-pushed the npep-126-implementable-stage branch from a0b7967 to 65de3f1 Compare October 7, 2023 13:12
@netlify
Copy link

netlify bot commented Oct 7, 2023

Deploy Preview for kubernetes-sigs-network-policy-api ready!

Name Link
🔨 Latest commit 5182ef2
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-network-policy-api/deploys/65ae3fb78bf3560008befa79
😎 Deploy Preview https://deploy-preview-144--kubernetes-sigs-network-policy-api.netlify.app/npeps/npep-126-egress-traffic-control
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 7, 2023
@tssurya tssurya marked this pull request as ready for review October 7, 2023 13:13
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 7, 2023
npep/npep-126-egress-traffic-control.md Outdated Show resolved Hide resolved
npep/npep-126-egress-traffic-control.md Outdated Show resolved Hide resolved
@tssurya tssurya force-pushed the npep-126-implementable-stage branch from 65de3f1 to d6a0c93 Compare October 16, 2023 10:53
npep/npep-126-egress-traffic-control.md Outdated Show resolved Hide resolved
npep/npep-126-egress-traffic-control.md Outdated Show resolved Hide resolved
npep/npep-126-egress-traffic-control.md Outdated Show resolved Hide resolved
// Support: Core
//
// +optional
ExternalNetworks *metav1.LabelSelector `json:"externalNetworks,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed the discussion of this in the WG meeting, but I suspect there is not actually a strong use case for having a label selector here rather than just referring to a specific ExternalNetworkSet by name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct no strong use cases afaik, its only an extra additional thing if like in my example below they want to group their networks into different networksets. I am ok to directly use metadata name here if that is desired.

npep/npep-126-egress-traffic-control.md Outdated Show resolved Hide resolved
npep/npep-126-egress-traffic-control.md Outdated Show resolved Hide resolved
npep/npep-126-egress-traffic-control.md Outdated Show resolved Hide resolved
// to select a set of external networks
// +optional
// +kubebuilder:validation:MaxItems=100
Networks []string `json:"networks,omitempty" validate:"omitempty,dive,cidr"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that "validate" section a real thing? (Is "omitempty" supposed to be duplicated there? Is "dive" a typo for something?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good question, I shamelessly copied what calico does for validating the CIDR ranges: https://github.com/projectcalico/calico/blob/cbdf9cfb2b212948b03418b6b20e1f64327a9d07/api/pkg/apis/projectcalico/v3/globalnetworkset.go#L51 because it means we don't need to do the complicated CEL stuff: https://github.com/openshift/api/pull/1410/files#diff-2e98c189fdf696589c6d28124557b9cf96a329e82f790b4e61209351cc68648dR552 but perhaps that is not the right approach and we want to do the fancy CEL thing for v4 and v6 which means combining them into the same string might be a bit hard..

// ExternalNetworkSetSpec defines the desired state of ExternalNetworkSet.
type ExternalNetworkSetSpec struct {
// Networks is the list of NetworkCIDR (both v4 & v6) that can be used to define
// external destinations.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhere you need to talk about what happens if someone puts internal CIDRs into an ExternalNetworkSet. (Keeping in mind that in some cases the ANP implementation will not be able to reliably distinguish internal from external CIDRs, so the answer can't just be "they will be ignored".)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah good point, I think the answer will be it will take effect :D so be careful what you ask for ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or errr we should go with "implementation-defined?" :D

npep/npep-126-egress-traffic-control.md Outdated Show resolved Hide resolved
}
```

An `externalNetworkSet` is a new object used to define a set of networks outside
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's try to list pros and cons for having a separate object, here is what I can think of:
pros:

  • sharing the same ExternalNetworkSets between ANPs, that maybe useful when defining ANPs for different subjects

cons:

  • complicated management: additional watcher, integrity violation handlers, cleanup of unreferenced objects, etc
  • may worsen readability: to see what a given ANP does you need to query another object

Please add more :)

from the FQDN side, we will probably need another object that would only be used for egress (assuming CIDRs may be used for ingress rules too, but without FQDNs)

Copy link
Contributor Author

@tssurya tssurya Oct 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from the FQDN side, we will probably need another object that would only be used for egress (assuming CIDRs may be used for ingress rules too, but without FQDNs)

hmm so that is the thing I am looking for feedback from @rahulkjoshi about. I was thinking an "externalDestination" can be anything, the object ExternalNetworkSet currently has a networks field which is a list of CIDRs, however if we want to accommodate fqdn's as well into this object (like domains field or whatever) and then put a at a time only a single peer type can be allowed, I am ok to do that and rename this object accordingly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that concerns me about this is that I know there's some discussion about "Network" as a Kubernetes concept (with Pods have multiple NICs and multiple Networks). I worry that have an object called "ExternalNetwork" would be too confusing.

That's exacerbated by making EgressPeer.ExternalNetwork be a label-selector. Then the API doesn't automatically indicate what gets selected.

+1 to the cons Nadia suggested. I'm not sure they outweigh the re-use situation, but happy to discuss.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the cons outweigh the re-use situation but I also don't think we want to do "External" anymore so it might just be cidrs - still discussing on the details.

That's exacerbated by making EgressPeer.ExternalNetwork be a label-selector. Then the API doesn't automatically indicate what gets selected.

yes there was a suggestion to using metadata name as means to select.

Having said that there is really no strong push for doing a new object neither was there specific pushback, so all in all we can keep it simple for now and fold it into the main API. If we want more creative and expressive ways before we move to v1 or beta we can reevaluate this.

@tssurya tssurya force-pushed the npep-126-implementable-stage branch from d6a0c93 to 4c554b8 Compare October 17, 2023 11:42
@tssurya tssurya changed the title NPEP: Iron out Egress Support API Design NPEP: Iron out Cluster Egress Support API Design Oct 19, 2023
@tssurya tssurya force-pushed the npep-126-implementable-stage branch from 4c554b8 to fb2bfb4 Compare October 26, 2023 14:58
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 30, 2023
to:
- externalNetworks:
matchLabels:
network: internet

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From above schema, I see ExternalNetworks as array of strings. How can it match labels here? Are these predefined labels need to be created by admin and group IPs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tamilmani1989 : Were you there at KubeCon NA Chicago and were you the one asking about having a new object?
So this part of the NPEP is still not entirely fixed. See #144 (comment) for details.

Example: <TBD> -> after we reach consensus on externalNetwork versus network -> anything below this is right now under progress. As of now we are thinking of simply having a string of CIDRs in the main API instead of new object and then using selectors to match on them because no one reached out with use cases for needing a new object. Do you have strong use case in which case we can pivot towards this.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope it's not me at kubecon.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tamilmani1989 : ack, IIUC so as far as your use case is concerned what you are looking for is a way to explicitly deny a set of cidrs without having to implicitly deny everything. I think that use case will be covered regardless in ANP for sure so that admins can define firewall type allow/deny rules.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tssurya yes that's correct. Thanks for confirming.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is very different from the previous one (or is there a need to share dynamic CIDRs with namespace owners too?).

Yeah, it's a different one. There's no need to share or delegate anything to a namespace owner.

who creates the CIRDs resource and who updates it

We can assume that cluster admin must create the CIDR group object at the same time as the ANP object referencing it. It could either be pre-populated with entries or left empty for external controller to populate

how is referencing/naming organized?

This could a CRD object reference (object name, assuming CIDR object cluster-wide ) or, alternatively, a label selector to be able to match multiple CIDR objects

This sounds a lot like FQDN rules...

Yep, very similar to FQDN rules, with the exception that the IP discovery is done not via DNS but via labels/tags in the cloud or e.g. BGP communities for on-prem/baremetal environments

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Namespace admins cannot apply labels to their own namespaces.

In this case I mean pod labels, it is required to provide the same granularity as NetworkPolicy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the two cases as orthogonal but complimentary.

Sure, they are both about delegating CIDRs management, but in the first case it's namespace owners delegating management to cluster admin, and in the second case it's cluster admin delegating management to a controller, that is managed by cluster admin (same persona). If want to join these use cases with the same API, we should discuss is it ok if all namespaces will be able to see/use all CIRD groups (e.g. use empty selector for CIRD groups)? It means if cluster admin wants to use CIDR group in ANP (like in BGP example), there is no way to prevent namespaces from using it (could it expose some sensitive information?).

When talking about controller-managed and admin-managed CIDR groups, let's see how CIDR group spec could look like (considering "We can assume that cluster admin must create the CIDR group object at the same time as the ANP object referencing it.")
e.g. for only admin-managed CIDR groups we could just have spec.CIDRs []string, but if we want controller to manage e.g. DNS names, it should be some spec.FQDN string + status.CIDRs []string. Then for some other controller populating BGP params it could be spec.Community string + status.CIDRs []string. So how can we define one spec here?
There are more options around organizing something similar, these questions are based on the mentioned answer from @networkop. Very curious to learn how your BGP example works.

Reflecting on how this is similar to FQDN selector we are working on, should we consider instead of peer.fqdn: <a.b.com> using peer.cirdGroup: matchLabels:{cidrGroup.kubernetes.io/fqdn: <a.b.com>}, or the opposite for other controller-managed cidrGroups: use peer.<CIDR group semantics> without cidrGroup reference, and only leave cidrGroups for manual management? Or maybe something in between? (Rapidly changing ips for FQDN is an interesting limitation for using "controller-managed CIDR group")

Copy link

@networkop networkop Dec 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for the late reply, I must have missed the notification. I can clarify the BGP use case a bit.
The use case I have in mind is the following:
As a cluster admin or a namespace owner, I want to be able to create network policies that match a dynamic set of external IPs (e.g. set of VMs or set of directly reachable Pods in another cluster). I may not be able to use FQDN rules for that due to TTL being too long or simply lack of DNS service discovery in an external system.

As a cluster admin, I would create CIDR group resource and a BGP controller that would manage it. The mapping between BGP communities and CIDR group resource names is a BGP controller configuration (e.g. annotation on the CIDR group resource). The speed of IP churn is bounded by the BGP advertisement interval and can be further reduced by the BGP controller implementation.

Apologies, I'm not an expert in k8s APIs but I've tried to mock up how a BGP-managed CIDR group object would look like:

apiVersion: policy.networking.k8s.io/v1alpha1
kind: CIDRGroup
metadata:
  name: cluster-wide-cidr-cloud-1
  labels:
    origin: cloud-1
  annotations:
    "bgp.cidrmanager.k8s.io/is-managed": "true"
    "bgp.cidrmanager.k8s.io/32bit-community": "2147483647"
spec:
  cidrs:
  - 192.0.2.0/24
  - 203.0.113.0/24
  - 198.51.100.0/24
status:
  conditions:
  - lastTransitionTime: "2022-12-29T14:53:50Z"
    status: "True"
    type: Reconciled

---
apiVersion: policy.networking.k8s.io/v1alpha1
kind: AdminNetworkPolicy
metadata:
  name: cluster-wide-allow-example
spec:
  priority: 30
  subject:
    namespaces: {}
  ingress:
    - action: Allow
      from:
      - cidrGroups:
            matchLabels:
              origin: cloud-1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@networkop : thank you so much for the details. We will cover the relevant use cases as part of #183

Copy link
Member

@astoycos astoycos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also please make sure to move this NPEP to Implementable here -> https://github.com/kubernetes-sigs/network-policy-api/blob/main/mkdocs.yml#L60

@astoycos
Copy link
Member

astoycos commented Dec 7, 2023

/hold

After discussion in Sig-Network today I think we should have @thockin, @aojea and @kal from the core sig-network side put eyes on this as well before agreeing on the API design, basically they should have a look once the NPEP enters the "implementable" state which I will add to our docs as well.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 7, 2023
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 30, 2023
@tssurya
Copy link
Contributor Author

tssurya commented Dec 31, 2023

@danwinship @astoycos PTAL. This NPEP's changes are ready for merge.
Nodes PR is already merged: #143
CIDR PR is waiting for this one to merge: #185

We have started a different NPEP for the new CDIR object use cases so we could just merge this one first asap.

Copy link
Contributor

@danwinship danwinship left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I worry that we designed a "Cluster Egress" API (as the title states) and then at the last minute said "ok fine it's cluster-internal too" without fully considering the ramifications of that...

npeps/npep-126-egress-traffic-control.md Outdated Show resolved Hide resolved
npeps/npep-126-egress-traffic-control.md Outdated Show resolved Hide resolved
npeps/npep-126-egress-traffic-control.md Show resolved Hide resolved
@tssurya tssurya force-pushed the npep-126-implementable-stage branch from 8c18711 to 5182ef2 Compare January 22, 2024 10:13
@tssurya
Copy link
Contributor Author

tssurya commented Jan 22, 2024

thanks @danwinship ! I think I have addressed all your comments, PTAL.

@tssurya
Copy link
Contributor Author

tssurya commented Jan 22, 2024

I worry that we designed a "Cluster Egress" API (as the title states) and then at the last minute said "ok fine it's cluster-internal too" without fully considering the ramifications of that...

I have changed the terminologies to match what's reflected from previous discussions. I think title itself is ok because we do now will start allowing cluster egress support just that its not "exclusively ONLY cluster egress support" has an additional internal support too. OR more like its just a CIDR peer block :)

Copy link
Member

@astoycos astoycos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

I agree with @danwinship somewhat but I think Surya has done a good job of being very explicit in the API where we do this (i.e the all important 0.0.0.0/32 case) I'm alright with this moving forward.

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 23, 2024
@tssurya
Copy link
Contributor Author

tssurya commented Jan 26, 2024

I am happy with how this looks.
Removing hold. Let's merge this?

@tssurya
Copy link
Contributor Author

tssurya commented Jan 26, 2024

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 26, 2024
@tssurya
Copy link
Contributor Author

tssurya commented Jan 26, 2024

/assign @danwinship

Copy link
Contributor

@Dyanngg Dyanngg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall

action: "Allow"
to:
- networks:
- POD_CIDR
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming this is a place holder for actually CIDRs rather than some env vars?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes just a placeholder for the the actual podCIDR, thanks @Dyanngg for reviewing.

Copy link
Member

@astoycos astoycos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my side this still looks good, I will also review the actual implementation PR to make sure the verbiage we encode there makes sense

@astoycos
Copy link
Member

/lgtm
/approve

Congrats @tssurya!!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 31, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: astoycos, Dyanngg, tssurya

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 8368454 into kubernetes-sigs:main Jan 31, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-review Categorizes an issue or PR as actively needing an API review. approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.