-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NPEP: Iron out Cluster Egress Support API Design #144
NPEP: Iron out Cluster Egress Support API Design #144
Conversation
Skipping CI for Draft Pull Request. |
/label api-review |
a0b7967
to
65de3f1
Compare
✅ Deploy Preview for kubernetes-sigs-network-policy-api ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
65de3f1
to
d6a0c93
Compare
// Support: Core | ||
// | ||
// +optional | ||
ExternalNetworks *metav1.LabelSelector `json:"externalNetworks,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I missed the discussion of this in the WG meeting, but I suspect there is not actually a strong use case for having a label selector here rather than just referring to a specific ExternalNetworkSet by name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correct no strong use cases afaik, its only an extra additional thing if like in my example below they want to group their networks into different networksets. I am ok to directly use metadata name here if that is desired.
// to select a set of external networks | ||
// +optional | ||
// +kubebuilder:validation:MaxItems=100 | ||
Networks []string `json:"networks,omitempty" validate:"omitempty,dive,cidr"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that "validate" section a real thing? (Is "omitempty" supposed to be duplicated there? Is "dive" a typo for something?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good question, I shamelessly copied what calico does for validating the CIDR ranges: https://github.com/projectcalico/calico/blob/cbdf9cfb2b212948b03418b6b20e1f64327a9d07/api/pkg/apis/projectcalico/v3/globalnetworkset.go#L51 because it means we don't need to do the complicated CEL stuff: https://github.com/openshift/api/pull/1410/files#diff-2e98c189fdf696589c6d28124557b9cf96a329e82f790b4e61209351cc68648dR552 but perhaps that is not the right approach and we want to do the fancy CEL thing for v4 and v6 which means combining them into the same string might be a bit hard..
// ExternalNetworkSetSpec defines the desired state of ExternalNetworkSet. | ||
type ExternalNetworkSetSpec struct { | ||
// Networks is the list of NetworkCIDR (both v4 & v6) that can be used to define | ||
// external destinations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somewhere you need to talk about what happens if someone puts internal CIDRs into an ExternalNetworkSet. (Keeping in mind that in some cases the ANP implementation will not be able to reliably distinguish internal from external CIDRs, so the answer can't just be "they will be ignored".)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah good point, I think the answer will be it will take effect :D so be careful what you ask for ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or errr we should go with "implementation-defined?" :D
} | ||
``` | ||
|
||
An `externalNetworkSet` is a new object used to define a set of networks outside |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's try to list pros and cons for having a separate object, here is what I can think of:
pros:
- sharing the same ExternalNetworkSets between ANPs, that maybe useful when defining ANPs for different subjects
cons:
- complicated management: additional watcher, integrity violation handlers, cleanup of unreferenced objects, etc
- may worsen readability: to see what a given ANP does you need to query another object
Please add more :)
from the FQDN side, we will probably need another object that would only be used for egress (assuming CIDRs may be used for ingress rules too, but without FQDNs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from the FQDN side, we will probably need another object that would only be used for egress (assuming CIDRs may be used for ingress rules too, but without FQDNs)
hmm so that is the thing I am looking for feedback from @rahulkjoshi about. I was thinking an "externalDestination" can be anything, the object ExternalNetworkSet
currently has a networks
field which is a list of CIDRs, however if we want to accommodate fqdn's
as well into this object (like domains
field or whatever) and then put a at a time only a single peer type can be allowed, I am ok to do that and rename this object accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing that concerns me about this is that I know there's some discussion about "Network" as a Kubernetes concept (with Pods have multiple NICs and multiple Networks). I worry that have an object called "ExternalNetwork" would be too confusing.
That's exacerbated by making EgressPeer.ExternalNetwork
be a label-selector. Then the API doesn't automatically indicate what gets selected.
+1 to the cons Nadia suggested. I'm not sure they outweigh the re-use situation, but happy to discuss.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the cons outweigh the re-use situation but I also don't think we want to do "External" anymore so it might just be cidrs
- still discussing on the details.
That's exacerbated by making EgressPeer.ExternalNetwork be a label-selector. Then the API doesn't automatically indicate what gets selected.
yes there was a suggestion to using metadata name as means to select.
Having said that there is really no strong push for doing a new object neither was there specific pushback, so all in all we can keep it simple for now and fold it into the main API. If we want more creative and expressive ways before we move to v1 or beta we can reevaluate this.
d6a0c93
to
4c554b8
Compare
4c554b8
to
fb2bfb4
Compare
to: | ||
- externalNetworks: | ||
matchLabels: | ||
network: internet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From above schema, I see ExternalNetworks
as array of strings. How can it match labels here? Are these predefined labels need to be created by admin and group IPs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tamilmani1989 : Were you there at KubeCon NA Chicago and were you the one asking about having a new object?
So this part of the NPEP is still not entirely fixed. See #144 (comment) for details.
Example: <TBD> -> after we reach consensus on externalNetwork versus network
-> anything below this is right now under progress. As of now we are thinking of simply having a string of CIDRs in the main API instead of new object and then using selectors to match on them because no one reached out with use cases for needing a new object. Do you have strong use case in which case we can pivot towards this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope it's not me at kubecon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tamilmani1989 : ack, IIUC so as far as your use case is concerned what you are looking for is a way to explicitly deny a set of cidrs without having to implicitly deny everything. I think that use case will be covered regardless in ANP for sure so that admins can define firewall type allow/deny rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tssurya yes that's correct. Thanks for confirming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is very different from the previous one (or is there a need to share dynamic CIDRs with namespace owners too?).
Yeah, it's a different one. There's no need to share or delegate anything to a namespace owner.
who creates the CIRDs resource and who updates it
We can assume that cluster admin must create the CIDR group object at the same time as the ANP object referencing it. It could either be pre-populated with entries or left empty for external controller to populate
how is referencing/naming organized?
This could a CRD object reference (object name, assuming CIDR object cluster-wide ) or, alternatively, a label selector to be able to match multiple CIDR objects
This sounds a lot like FQDN rules...
Yep, very similar to FQDN rules, with the exception that the IP discovery is done not via DNS but via labels/tags in the cloud or e.g. BGP communities for on-prem/baremetal environments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Namespace admins cannot apply labels to their own namespaces.
In this case I mean pod labels, it is required to provide the same granularity as NetworkPolicy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the two cases as orthogonal but complimentary.
Sure, they are both about delegating CIDRs management, but in the first case it's namespace owners delegating management to cluster admin, and in the second case it's cluster admin delegating management to a controller, that is managed by cluster admin (same persona). If want to join these use cases with the same API, we should discuss is it ok if all namespaces will be able to see/use all CIRD groups (e.g. use empty selector for CIRD groups)? It means if cluster admin wants to use CIDR group in ANP (like in BGP example), there is no way to prevent namespaces from using it (could it expose some sensitive information?).
When talking about controller-managed and admin-managed CIDR groups, let's see how CIDR group spec could look like (considering "We can assume that cluster admin must create the CIDR group object at the same time as the ANP object referencing it.")
e.g. for only admin-managed CIDR groups we could just have spec.CIDRs []string
, but if we want controller to manage e.g. DNS names, it should be some spec.FQDN string
+ status.CIDRs []string
. Then for some other controller populating BGP params it could be spec.Community string
+ status.CIDRs []string
. So how can we define one spec here?
There are more options around organizing something similar, these questions are based on the mentioned answer from @networkop. Very curious to learn how your BGP example works.
Reflecting on how this is similar to FQDN selector we are working on, should we consider instead of peer.fqdn: <a.b.com>
using peer.cirdGroup: matchLabels:{cidrGroup.kubernetes.io/fqdn: <a.b.com>}
, or the opposite for other controller-managed cidrGroups: use peer.<CIDR group semantics>
without cidrGroup reference, and only leave cidrGroups for manual management? Or maybe something in between? (Rapidly changing ips for FQDN is an interesting limitation for using "controller-managed CIDR group")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry for the late reply, I must have missed the notification. I can clarify the BGP use case a bit.
The use case I have in mind is the following:
As a cluster admin or a namespace owner, I want to be able to create network policies that match a dynamic set of external IPs (e.g. set of VMs or set of directly reachable Pods in another cluster). I may not be able to use FQDN rules for that due to TTL being too long or simply lack of DNS service discovery in an external system.
As a cluster admin, I would create CIDR group resource and a BGP controller that would manage it. The mapping between BGP communities and CIDR group resource names is a BGP controller configuration (e.g. annotation on the CIDR group resource). The speed of IP churn is bounded by the BGP advertisement interval and can be further reduced by the BGP controller implementation.
Apologies, I'm not an expert in k8s APIs but I've tried to mock up how a BGP-managed CIDR group object would look like:
apiVersion: policy.networking.k8s.io/v1alpha1
kind: CIDRGroup
metadata:
name: cluster-wide-cidr-cloud-1
labels:
origin: cloud-1
annotations:
"bgp.cidrmanager.k8s.io/is-managed": "true"
"bgp.cidrmanager.k8s.io/32bit-community": "2147483647"
spec:
cidrs:
- 192.0.2.0/24
- 203.0.113.0/24
- 198.51.100.0/24
status:
conditions:
- lastTransitionTime: "2022-12-29T14:53:50Z"
status: "True"
type: Reconciled
---
apiVersion: policy.networking.k8s.io/v1alpha1
kind: AdminNetworkPolicy
metadata:
name: cluster-wide-allow-example
spec:
priority: 30
subject:
namespaces: {}
ingress:
- action: Allow
from:
- cidrGroups:
matchLabels:
origin: cloud-1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@networkop : thank you so much for the details. We will cover the relevant use cases as part of #183
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also please make sure to move this NPEP to Implementable here -> https://github.com/kubernetes-sigs/network-policy-api/blob/main/mkdocs.yml#L60
/hold After discussion in Sig-Network today I think we should have @thockin, @aojea and @kal from the core sig-network side put eyes on this as well before agreeing on the API design, basically they should have a look once the NPEP enters the "implementable" state which I will add to our docs as well. |
fb2bfb4
to
a12493e
Compare
a12493e
to
a922d02
Compare
@danwinship @astoycos PTAL. This NPEP's changes are ready for merge. We have started a different NPEP for the new CDIR object use cases so we could just merge this one first asap. |
a922d02
to
8c18711
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I worry that we designed a "Cluster Egress" API (as the title states) and then at the last minute said "ok fine it's cluster-internal too" without fully considering the ramifications of that...
Signed-off-by: Surya Seetharaman <[email protected]>
8c18711
to
5182ef2
Compare
thanks @danwinship ! I think I have addressed all your comments, PTAL. |
I have changed the terminologies to match what's reflected from previous discussions. I think title itself is ok because we do now will start allowing cluster egress support just that its not "exclusively ONLY cluster egress support" has an additional internal support too. OR more like its just a CIDR peer block :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I agree with @danwinship somewhat but I think Surya has done a good job of being very explicit in the API where we do this (i.e the all important 0.0.0.0/32 case) I'm alright with this moving forward.
I am happy with how this looks. |
/hold cancel |
/assign @danwinship |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall
action: "Allow" | ||
to: | ||
- networks: | ||
- POD_CIDR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming this is a place holder for actually CIDRs rather than some env vars?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes just a placeholder for the the actual podCIDR, thanks @Dyanngg for reviewing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my side this still looks good, I will also review the actual implementation PR to make sure the verbiage we encode there makes sense
/lgtm Congrats @tssurya!! |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: astoycos, Dyanngg, tssurya The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
See POC: #143
A few things were discussed in the sig-network-policy-api meeting that happened on 10th October 2023:
AdminNetworkPolicyPeerIngress
andAdminNetworkPolicyPeerEgress
types -> this will be commit1 of POC PR: Implement Cluster Egress Traffic semantics (ANP&BANP NorthBound Support) - PART1 - Nodes #143.Namespace
andPods
peer types do NOT account for host-networked pods ? CONSENSUS: Yes, makes sense; let's do that. Opened Callout namespaces/pods peers do not include host-net pods #156ExternalNetworks
CRD like mentioned in the NPEP? Both have advantages and disadvantages and during the meeting no-one had strong preferences and nobody opposed either OR which is good. But we wanted to check back with @rahulkjoshi on what would make sense for FQDN egress peers. So for now going ahead with whatever is present in the NPEP and then we can narrow in on that.When a pod is isolated for ingress, the only allowed connections into the pod are those from the pod's node
)? For ANP we don't want that, we want admins to be able to explicitly specify that relation. This can be the ability to expressisSameNode
relation for a pod dynamically in addition to giving outNodeSelectors
but this is an ingress traffic user story. CONSENSUS: Since we will split the peers as ingress/egress as per (1), this API design detail can be covered as part of ingress NPEP. As of now we don't have user stories that ask of this for egress traffic (I want pods to be able to talk to only my node).