KEP 1880: graduation to GA #4983
/assign @thockin @danwinship
/assign @soltysh for PRR
/lgtm
Left several comments to address.
@@ -603,6 +586,8 @@ Files:
- test/integration/servicecidr/allocator_test.go
- test/integration/servicecidr/migration_test.go
- test/integration/servicecidr/servicecidr_test.go
- test/integration/servicecidr/feature_enable_disable_test.go
- test/integration/servicecidr/perf_test.go

##### e2e tests
Are there more e2e-s planned? From checking the codebase I see only one currently. For starters I'm missing e2e covering GA API endpoints (see my earlier comment).
Based on the GA criteria I'm missing all 3 elements pointed out in this update:
- 2 examples of real-world usage
- More rigorous forms of testing—e.g., downgrade tests and scalability tests
- Allowing time for feedback
2 examples of real-world usage
I didn't know how to do this: since this is an opt-in feature, it is not possible to get telemetry. I know there are GKE customers using it and that Kops is able to enable it; see the description of this PR for the examples I found of people using or testing it.
More rigorous forms of testing—e.g., downgrade tests and scalability tests
This is a core feature, which means that once it is enabled it inherits all the existing scalability testing; upgrade/downgrade coverage is added as integration tests.
Allowing time for feedback
It went beta in 1.31 and we usually leave one release for feedback. I got good internal feedback from GKE users, and I also want to avoid permanent betas.
Should I update this information in the doc, or is the description on the issue enough? I linked several places that I think show the feature is being used in production. We generally ask for a diversity of implementations to avoid favoring vendors, but in this case the feature is core functionality that will be used everywhere.
For posterity: I discussed this with Antonio on Slack and suggested linking the public references (Kops, from what he writes above) and mentioning GKE customers.
/approve
the PRR section
- Allowing time for feedback
  - The feature was beta in 1.31, it has been tested by different projects and enabled in one platform [with only one bug reported](https://github.com/kubernetes/kubernetes/issues/127588).
This is great - thank you!
| 1.34 | GA (there are no bitmaps running) | GA on (also delete old bitmap) |
| 1.35 | remove feature gate | remove feature gate |
| 1.34 | GA (there are no bitmaps running) | GA (also delete old bitmap) |
| 1.36 | remove feature gate | GA |
Small nit: just for visibility I'd probably add the row for 1.35, some folks might not catch that it's being skipped 😉
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aojea, danwinship, soltysh
a04f416 to 9fc5833
/hold
/hold cancel
After some offline discussion, this needs a "needs action" release note warning infrastructure providers that they should install an admission hook to disable this feature if they don't want to allow it, or if their cluster contains other components that need to know all of the active service CIDRs but which haven't been updated to know about the
Could VAP be used instead? It's simpler to install than a webhook.
I will work on documenting this properly and putting the proper guardrails in place. I will also consult with apimachinery on whether they prefer to use an admission controller.
VAP to block any ServiceCIDR that is not the default:

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: "servicecidrs.default"
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups: ["networking.k8s.io"]
      apiVersions: ["v1","v1beta1"]
      operations: ["CREATE", "UPDATE"]
      resources: ["servicecidrs"]
  validations:
  - expression: "object.metadata.name == 'kubernetes'"
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: "servicecidrs-binding"
spec:
  policyName: "servicecidrs.default"
  validationActions: [Deny,Audit]
```

Tested with kind (kind-config):

2. Apply the policies

It is denied.

TODO: Allow referencing parameters, so admins can define the range of IPs available.
@@ -20,18 +20,18 @@ see-also:
replaces:

# The target maturity stage in the current dev cycle for this KEP.
stage: beta
stage: stable
I forget the details. This was beta, but off by default. So it didn't get a ton of usage. Is going straight to GA safe? Will we set the lock-to-default?
Yes, we can avoid setting lock-to-default. That will allow cluster admins to disable the feature gate entirely without needing a webhook or VAP (#4983 (comment)). cc: @danwinship
A more complex example of a VAP that defines the allowed ranges of CIDRs, to control the ranges users can create and avoid footguns from overlapping IP ranges:

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: "servicecidrs.default"
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups: ["networking.k8s.io"]
      apiVersions: ["v1","v1beta1"]
      operations: ["CREATE", "UPDATE"]
      resources: ["servicecidrs"]
  matchConditions:
  - name: 'exclude-default-servicecidr'
    expression: "object.metadata.name != 'kubernetes'"
  variables:
  - name: allowed
    expression: "['10.96.0.0/16','2001:db8::/64']"
  validations:
  - expression: "object.spec.cidrs.all(i , variables.allowed.exists(j , cidr(j).containsCIDR(i)))"
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: "servicecidrs-binding"
spec:
  policyName: "servicecidrs.default"
  validationActions: [Deny,Audit]
```

Test:

```yaml
apiVersion: networking.k8s.io/v1beta1
kind: ServiceCIDR
metadata:
  name: newcidr1
spec:
  cidrs:
  - 10.96.0.0/24
```

It is within range, so it is allowed.

```yaml
apiVersion: networking.k8s.io/v1beta1
kind: ServiceCIDR
metadata:
  name: newcidr2
spec:
  cidrs:
  - 10.96.0.0/24
  - fd00:1::/64
```

It has one CIDR outside the allowed list, so it is denied.

Changing the range to an allowed one: now it is allowed.

Thanks @JoelSpeed for this fantastic library to handle IPs and CIDRs with CEL.
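For readers who want to reason about the policy's CEL expression without a cluster, here is a minimal sketch in Python that mirrors the `all(... exists(... containsCIDR ...))` check. It uses the standard `ipaddress` module as a stand-in for the Kubernetes CEL CIDR library, so edge-case behavior may differ from the real policy. The helper names `contains_cidr` and `is_allowed`, and the replacement range `2001:db8::/96` in the last call, are illustrative assumptions and do not come from the PR.

```python
import ipaddress

# Same allowed list as the `allowed` variable in the policy above.
ALLOWED = ["10.96.0.0/16", "2001:db8::/64"]


def contains_cidr(outer: str, inner: str) -> bool:
    """Approximate CEL's cidr(outer).containsCIDR(inner)."""
    outer_net = ipaddress.ip_network(outer)
    inner_net = ipaddress.ip_network(inner)
    # subnet_of() raises TypeError across IP families, so check the version first.
    return outer_net.version == inner_net.version and inner_net.subnet_of(outer_net)


def is_allowed(name: str, cidrs: list[str], allowed: list[str] = ALLOWED) -> bool:
    """Mirror the policy: skip the default object, else require every CIDR to fit."""
    if name == "kubernetes":
        # The matchConditions entry excludes the default ServiceCIDR from validation.
        return True
    return all(any(contains_cidr(j, i) for j in allowed) for i in cidrs)


print(is_allowed("newcidr1", ["10.96.0.0/24"]))                   # within range
print(is_allowed("newcidr2", ["10.96.0.0/24", "fd00:1::/64"]))    # one CIDR outside
print(is_allowed("newcidr2", ["10.96.0.0/24", "2001:db8::/96"]))  # hypothetical fix
```

The three calls correspond to the three test cases described above: the first and third pass, and the second fails because `fd00:1::/64` is not contained in any allowed range.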
One-line PR description: graduate to GA
Issue link: Multiple Service CIDRs #1880
Other comments:
Only one bug was opened during this time (kubernetes/kubernetes#127588), and it was caused by a copy-and-paste error.
It is available in GKE (https://cloud.google.com/kubernetes-engine/docs/how-to/use-beta-apis) and used in production clusters.
It can be used by OSS users with installers that allow setting the feature gates and enabling the beta APIs; see kops (kubernetes/test-infra#33864)
and a blog post about using it to solve overlapping-range problems: https://akarat.xyz/Changing-kubernetes-CIDR-live-on-production/
It has been tested by the community: spidernet-io/spiderpool#4089 (comment)
There is also an external blog post about it: https://engineering.doit.com/scaling-kubernetes-how-to-seamlessly-expand-service-ip-ranges-246f392112f8
The feature has been in beta for two releases, v1.31 and v1.32. It is being used in production, and there is proof of usage and testing beyond the Kubernetes project. That should be enough signal to move it to GA and avoid a permanent beta API.