diff --git a/keps/prod-readiness/sig-network/1880.yaml b/keps/prod-readiness/sig-network/1880.yaml index f3c3d43b879..c189800659d 100644 --- a/keps/prod-readiness/sig-network/1880.yaml +++ b/keps/prod-readiness/sig-network/1880.yaml @@ -2,4 +2,6 @@ kep-number: 1880 alpha: approver: "@johnbelamaric" beta: + approver: "@soltysh" +stable: approver: "@soltysh" \ No newline at end of file diff --git a/keps/sig-network/1880-multiple-service-cidrs/README.md b/keps/sig-network/1880-multiple-service-cidrs/README.md index 4c79d707c85..cd1c90acf4d 100644 --- a/keps/sig-network/1880-multiple-service-cidrs/README.md +++ b/keps/sig-network/1880-multiple-service-cidrs/README.md @@ -50,7 +50,6 @@ - [Alternative 2](#alternative-2) - [Alternative 3](#alternative-3) - [Alternative 4](#alternative-4) -- [Infrastructure Needed (Optional)](#infrastructure-needed-optional) ## Release Signoff Checklist @@ -63,20 +62,20 @@ Items marked with (R) are required *prior to targeting to a milestone / release* - [x] (R) Design details are appropriately documented - [x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors) - - [ ] e2e Tests for all Beta API Operations (endpoints) - - [ ] (R) Ensure GA e2e tests for meet requirements for [Conformance + - [x] e2e Tests for all Beta API Operations (endpoints) + - [x] (R) Ensure GA e2e tests for meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) - - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free + - [x] (R) Minimum Two Week Window for GA e2e tests to prove flake free - [x] (R) Graduation criteria is in place - - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by + - [x] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) - [x] (R) Production readiness review completed - [x] (R) Production readiness review approved -- [ ] "Implementation History" section is up-to-date for milestone -- [ ] User-facing documentation has been created in [kubernetes/website], for publication to +- [x] "Implementation History" section is up-to-date for milestone +- [x] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io] -- [ ] Supporting documentation—e.g., additional design documents, links to mailing list +- [x] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes - There will be added tests to verify: - API servers using the old and new allocators at same time @@ -603,6 +577,12 @@ Files: - test/integration/servicecidr/allocator_test.go - test/integration/servicecidr/migration_test.go - test/integration/servicecidr/servicecidr_test.go +- test/integration/servicecidr/feature_enable_disable_test.go +- test/integration/servicecidr/perf_test.go + +Links: + +https://storage.googleapis.com/k8s-triage/index.html?test=servicecidr%7Csynthetic_controlplane_test ##### e2e tests @@ -611,6 +591,23 @@ to use the new ServiceCIDR ranges allocated. https://testgrid.k8s.io/sig-network-kind#sig-network-kind,%20alpha,%20beta&include-filter-by-regex=ServiceCIDRs +https://testgrid.k8s.io/kops-misc#kops-aws-sig-network-beta + +In addition to the specific e2e test that validate the new functionality of allowing +to add new ServiceCIDRs and create new Services using the IPs of the new range, all +the existing e2e tests that exercises Services in one way or another are also exercising +the new APIs. + +If we take a job of an execution of any job with the feature enabled, per example, https://storage.googleapis.com/kubernetes-ci-logs/logs/ci-kubernetes-network-kind-alpha-beta-features/1866163383959556096/artifacts/kind-control-plane/pods/kube-system_kube-apiserver-kind-control-plane_89ea5ffb5eb9e46fc7a038252629d04c/kube-apiserver/0.log , we can see the ServiceCIDR and IPAddress objects are constantly exercised: + +```sh +$ grep -c networking.k8s.io/v1beta1/servicecidrs 0.log +553 + +$ grep -c networking.k8s.io/v1beta1/ipaddresses 0.log +400 +``` + ### Graduation Criteria #### Alpha @@ -635,9 +632,16 @@ https://testgrid.k8s.io/sig-network-kind#sig-network-kind,%20alpha,%20beta&inclu #### GA -- 2 examples of real-world usage +- examples of real-world usage: + - It is available in GKE https://cloud.google.com/kubernetes-engine/docs/how-to/use-beta-apis since 1.31 and used in production clusters (numbers can not be disclosed) + - [Non-GKE blog](https://engineering.doit.com/scaling-kubernetes-how-to-seamlessly-expand-service-ip-ranges-246f392112f8) about how to use the ServiceCIDR feature in GKE. + - It can be used by OSS users with installers that allow to set the feature gates and enable the beta apis, automated testing with kops, see kubernetes/test-infra#33864 and e2e tests section. + - It is being tested by the community [spidernet-io/spiderpool#4089 (comment)](https://github.com/spidernet-io/spiderpool/pull/4089#issuecomment-2505499043) since it went beta in 1.31. - More rigorous forms of testing—e.g., downgrade tests and scalability tests + - It is tested internally in GKE as part of the release. + - It will inherit all the project testing (scalability, e2e, ...) after being graduated. - Allowing time for feedback + - The feature was beta in 1.31, it has been tested by different projects and enabled in one platform [with only one bug reported](https://github.com/kubernetes/kubernetes/issues/127588). **Note:** Generally we also wait at least two releases between beta and GA/stable, because there's no opportunity for user feedback, or even bug reports, in back-to-back releases. @@ -697,8 +701,9 @@ it will be safe to disable the dual-write mode. | 1.31 | Beta off | Alpha off | | 1.32 | Beta on | Alpha off | | 1.33 | GA on | Beta off | -| 1.34 | GA (there are no bitmaps running) | GA on (also delete old bitmap)| -| 1.35 | remove feature gate | remove feature gate | +| 1.34 | GA (there are no bitmaps running) | GA (also delete old bitmap)| +| 1.36 | remove feature gate | GA | +| 1.37 | --- | remove feature gate | ## Production Readiness Review Questionnaire @@ -708,7 +713,9 @@ it will be safe to disable the dual-write mode. - [x] Feature gate (also fill in values in `kep.yaml`) - Feature gate name: MultiCIDRServiceAllocator - - Components depending on the feature gate: kube-apiserver, kube-controller-manager + - Components depending on the feature gate: kube-apiserver, kube-controller-manager + - Feature gate name: DisableAllocatorDualWrite + - Components depending on the feature gate: kube-apiserver, ###### Does enabling the feature change any default behavior? @@ -734,26 +741,10 @@ restoring the cluster to a working state. ###### Are there any tests for feature enablement/disablement? - - Tests for feature enablement/disablement are implemented as integration test, see `test/integration/servicecidr/feature_enable_disable.go` The feature add new API objects, no new API fields to existing objects are added. -### Rollout, Upgrade and Rollback Planning - +### Rollout, Upgrade and Rollback Planning ###### How can a rollout or rollback fail? Can it impact already running workloads? @@ -786,7 +777,6 @@ test/integration/servicecidr/allocator_test.go TestMigrateService 2. start an apiserver with the new feature enable 3. the reconciler must detect this stored service and create the corresponding IPAddress -To be added in Beta https://github.com/kubernetes/kubernetes/pull/122047 test/integration/servicecidr/feature_enable_disable.go TestEnableDisableServiceCIDR 1. start apiserver with the feature disabled 2. create new services and assert are correct @@ -959,6 +949,17 @@ This feature contain repair controllers in the apiserver that makes rollbacks sa - [x] Docs (`k/website`) update PR(s): - https://github.com/kubernetes/website/pull/37620 - https://github.com/kubernetes/website/pull/43469 +- [x] Beta + - [x] KEP (`k/enhancements`) update PR(s): + - https://github.com/kubernetes/enhancements/pull/4645 + - [x] Code (`k/k`) update PR(s): + - https://github.com/kubernetes/kubernetes/pull/122047 + - https://github.com/kubernetes/kubernetes/pull/125021 + - [x] Docs (`k/website`) update(s): + - https://github.com/kubernetes/website/pull/46947 +- [ ] Stable + - [ ] KEP (`k/enhancements`) update PR(s): https://github.com/kubernetes/enhancements/pull/4983 + - [ ] Code (`k/k`) update PR(s): https://github.com/kubernetes/kubernetes/pull/128971 ## Drawbacks @@ -977,24 +978,83 @@ Several alternatives were proposed in the original PR but discarded by different #### Alternative 1 +From Daniel Smith: + +> Each apiserver watches services and keeps an in-memory structure of free IPs. +> Instead of an allocation table, let's call it a "lock list". It's just a list of (IP, lock > expiration timestamp). When an apiserver wants to do something with an IP, it adds an item > to the list with a timestamp e.g. 30 seconds in the future (we can do this in a way that fails if the item is already there, in which case we abort). Then, we go use it. Then, we let the lock expire. (We can delete the lock early if using the IP fails.) +> (The above could be optimized in etcd by making an entry per-IP rather than a single list.) +> So, to make a safe allocation, apiserver comes up with a candidate IP address (either randomly or because the user requested it). Check it against the in-memory structure. If that passes, we look for a lock entry. If none is found, we add a lock entry. Then we use it in a service. Then we delete the lock entry (or wait for it to expire). + +> Nothing special needs to be done for deletion, I think it's fine if it takes a while for individual apiservers to mark an IP as safe for reuse. + +Problem: Strong dependency on etcd + #### Alternative 2 +From Tim Hockin: + +> make IP-allocations real +> resources. In the limit this would mean potentially 10s of thousands of +> tiny instances, but they don't change often. This would help in other +> areas where I want to allow APIs to specify IPs without worrying about +> hijacking important "real" IPs. I imagine it would work something like +> PVC<->PV binding. The problem here is that at least service IPs have +> always been synchronous to CREATE and changing that i a pretty significant +> behavioral difference thath users WILL be able to observe. I don't think +> we have any precedent for "nested" transactions on user-visible APIs? + +Problem: Incompatible change in behavior. + #### Alternative 3 +From Wojtek Tyczynski + +> keep storing a bitmap in-memory of kube-apiserver (have that propagated via watching Service objects) +> store just the hash from that bitmap in etcd +> whenever you want to allocate/release IP: +> (a) get the current hash from etcd +> (b) compute the hash from your in-memory representation +> (c) if those two are not equal, you're not synced - wait/retry/... +> (d) if they are equal, then if this is incorrect operation against in-memory operation return conflct/bad request/... +> (e) otherwise, apply in-memory and write to etcd having the current version is precondtion > (as in GuaranteedUpdate) + +Problems: + +> If somehow an inconsistent state gets recorded in etcd, then you're permanently stuck here. And the failure mode is really bad (can't make any more services) and requires etcd-level access to fix. So, this is not a workable solution, I think. + #### Alternative 4 - +From Antonio Ojea -## Infrastructure Needed (Optional) +> instead of using a bitmap allocator I've created a new API object that has a set of IPs, so the IPs are tracked in the API object - +> // IPRangeSpec defines the desired state of IPRange +> type IPRangeSpec struct { +> // Range represent the IP range in CIDR format +> // i.e. 10.0.0.0/16 or 2001:db2::/64 +> // +kubebuilder:validation:MaxLength=128 +> // +kubebuilder:validation:MinLength=8 +> Range string `json:"range,omitempty"` + +> // +optional +> // Addresses represent the IP addresses of the range and its status. +> // Each address may be associated to one kubernetes object (i.e. Services) +> // +listType=set +> Addresses []string `json:"addresses,omitempty"` +>} + +> // IPRangeStatus defines the observed state of IPRange +> type IPRangeStatus struct { +> // Free represent the number of IP addresses that are not allocated in the Range +> // +optional +> Free int64 `json:"free,omitempty"` + + + +Problems: Updates to the set within the object are problematic and difficult to handle. diff --git a/keps/sig-network/1880-multiple-service-cidrs/kep.yaml b/keps/sig-network/1880-multiple-service-cidrs/kep.yaml index 801c4f6aca2..7d7cc76c582 100644 --- a/keps/sig-network/1880-multiple-service-cidrs/kep.yaml +++ b/keps/sig-network/1880-multiple-service-cidrs/kep.yaml @@ -20,18 +20,18 @@ see-also: replaces: # The target maturity stage in the current dev cycle for this KEP. -stage: beta +stage: stable # The most recent milestone for which work toward delivery of this KEP has been # done. This can be the current (upcoming) milestone, if it is being actively # worked on. -latest-milestone: "v1.31" +latest-milestone: "v1.33" # The milestone at which this feature was, or is targeted to be, at each stage. milestone: alpha: "v1.27" beta: "v1.31" - stable: + stable: "v1.33" # The following PRR answers are required at alpha release # List the feature gate name and the components for which it must be enabled