From 1e8dbd5bf40d1bca029fd008df0763ca34e6cd8c Mon Sep 17 00:00:00 2001
From: Michael Hrivnak
Date: Mon, 26 Aug 2024 08:50:45 -0400
Subject: [PATCH] in-cluster DNS and load balancers on more platforms

Proposes adding in-cluster DNS and load balancers as an option for more
platform types than just the current `baremetal` and `openstack`.
---
 ...dns-and-loadbalancers-on-more-platforms.md | 320 ++++++++++++++++++
 1 file changed, 320 insertions(+)
 create mode 100644 enhancements/network/in-cluster-dns-and-loadbalancers-on-more-platforms.md

diff --git a/enhancements/network/in-cluster-dns-and-loadbalancers-on-more-platforms.md b/enhancements/network/in-cluster-dns-and-loadbalancers-on-more-platforms.md
new file mode 100644
index 0000000000..3ee2e5e390
--- /dev/null
+++ b/enhancements/network/in-cluster-dns-and-loadbalancers-on-more-platforms.md
@@ -0,0 +1,320 @@
---
title: in-cluster-dns-and-loadbalancers-on-more-platforms
authors:
  - "@mhrivnak"
  - "@eranco74"
reviewers: # Include a comment about what domain expertise a reviewer is expected to bring and what area of the enhancement you expect them to focus on. For example: - "@networkguru, for networking aspects, please look at IP bootstrapping aspect"
  - "@cybertron"
  - "@tsorya"
  - "@zaneb"
approvers: # A single approver is preferred, the role of the approver is to raise important questions, help ensure the enhancement receives reviews from all applicable areas/SMEs, and determine when consensus is achieved such that the EP can move forward to implementation. Having multiple approvers makes it difficult to determine who is responsible for the actual approval.
  - TBD
api-approvers: # In case of new or modified APIs or API extensions (CRDs, aggregated apiservers, webhooks, finalizers). If there is no API change, use "None"
  - TBD
creation-date: 2024-08-26
last-updated: 2024-08-26
tracking-link: # link to the tracking ticket (for example: Jira Feature or Epic ticket) that corresponds to this enhancement
see-also:
  - "/enhancements/network/baremetal-networking.md"
replaces:
superseded-by:
---

# In-cluster DNS and load balancers on more platforms

## Summary

Multiple on-prem platform types, including `baremetal` and `openstack`,
[provide in-cluster implementations of network
services](https://github.com/openshift/installer/blob/master/docs/design/baremetal/networking-infrastructure.md)
that are required in order to have a viable stand-alone cluster:

* CoreDNS for in-cluster DNS resolution
* haproxy with keepalived to provide in-cluster load balancers (ingress) for the API server and workloads

Continuing the work from that original [enhancement
proposal](https://github.com/openshift/enhancements/blob/master/enhancements/network/baremetal-networking.md),
those services should also be available for optional inclusion when installing
a cluster with the `external` or `none` platform types, which are likewise
often used in environments that lack a suitable alternative for DNS and/or
load balancers.

## Motivation

Provisioning and configuring a [DNS
system](https://docs.openshift.com/container-platform/4.16/installing/installing_platform_agnostic/installing-platform-agnostic.html#installation-dns-user-infra_installing-platform-agnostic)
and [load
balancers](https://docs.openshift.com/container-platform/4.16/installing/installing_platform_agnostic/installing-platform-agnostic.html#installation-load-balancing-user-infra_installing-platform-agnostic)
manually for an OpenShift cluster is a substantial burden on the user. In a
cloud environment that's not already supported by the OpenShift installer,
utilizing the native offerings requires additional up-front work, creates an
ongoing maintenance burden, and adds monetary cost for infrastructure. And not
all cloud environments offer suitable options. On-prem, there may be no
suitable DNS and/or load balancer services available, nor additional
infrastructure on which to run them.

Many users end up deploying a cluster with platform type `baremetal` when all
they really want is to use the in-cluster network services. Instead, it should
be possible to utilize those in-cluster network services without them being
coupled to the baremetal platform type.

For example, the assisted-installer and the agent-based installer are often
used to deploy clusters into “generic” environments where there is no
opportunity to utilize an external DNS or load balancer solution. Thus the
assisted-installer sets the platform type to `baremetal` regardless of whether
the systems are actually running on bare metal or whether there is any intent
to use metal3 integrations. The resulting cluster has all of the appearances
of being bare metal, including a BareMetalHost resource for each Node, which
can be confusing to users. Even the web console’s Overview landing page shows
“Infrastructure provider: BareMetal” in addition to “N Nodes” and “N Bare
Metal Hosts”.

Single Node OpenShift uses platform type `none` and [requires the user to
configure DNS records
manually](https://docs.openshift.com/container-platform/4.16/installing/installing_sno/install-sno-installing-sno.html#install-sno-installing-sno-manually).
The assisted-installer configures dnsmasq in new SNO clusters as a
convenience. But it would be better for the internal DNS service to be a
native part of platform type `none` so that it is easily available to all
users, regardless of how they are installing SNO.

### User Stories

As a user deploying OpenShift in an environment that lacks a suitable DNS
and/or load balancer solution, and with no intent to utilize metal3-related
bare metal features, I want to utilize the in-cluster network services without
being forced to use the `baremetal` platform type.

As a user deploying OpenShift with the `external` platform type into an
environment of my choosing, I want the option to use the in-cluster network
services because they are easier to use than manually deploying, configuring,
and managing the alternatives that may be natively available in the
environment.

As a user deploying Single Node OpenShift, I want the convenience of a
cluster-internal DNS solution.

As a user deploying OpenShift in a mixed environment, such as [virtualized
control plane nodes and bare metal worker
nodes](https://access.redhat.com/solutions/5376701), I am forced to select
platform type `none`, but I still want the option to use the in-cluster
network services.

As a developer enabling OpenShift on a new platform via the `external`
platform type, I want to get an OpenShift cluster up and running with as
little friction as possible so I can start adding integrations with features
of the environment.

### Goals

Enable stand-alone OpenShift clusters to be viable out-of-the-box in
environments that A) lack a suitable external DNS and/or load balancer
solution, and B) are not one of the platform types that already provide those
services in-cluster (`baremetal`, `openstack`, and `ovirt`).

Allow users to opt in to in-cluster DNS and load balancer services with
platform types `none` and `external`.

Stop requiring users to select the `baremetal` platform type when all they
really want is the in-cluster DNS and load balancer services.

Make it easy for Single Node OpenShift users to deploy the cluster-internal
DNS service.

### Non-Goals

The in-cluster network infrastructure has a limitation that it requires nodes
to be on the same subnet. This proposal does not seek to change or remove that
limitation.

## Proposal

The install-config.yaml platform section for both `external` and `none` will
include optional fields to deploy and configure CoreDNS and/or the in-cluster
load balancers. Actual deployment and management will be handled the same way
it already is on other platforms.

### Workflow Description

A user or automation tool (such as the assisted-installer) that is editing
install-config.yaml prior to cluster installation will be able to:
* Enable internal DNS
* Provide VIPs that implicitly enable in-cluster load balancers

### API Extensions

In the `InstallConfig`, the sections for the External and None platforms will
have new settings that:
* Enable internal DNS
* Provide VIPs that implicitly enable internal load balancers (see the example under Implementation Details)

The `Infrastructure` API will add fields in the
[`PlatformSpec`](https://github.com/openshift/api/blob/ef419b6/config/v1/types_infrastructure.go#L272)
and
[`PlatformStatus`](https://github.com/openshift/api/blob/ef419b6/config/v1/types_infrastructure.go#L389)
that mirror the corresponding fields for baremetal, including:
* `APIServerInternalIPs` in Spec and Status
* `IngressIPs` in Spec and Status
* `LoadBalancer` in Status

Those fields will be added to the `External` platform Spec and Status. For the
`None` platform, a new Spec and Status section will need to be created.

### Topology Considerations

#### Hypershift / Hosted Control Planes

None

#### Standalone Clusters

The change is only relevant for standalone clusters.

#### Single-node Deployments or MicroShift

Single Node OpenShift benefits from this change as described above. Being a
single node, it does not need the load balancers, but it does require a DNS
solution.

The assisted-installer already deploys dnsmasq by default as a
cluster-internal DNS solution for SNO, which has been valuable and successful.

### Implementation Details/Notes/Constraints

The `InstallConfig` will gain new settings for `InClusterLoadBalancer` and
`InternalDNS`. They are shown below, added to the existing settings for the
External platform type.

```go
type InClusterLoadBalancer struct {
	// APIVIPs contains the VIP(s) to use for internal API communication. In
	// dual stack clusters it contains an IPv4 and IPv6 address, otherwise only
	// one VIP.
	//
	// +kubebuilder:validation:MaxItems=2
	// +kubebuilder:validation:UniqueItems=true
	// +kubebuilder:validation:Format=ip
	APIVIPs []string `json:"apiVIPs,omitempty"`

	// IngressVIPs contains the VIP(s) to use for ingress traffic. In dual stack
	// clusters it contains an IPv4 and IPv6 address, otherwise only one VIP.
	//
	// +kubebuilder:validation:MaxItems=2
	// +kubebuilder:validation:UniqueItems=true
	// +kubebuilder:validation:Format=ip
	IngressVIPs []string `json:"ingressVIPs,omitempty"`
}

// Platform stores configuration related to external cloud providers.
type Platform struct {
	// PlatformName holds the arbitrary string representing the infrastructure
	// provider name, expected to be set at the installation time. This field
	// is solely for informational and reporting purposes and is not expected
	// to be used for decision-making.
	// +kubebuilder:default:="Unknown"
	// +default="Unknown"
	// +kubebuilder:validation:XValidation:rule="oldSelf == 'Unknown' || self == oldSelf",message="platform name cannot be changed once set"
	// +optional
	PlatformName string `json:"platformName,omitempty"`

	// CloudControllerManager when set to external, this property will enable
	// an external cloud provider.
	// +kubebuilder:default:=""
	// +default=""
	// +kubebuilder:validation:Enum="";External
	// +optional
	CloudControllerManager CloudControllerManager `json:"cloudControllerManager,omitempty"`

	// InClusterLoadBalancer is an optional feature that uses haproxy and
	// keepalived as load balancers running in the cluster. It is useful in
	// environments where it is not possible or desirable to use load balancers
	// outside of the cluster.
	// +optional
	InClusterLoadBalancer *InClusterLoadBalancer `json:"inClusterLoadBalancer,omitempty"`

	// InternalDNS, when set, activates a DNS service running inside the cluster
	// to provide DNS resolution internally. It is useful in environments where
	// it is not possible or desirable to manage the cluster's internal DNS
	// records in an external DNS system.
	// +kubebuilder:default:=""
	// +default=""
	// +kubebuilder:validation:Enum="";CoreDNS
	// +optional
	InternalDNS InternalDNSType `json:"internalDNS,omitempty"`
}

type InternalDNSType string

const (
	// CoreDNS is the default service used to implement internal DNS within a cluster.
	CoreDNS InternalDNSType = "CoreDNS"
)
```
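
For illustration, here is a rough sketch of how a user might opt in to both
features in install-config.yaml for a `none`-platform cluster. It is derived
from the JSON tags of the types above; the cluster details and IP addresses
are placeholder values, and the final field layout is subject to API review.

```yaml
apiVersion: v1
baseDomain: example.com
metadata:
  name: my-cluster
platform:
  none:
    # Providing VIPs implicitly enables the in-cluster haproxy/keepalived
    # load balancers.
    inClusterLoadBalancer:
      apiVIPs:
        - 192.168.111.5
      ingressVIPs:
        - 192.168.111.7
    # Opts in to the in-cluster CoreDNS service for internal name resolution.
    internalDNS: CoreDNS
```

For the `external` platform, the same two fields would appear under
`platform.external`, alongside the existing `platformName` and
`cloudControllerManager` settings.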

### Risks and Mitigations

All of the components in question are already widely deployed in OpenShift
clusters.

### Drawbacks


## Open Questions [optional]


## Test Plan

**Note:** *Section not required until targeted at a release.*

## Graduation Criteria

**Note:** *Section not required until targeted at a release.*

### Dev Preview -> Tech Preview


### Tech Preview -> GA


### Removing a deprecated feature


## Upgrade / Downgrade Strategy

No change to how the components are upgraded and/or downgraded today.

## Version Skew Strategy

No change.

## Operational Aspects of API Extensions

This proposal will enable clusters installed in the future to have fewer CRDs,
since they'll be able to use in-cluster network services without having to
select the `baremetal` platform type. Thus, the unused CRDs from the
`baremetal` platform won't be present on those clusters.

## Support Procedures

No new support implications.

## Alternatives

### Move In-Cluster Network Settings Out of the Platform Spec

Instead of embedding the in-cluster network settings within the platform
specification, these settings could be moved to a separate, dedicated section
in the install-config.yaml. This approach would completely decouple the setup
of these in-cluster network services from platform-specific settings, allowing
greater flexibility in utilizing the network services on any platform type
(see the sketch after the pros and cons below).

Pros:
* Decouples the in-cluster network services from platform-specific settings.
* Simplifies the platform specification and provides a clear, dedicated section for network services.
* De-duplicates settings that have been mirrored into the `baremetal`, `openstack`, and `ovirt` platforms.

Cons:
* Introduces a new section in the configuration file, which may confuse users.
* May conflict with the settings for these network services that already exist on specific platforms, including baremetal and openstack.
* Would require guardrails to ensure the services don't get deployed on platforms that utilize other solutions, even if the user configures them in the install-config.
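
For comparison, a rough sketch of what this alternative might look like, using
a hypothetical top-level `inClusterNetworkServices` section. The section name
and field layout are purely illustrative and are not part of this proposal.

```yaml
apiVersion: v1
baseDomain: example.com
metadata:
  name: my-cluster
platform:
  external:
    platformName: some-cloud
# Hypothetical dedicated section, decoupled from the platform spec.
inClusterNetworkServices:
  internalDNS: CoreDNS
  loadBalancer:
    apiVIPs:
      - 192.168.111.5
    ingressVIPs:
      - 192.168.111.7
```

Validation would need to reject this section on platforms that already manage
DNS and load balancing through other means, as noted in the cons above.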