OCPEDGE-1191: feat: initial arbiter cluster enhancement #1674
---
title: arbiter-clusters
authors:
  - "@eggfoobar"
reviewers:
  - "@tjungblu"
  - "@patrickdillon"
  - "@williamcaban"
  - "@deads2k"
  - "@jerpeter1"
approvers:
  - "@tjungblu"
  - "@patrickdillon"
  - "@williamcaban"
  - "@jerpeter1"
  - "@deads2k"
api-approvers:
  - "@JoelSpeed"
creation-date: 2024-08-27
last-updated: 2024-08-27
tracking-link:
  - https://issues.redhat.com/browse/OCPEDGE-1191
see-also: []
replaces: []
superseded-by: []
---

# Support 2 Node + 1 Arbiter Node HA Cluster

## Summary

This enhancement describes the ability to install OpenShift with a control plane
that consists of 2 regular sized nodes and 1 node that can be less powerful than
the recommended node size. This arbiter node will only run the critical
components needed to maintain HA, allowing the arbiter node to be as small and
as low cost as possible within reason.

## Motivation

Customers at the edge require a more economical solution for HA deployments.
They can support running 2 node clusters for redundancy, but would like the
option to deploy a lower cost node as an arbiter to supply the third node
needed for etcd quorum.

### User Stories

- As a solutions architect for a retail organization, I want to deploy OpenShift
  at any number of store locations at the edge with only 2 regular sized nodes
  and 1 lower cost node, to maintain HA and keep compute costs down.
- As a solutions architect for cloud infrastructures, I want to offer low cost
  OpenShift deployments on purpose built hardware in a 2 + 1 configuration.
- As an OpenShift cluster admin, I want non-critical applications deployed to my
  2 + 1 arbiter node cluster not to be scheduled on the arbiter node.

### Goals

- Provide a new arbiter node role type that supports HA but is not a full
  master.
- Support installing OpenShift with 2 regular nodes and 1 arbiter node.
- Allow the arbiter node's hardware requirements to be lower than those of
  regular nodes.

### Non-Goals

The goals below are not intended to be worked on now, but might be expansion
ideas for future features.

- Running the arbiter node offsite
- Running the arbiter node as a VM local to the cluster
- Having a single arbiter support multiple clusters
- Moving from a 2 + 1 to a conventional 3 node cluster

## Proposal

The main focus of this enhancement is to support edge deployments of individual
OpenShift HA clusters at scale, and to do so in a cost effective way. We propose
doing this by incorporating an arbiter node into a quasi heterogeneous control
plane configuration. The arbiter will run the critical components that help
maintain an HA cluster, but other platform pods should not be scheduled on the
arbiter node. The arbiter node will be tainted to make sure that only
deployments that tolerate that taint are scheduled on it.

We are proposing the following changes:

- Add a new topology to the [OCP/API Control Plane
  Topology](https://github.com/openshift/api/blob/69df64132c911e9eb0176e9697f451c13457e294/config/v1/types_infrastructure.go#L103).
  - This change should act as an authoritative flag that indicates the layout
    of the control plane; this information would be valuable to operator
    developers so that no inference is required.
- Add support to the OCP installer to provide a way of setting up the initial
  manifests and the ControlPlaneTopology field.
  - We will need to support a path for customers to indicate the desire for a
    2 + 1 arbiter install configuration.
  - This will also be used to apply the taint to the MachineSet manifest.
- Alter CEO (cluster-etcd-operator) to be aware of the arbiter node role type
  and allow it to treat the arbiter as if it were a master node.
  - We will need CEO to create an etcd member on the arbiter node to allow
    quorum to be reached.
- Update the tolerations of any critical or desired component that should be
  running on the arbiter node.
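
The taint-and-toleration mechanism described above could look roughly like the
following. This is a minimal sketch: the taint key
`node-role.kubernetes.io/arbiter` and the node name are illustrative
assumptions, not finalized API.

```yaml
# Hypothetical taint carried by the arbiter node (exact key TBD by the design).
apiVersion: v1
kind: Node
metadata:
  name: arbiter-0
  labels:
    node-role.kubernetes.io/arbiter: ""
spec:
  taints:
  - key: node-role.kubernetes.io/arbiter
    effect: NoSchedule
---
# Pod spec fragment: components that must run on the arbiter (e.g. etcd)
# would carry a matching toleration; everything else is repelled by the taint.
tolerations:
- key: node-role.kubernetes.io/arbiter
  operator: Exists
  effect: NoSchedule
```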

### Workflow Description

#### For Cloud Installs

1. The user creates an `install-config.yaml` as normal.
2. The user defines the `install-config.controlPlane` field with `3` replicas.
3. The user sets the new field `install-config.controlPlane.arbiterNode` to
   `true`.
4. The user generates the manifests from this install config via
   `openshift-install create manifests`.
5. With the `arbiterNode` flag in the install config, the installer adds
   `ControlPlaneTopology: ArbiterHighlyAvailable` to the infrastructure config
   object.
6. The installer creates a new `arbiter` MachineSet with a replica of 1 and
   reduces the default control plane replicas to `2`.
7. The installer applies the new node role and taint to the arbiter MachineSet.
8. The user can make any alterations to the node machine type to use less
   powerful machines.
9. The user then begins the install via `openshift-install create cluster`.
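
A minimal sketch of such an install config follows. The
`controlPlane.arbiterNode` field is the new, proposed field from this
enhancement; the platform and metadata values are ordinary placeholders.

```yaml
apiVersion: v1
metadata:
  name: edge-cluster            # placeholder cluster name
controlPlane:
  name: master
  replicas: 3
  arbiterNode: true             # proposed field; requests the 2 + 1 arbiter layout
compute:
- name: worker
  replicas: 0
platform:
  aws:                          # cloud installs are primarily for testing
    region: us-east-1
```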

#### For Baremetal Installs

1. The user creates an `install-config.yaml` as normal.
2. The user defines the `install-config.controlPlane` field with `3` replicas.
3. The user sets the new field `install-config.controlPlane.arbiterNode` to
   `true`.
4. The user then enters the machine information for `platform.baremetal` and
   identifies one of the nodes with the role `arbiter`.
5. With the `arbiterNode` flag in the install config, the installer adds
   `ControlPlaneTopology: ArbiterHighlyAvailable` to the infrastructure config
   object.
6. The user then begins the install via `openshift-install create cluster`.
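
A sketch of the `platform.baremetal` hosts section for step 4 follows. The
`role: arbiter` value is the proposed addition; host names and BMC addresses
are illustrative placeholders.

```yaml
platform:
  baremetal:
    hosts:
    - name: node-0
      role: master
      bmc:
        address: redfish://192.168.111.10/redfish/v1/Systems/1
    - name: node-1
      role: master
      bmc:
        address: redfish://192.168.111.11/redfish/v1/Systems/1
    - name: node-2
      role: arbiter   # proposed role value for the low cost arbiter host
      bmc:
        address: redfish://192.168.111.12/redfish/v1/Systems/1
```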

#### During Install

1. CEO will watch for new masters and the arbiter role.
2. CEO will create the operand for the etcd deployments that have tolerations
   for the arbiter.
3. Operators that have tolerations for the arbiter should be scheduled on the
   node.
4. The install should proceed as normal.

### API Extensions

The `config.infrastructure.controlPlaneTopology` enum will be extended to
include `ArbiterHighlyAvailable`.
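
A sketch of the corresponding change in `openshift/api`; the existing
`TopologyMode` values are shown for context, and the Go name of the new
constant is illustrative, only its string value is specified by this
enhancement.

```go
// In config/v1/types_infrastructure.go (sketch).
const (
	// Existing topology modes.
	HighlyAvailableTopologyMode TopologyMode = "HighlyAvailable"
	SingleReplicaTopologyMode   TopologyMode = "SingleReplica"
	ExternalTopologyMode        TopologyMode = "External"

	// Proposed addition: 2 regular control plane nodes plus 1 arbiter node.
	HighlyAvailableArbiterTopologyMode TopologyMode = "ArbiterHighlyAvailable"
)
```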

### Topology Considerations

#### Hypershift / Hosted Control Planes

For the time being there is no impact on Hypershift, since this edge deployment
requires running the control plane on the cluster itself.

#### Standalone Clusters

This change is relevant to standalone deployments of OpenShift at the edge or
in datacenters. This enhancement specifically deals with this type of
deployment.

#### Single-node Deployments or MicroShift

This change does not affect Single Node OpenShift or MicroShift.

### Implementation Details/Notes/Constraints

Currently there are some behavioral unknowns; we will need to put out a POC to
validate some of the desires in this proposal. In its current version this
proposal is not exhaustive, but it will be filled out as we implement these
goals.

We currently expect this feature to mainly be used by `baremetal` installs, or
specialized hardware that is built to take advantage of this type of
configuration. In the current design we provide two paths in the installer, for
cloud and baremetal installs. However, the cloud install is primarily for
testing; this might mean that we simplify the installer changes if we are the
only ones using cloud installs, since we can simply alter the manifests in the
pipeline without needing to change the installer.

### Risks and Mitigations

The main risk in this enhancement is that, because we are treating one of the
master nodes in a 3 node cluster as an arbiter, we are explicitly evicting
processes that would otherwise be part of a normal supported upstream
configuration such as a compact cluster. We run the risk of new components that
are critical to HA not carrying the proper tolerations for running on the
arbiter node. One mitigation we can take against that is to make sure we are
testing installs and updates.

Another risk we run is customers using an arbiter node with disk speeds below
those recommended for etcd. Since etcd is sensitive to latency between members,
we should provide proper guidance so that the arbiter node doesn't become a
bottleneck for etcd.

### Drawbacks

One drawback is that we will be creating a new variant of OpenShift that
implements a new, unique way of doing HA for Kubernetes. This means an increase
in the test matrix and altogether a different type of testing, given the
asymmetry in the control plane.

## Open Questions [optional]

1. In the future it might be desired to add another master and convert to a
   compact cluster. Do we want to support changing the ControlPlaneTopology
   field after the fact?

## Test Plan

WIP

- Running e2e tests would be preferred, but might prove to be tricky due to the
  asymmetry in the control plane.
- We need a strategy for validating install and test failures.

## Graduation Criteria

### Dev Preview -> Tech Preview

- Ability to utilize the enhancement end to end
- End user documentation, relative API stability
- Sufficient test coverage
- Gather feedback from users rather than just developers
- Enumerate service level indicators (SLIs), expose SLIs as metrics
- Write symptoms-based alerts for the component(s)

### Tech Preview -> GA

- More testing (upgrade, downgrade, scale)
- Sufficient time for feedback
- Available by default
- Backhaul SLI telemetry
- Document SLOs for the component
- Conduct load testing
- User facing documentation created in
  [openshift-docs](https://github.com/openshift/openshift-docs/)

**For non-optional features moving to GA, the graduation criteria must include
end to end tests.**

### Removing a deprecated feature

N/A

## Upgrade / Downgrade Strategy

WIP

## Version Skew Strategy

N/A

## Operational Aspects of API Extensions

WIP

## Support Procedures

WIP

## Alternatives

We originally tried using pre-existing features in OCP, such as setting a node
to NoSchedule, to avoid customer workloads landing on the arbiter node. While
this on the whole worked as expected, the problem we faced is that the desire
is to use a very low powered and cheap device as the arbiter, and this method
would still run a lot of overhead on the arbiter node.

## Infrastructure Needed [optional]

N/A