OCPEDGE-1191: feat: initial arbiter cluster enhancement #1674

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from
Draft
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
269 changes: 269 additions & 0 deletions enhancements/arbiter-clusters.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,269 @@
---
title: arbiter-clusters
authors:
- "@eggfoobar"
reviewers:
- "@tjungblu"
- "@patrickdillon"
- "@williamcaban"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

william is no longer involved in controlplane, please tag Ramon Acedo

- "@deads2k"
- "@jerpeter1"
approvers:
- "@tjungblu"
- "@patrickdillon"
- "@williamcaban"
- "@jerpeter1"
- "@deads2k"
api-approvers:
- "@JoelSpeed"
creation-date: 2024-08-27
last-updated: 2024-08-27
tracking-link:
- https://issues.redhat.com/browse/OCPEDGE-1191
see-also: []
replaces: []
superseded-by: []
---

# Support 2 Node + 1 Arbiter Node HA Cluster

## Summary

This enhancement describes the ability to install OpenShift with a control
plane that consists of 2 normal sized nodes and 1 node that can be less
powerful than the recommended node size. This arbiter node will only run the
critical components required to maintain HA, allowing the arbiter node to be as
small and as low cost as possible within reason.

## Motivation

Customers at the edge require a more economical solution for HA deployments.
They can support running 2 node clusters for redundancy, but would like the
option to deploy a lower cost node as an arbiter to supply the third member
needed for etcd quorum.

### User Stories

- As a solutions architect for a retail organization, I want to deploy
  OpenShift at any number of store locations at the edge with only 2 regular
  sized nodes and 1 lower cost node, to maintain HA while keeping compute costs
  down.
- As a solutions architect for cloud infrastructures, I want to offer low cost
  OpenShift deployments on purpose-built hardware in a 2 + 1 configuration.
- As an OpenShift cluster admin, I want non-critical applications deployed to
  my 2 + 1 arbiter cluster to not be scheduled on the arbiter node.

### Goals

- Provide a new arbiter node role type that achieves HA but does not act as a
  full master node.
- Support installing OpenShift with 2 master nodes and 1 arbiter node.
- The arbiter node hardware requirements will be lower than those of regular
  nodes, in both cost and performance.

### Non-Goals

The goals below are not intended to be worked on now, but they might be
expansion ideas for future features.

- Running the arbiter node offsite
- Running the arbiter node as a VM local to the cluster
- Having a single arbiter supporting multiple clusters
- Moving from 2 + 1 to a conventional 3 node cluster

## Proposal

The main focus of this enhancement is to support edge deployments of individual
OpenShift HA clusters at scale, and to do so in a cost-effective way. We
propose doing this by incorporating an arbiter node into a quasi-heterogeneous
control plane configuration. The arbiter will run the critical components that
help maintain an HA cluster, but other platform pods should not be scheduled on
the arbiter node. The arbiter node will be tainted to make sure that only
deployments that tolerate that taint are scheduled on the arbiter.

Functionality that we are proposing to change:

- Adding a new topology to the [OCP/API Control Plane
Topology](https://github.com/openshift/api/blob/69df64132c911e9eb0176e9697f451c13457e294/config/v1/types_infrastructure.go#L103)
- This type of change should provide an authoritative flag that indicates the
  layout of the control plane, which would be valuable to operator developers
  so that no inference is required.
- We will add support to the OCP installer to provide a way of setting up the
initial manifests and the ControlPlaneTopology field.
- We will need to support a path for customers to indicate the desire for a
  2 + 1 arbiter install configuration.
- This will also be used to apply the taint to the machineset manifest.
- Alter the Cluster Etcd Operator (CEO) to be aware of the arbiter node role
  type and allow it to treat the arbiter as if it were a master node.
- We will need CEO to create an etcd member on the arbiter node to allow
  quorum to be established.
- Update the tolerations of any critical or desired component that should be
  running on the arbiter node; a sketch of what such a toleration might look
  like is shown below.
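
As a rough illustration of the taint and toleration mechanics described above,
the sketch below shows a pod toleration matching a hypothetical arbiter taint.
The taint key `node-role.kubernetes.io/arbiter` and the `NoSchedule` effect are
assumptions for illustration only; the exact values will be defined during
implementation.

```yaml
# Minimal sketch only. The arbiter node is assumed to carry a taint such as:
#   key: node-role.kubernetes.io/arbiter, effect: NoSchedule   (hypothetical)
# A critical component that must run on the arbiter would then declare a
# matching toleration in its pod spec:
apiVersion: v1
kind: Pod
metadata:
  name: example-critical-component     # placeholder name
  namespace: example-namespace         # placeholder namespace
spec:
  tolerations:
  - key: "node-role.kubernetes.io/arbiter"   # assumed taint key
    operator: "Exists"
    effect: "NoSchedule"
  containers:
  - name: example
    image: registry.example.com/example:latest   # placeholder image
```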

### Workflow Description

#### For Cloud Installs

1. The user sits down at the computer.
2. The user creates an `install-config.yaml`.
3. The user defines the `install-config.controlPlane` field with `3` replicas.
4. The user then enters the new field `install-config.controlPlane.arbiterNode`
   and sets it to `true` (see the example after this list).
5. The user generates the manifests with this install config via
   `openshift-install create manifests`.
6. With the flag `arbiterNode` in the install config, the installer adds the
`ControlPlaneTopology: ArbiterHighlyAvailable` to the infrastructure config
object.
7. The installer creates a new `arbiter` MachineSet with a replica of 1 and
   reduces the default control plane replicas to `2`.
8. The installer applies the new node role and taint to the arbiter MachineSet.
9. The user can make any alterations to the node machine type to use less
powerful machines.
10. The user then begins the install via `openshift-install create cluster`.
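
A minimal sketch of what such an `install-config.yaml` might look like,
assuming the proposed `arbiterNode` field lives directly under `controlPlane`
as described above. The platform section and credential fields are placeholders
and the exact field shape may change during implementation:

```yaml
apiVersion: v1
baseDomain: example.com                # placeholder domain
metadata:
  name: arbiter-ha-cluster             # placeholder cluster name
controlPlane:
  name: master
  replicas: 3
  arbiterNode: true                    # proposed new field from this enhancement
compute:
- name: worker
  replicas: 0
platform:
  aws:                                 # cloud installs are expected mainly for testing
    region: us-east-1
pullSecret: '<pull-secret>'            # placeholder
sshKey: '<ssh-public-key>'             # placeholder
```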

#### For Baremetal Installs

1. The user sits down at the computer.
2. The user creates an `install-config.yaml`.
3. The user defines the `install-config.controlPlane` field with `3` replicas.
4. The user then enters the new field `install-config.controlPlane.arbiterNode`
   and sets it to `true`.
5. The user then enters the machine information for `platform.baremetal` and
   identifies one of the hosts with the role `arbiter` (see the example after
   this list).
6. With the flag `arbiterNode` in the install config, the installer adds the
`ControlPlaneTopology: ArbiterHighlyAvailable` to the infrastructure config
object.
7. The user then begins the install via `openshift-install create cluster`.
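
A minimal sketch of the baremetal flow, assuming the per-host `role` field in
`platform.baremetal.hosts` accepts a new `arbiter` value as described in step
5. Host names, BMC details, and other values are placeholders, and required
host fields are elided for brevity:

```yaml
controlPlane:
  name: master
  replicas: 3
  arbiterNode: true                    # proposed new field from this enhancement
platform:
  baremetal:
    hosts:
    - name: master-0                   # placeholder host entries
      role: master
      bmc:
        address: ipmi://192.168.111.1
        username: admin
        password: password
    - name: master-1
      role: master
      bmc:
        address: ipmi://192.168.111.2
        username: admin
        password: password
    - name: arbiter-0
      role: arbiter                    # proposed new role value for the arbiter host
      bmc:
        address: ipmi://192.168.111.3
        username: admin
        password: password
```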

#### During Install

1. The CEO will watch for new masters and the arbiter role.
2. CEO will create the operand for the etcd deployments that have tolerations
   for the arbiter.
3. Operators that have tolerations for the arbiter should be scheduled on the
   node.
4. The install should proceed as normal.

### API Extensions

The `config.infrastructure.controlPlaneTopology` enum will be extended to
include `ArbiterHighlyAvailable`.
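
For illustration, a cluster installed in this mode would then report the new
topology in the cluster Infrastructure object, roughly as follows (sketch only;
other status fields omitted):

```yaml
apiVersion: config.openshift.io/v1
kind: Infrastructure
metadata:
  name: cluster
status:
  controlPlaneTopology: ArbiterHighlyAvailable   # new enum value proposed here
  infrastructureTopology: HighlyAvailable
```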

### Topology Considerations

#### Hypershift / Hosted Control Planes

For the time being there is no impact on Hypershift, since this type of edge
deployment requires running its own control plane.

#### Standalone Clusters

This change is relevant to standalone deployments of OpenShift at the edge or
in datacenters; this enhancement specifically deals with this type of
deployment.

#### Single-node Deployments or MicroShift

This change does not affect Single Node OpenShift or MicroShift.

### Implementation Details/Notes/Constraints

Currently there are some behavioral unknowns; we will need to put out a POC to
validate some of the assumptions in this proposal. In its current version this
proposal is not exhaustive, but it will be filled out as we implement these
goals.

We currently expect this feature to mainly be used for `baremetal` installs, or
for specialized hardware that is built to take advantage of this type of
configuration. In the current design we provide two paths in the installer, one
for cloud and one for baremetal installs. However, the cloud install is
primarily for testing; this might mean that we simplify the installer changes
if we are the only ones using cloud installs, since we can simply alter the
manifests in the pipeline without needing to change the installer.

### Risks and Mitigations

The main risk in this enhancement is that, because we are treating one of the
master nodes in a 3 node cluster as an arbiter, we are explicitly evicting
processes that would otherwise be part of a normal supported upstream
configuration such as a compact cluster. We run the risk that new components
critical to HA will not contain the proper tolerations for running on the
arbiter node. One of the mitigations we can take against that is to make sure
we are testing installs and updates.

Another risk we run is customers using an arbiter node with disk speeds below
those recommended for etcd. Since etcd is sensitive to latency between members,
we should provide proper guidance so that the arbiter node does not become a
bottleneck for etcd.

### Drawbacks

One drawback is that we will be creating a new variant of OpenShift that
implements a new, unique way of doing HA for Kubernetes. This also means an
increase in the test matrix and altogether a different type of testing, since
the control plane is no longer symmetric.

## Open Questions [optional]

1. In the future it might be desired to add another master and convert to a
   compact cluster. Do we want to support changing the ControlPlaneTopology
   field after the fact?

## Test Plan

WIP

- Running e2e tests would be preferred but might prove to be tricky due to the
  asymmetry in the control plane.
- We need a strategy for validating install and test failures.

## Graduation Criteria

### Dev Preview -> Tech Preview

- Ability to utilize the enhancement end to end
- End user documentation, relative API stability
- Sufficient test coverage
- Gather feedback from users rather than just developers
- Enumerate service level indicators (SLIs), expose SLIs as metrics
- Write symptoms-based alerts for the component(s)

### Tech Preview -> GA

- More testing (upgrade, downgrade, scale)
- Sufficient time for feedback
- Available by default
- Backhaul SLI telemetry
- Document SLOs for the component
- Conduct load testing
- User facing documentation created in
[openshift-docs](https://github.com/openshift/openshift-docs/)

**For non-optional features moving to GA, the graduation criteria must include
end to end tests.**

### Removing a deprecated feature

N/A

## Upgrade / Downgrade Strategy

WIP

## Version Skew Strategy

N/A

## Operational Aspects of API Extensions

WIP

## Support Procedures

WIP

## Alternatives

We originally tried using pre-existing features in OCP, such as setting a node
as NoSchedule, to avoid customer workloads landing on the arbiter node. While
this mostly worked as expected, the problem we faced is that the desire is to
use a very low powered and cheap device as the arbiter, and this method would
still run most of the OCP overhead on the arbiter node.

## Infrastructure Needed [optional]

N/A