@jaypoulz

@jaypoulz jaypoulz commented Oct 21, 2025

Introduces tnf.etcd.openshift.io/v1alpha1 API group with PacemakerStatus custom resource. This provides visibility into Pacemaker cluster health for dual-replica etcd deployments. The status-only resource is populated by a privileged controller and consumed by the cluster-etcd-operator healthcheck controller. Not gated because it's only used by CEO when two-node has transitioned.

Works in conjunction with openshift/cluster-etcd-operator#1487

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Oct 21, 2025
@openshift-ci-robot

openshift-ci-robot commented Oct 21, 2025

@jaypoulz: This pull request references OCPEDGE-2084 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

In response to this:

Introduces tnf.etcd.openshift.io/v1alpha1 API group with PacemakerStatus custom resource. This provides visibility into Pacemaker cluster health for dual-replica etcd deployments. The status-only resource is populated by a privileged controller and consumed by the cluster-etcd-operator healthcheck controller. Gated by DualReplica feature and managed by two-node-fencing component.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci
Contributor

openshift-ci bot commented Oct 21, 2025

Hello @jaypoulz! Some important instructions when contributing to openshift/api:
API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the api review process for your PR.

@openshift-ci openshift-ci bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Oct 21, 2025
@openshift-ci openshift-ci bot added the do-not-merge/invalid-owners-file Indicates that a PR should not merge because it has an invalid OWNERS file in it. label Oct 21, 2025
@openshift-ci openshift-ci bot removed the do-not-merge/invalid-owners-file Indicates that a PR should not merge because it has an invalid OWNERS file in it. label Oct 21, 2025
@jaypoulz jaypoulz force-pushed the OCPEDGE-2084 branch 4 times, most recently from 2ba442d to 29b9fec Compare October 21, 2025 23:56
@saschagrunert
Member

@jaypoulz thank you for the PR, do you mind making the CI happy?

@jaypoulz
Author

Hi @saschagrunert :) Working on it! :D
New to this repo so working through beginner challenges 😸

@jaypoulz
Author

A few open questions I have:

  1. This is a config object of a sort. It's created by cluster-etcd-operator only when you have a two-node cluster and only for the purposes of gathering information about the health of pacemaker (our HA tool) from the nodes. I put it in etcd/tnf (two node fencing) because it seemed sensible. But I'm not sure if it needs to be in config.

That said, it doesn't work like a normal config - there's no spec and it shouldn't be created during bootstrap. The CRD just needs to be present when the CEO runs a cronjob to post an update to it.

  2. bash hack/update-protobuf.sh failed for me because it wanted the path to be installed in my go path. That said, Cursor happily runs it and copies over the files without issue. I'm just skeptical of the zz_generated files, but I assume those are verified by CI?

  3. For the non-boolean enum fields: should I be creating static string definitions that can be exported to CEO? How do I generate those?
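
For context on the enum question, one common pattern in Go API packages is a named string type with exported constants that consumers import directly. The sketch below uses the success/failed/pending values that the PR's FencingStatusType comment lists; the constant names and the IsKnown helper are assumptions for illustration, not identifiers from the PR.

```go
package main

import "fmt"

// FencingStatusType is the enum named in this PR. The exported constant
// names below are illustrative assumptions, not the PR's identifiers.
// +kubebuilder:validation:Enum=success;failed;pending
type FencingStatusType string

const (
	FencingStatusSuccess FencingStatusType = "success"
	FencingStatusFailed  FencingStatusType = "failed"
	FencingStatusPending FencingStatusType = "pending"
)

// IsKnown reports whether s is one of the declared enum values.
// It is a hypothetical helper for consumers such as CEO.
func (s FencingStatusType) IsKnown() bool {
	switch s {
	case FencingStatusSuccess, FencingStatusFailed, FencingStatusPending:
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println(FencingStatusFailed.IsKnown())
	fmt.Println(FencingStatusType("rebooting").IsKnown())
}
```

Because the constants are ordinary exported Go identifiers, nothing needs to be generated for them; a consumer would reference them by importing the API package.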

@jaypoulz jaypoulz force-pushed the OCPEDGE-2084 branch 2 times, most recently from b0ff230 to 1b57b09 Compare October 22, 2025 16:59
@openshift-ci openshift-ci bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Oct 22, 2025
@jaypoulz jaypoulz force-pushed the OCPEDGE-2084 branch 4 times, most recently from b9b727f to fdd53e9 Compare October 22, 2025 20:37
@saschagrunert
Member

saschagrunert commented Oct 23, 2025

Yeah, I'll ignore the CI failures for now, running ./hack/update-codegen.sh locally also gives me a diff in openapi/generated_openapi/zz_generated.openapi.go. 🙃

A few open questions I have:

  1. This is a config object of a sort. It's created by cluster-etcd-operator only when you have a two-node cluster and only for the purposes of gathering information about the health of pacemaker (our HA tool) from the nodes. I put it in etcd/tnf (two node fencing) because it seemed sensible. But I'm not sure if it needs to be in config.

I'm new to API review, but my gut feeling tells me that a dedicated etcd API group sounds fine for that purpose.

That said, it doesn't work like a normal config - there's no spec and it shouldn't be created during bootstrap. The CRD just needs to be present when the CEO runs a cronjob to post an update to it.

  2. bash hack/update-protobuf.sh failed for me because it wanted the path to be installed in my go path. That said, Cursor happily runs it and copies over the files without issue. I'm just skeptical of the zz_generated files, but I assume those are verified by CI?

You can also try running it in a container with make verify-with-container.

  3. For the non-boolean enum fields: should I be creating static string definitions that can be exported to CEO? How do I generate those?

Do you mind elaborating on that? Do you mean generating the code for the unions?

API docs ref: https://github.com/openshift/enhancements/blob/master/dev-guide/api-conventions.md#writing-a-union-in-go


@jaypoulz is there an OpenShift enhancement available for this change?

@saschagrunert
Member

/retest

@jaypoulz jaypoulz force-pushed the OCPEDGE-2084 branch 3 times, most recently from 3e02535 to e6b5c99 Compare October 28, 2025 17:20
@jaypoulz jaypoulz force-pushed the OCPEDGE-2084 branch 3 times, most recently from d29f516 to cf53006 Compare October 28, 2025 23:11
Member

@saschagrunert saschagrunert left a comment

LGTM from an API Shadow review perspective.

@openshift-ci
Contributor

openshift-ci bot commented Oct 29, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: saschagrunert
Once this PR has been reviewed and has the lgtm label, please assign joelspeed for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@saschagrunert
Member

/retest

@JoelSpeed
Contributor

Since @saschagrunert has said this is good from his side, I'll now take over the API review. Since it's shift week, I'm not expecting to pick this up until Monday

@jaypoulz
Author

Sounds good to me! :)

@jaypoulz jaypoulz force-pushed the OCPEDGE-2084 branch 2 times, most recently from 8513003 to 8c8680a Compare October 29, 2025 18:33
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 3, 2025
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 4, 2025
@jaypoulz
Author

jaypoulz commented Nov 4, 2025

/retest-required

Introduces etcd.openshift.io/v1alpha1 API group with a PacemakerCluster
custom resource. This provides visibility into Pacemaker cluster health for
Two Node Fencing (TNF) etcd deployments. The status-only resource is populated by a
privileged controller and consumed by the cluster-etcd-operator healthcheck
controller. This API is not explicitly gated because it's only created by CEO
once the transition to an ExternalEtcd has occurred. This means that it is
naturally gated by the TNF topology.
@openshift-ci
Contributor

openshift-ci bot commented Nov 5, 2025

@jaypoulz: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/okd-scos-e2e-aws-ovn | 1d41200 | link | false | /test okd-scos-e2e-aws-ovn |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@jaypoulz
Author

jaypoulz commented Nov 5, 2025

/retest-required

kind: PacemakerCluster
metadata:
  name: cluster
spec: {}
Contributor

Generally we try to avoid allowing an empty spec to be valid. What would this object achieve if it has no spec?

Author

It only exists to reflect status. It's not configuration, nor is it trying to modify behavior or configure the cluster in any way. This is one of the reasons I wasn't sure if this really belonged in API.

// +k8s:openapi-gen=true
// +openshift:featuregated-schema-gen=true

// +kubebuilder:validation:Optional
Contributor

Please don't do this (I know this exists on other APIs but it's not right)

This changes the default behaviour for optionality of a field and has bitten many people where they thought they were making fields required and weren't

Comment on lines +1 to +7
.PHONY: verify-with-container
verify-with-container:
	$(MAKE) -f ../../Makefile $@

.PHONY: update-with-container
update-with-container:
	$(MAKE) -f ../../Makefile $@
Contributor

Does this actually work? I wouldn't expect it to since we don't generally maintain the update-with-container targets, and some of them require context of the entire API

Also, we would usually have a test target at this level of makefile, can you please include that


### Feature Gate

- **Feature Gate**: None - this CRD is gated by cluster-etcd-operator start-up. It will only be created once a TNF cluster has transitioned to external etcd.
Contributor

All APIs must start behind a feature gate, even in v1alpha1

Author

I can throw it behind the DualReplica feature gate because it's already blocked by that gate. I don't think it needs its own gate since TNF and this monitor are tightly coupled. This makes more sense now that it'll be TP in 4.21.


The API follows a "Design Principle: Act on Deterministic Information" approach:
- Almost all fields are optional except `lastUpdated`
- Missing data means "unknown" not "error"
Contributor

We generally prefer to populate all data with explicit unknown rather than have it omitted
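
To make the "explicit unknown" suggestion concrete, here is a minimal sketch assuming the enum gains a dedicated Unknown value. NodeModeType and its Active/Standby values come from the PR; the Unknown constant and the parseNodeMode helper are hypothetical and only illustrate the idea.

```go
package main

import (
	"fmt"
	"strings"
)

// NodeModeType is the enum from the PR under review.
type NodeModeType string

const (
	NodeModeActive  NodeModeType = "Active"
	NodeModeStandby NodeModeType = "Standby"
	// NodeModeUnknown is a hypothetical addition: it is written out
	// explicitly instead of leaving the field empty.
	NodeModeUnknown NodeModeType = "Unknown"
)

// parseNodeMode maps a raw string (for example, a value read from the
// Pacemaker XML output) to a NodeModeType. Anything unrecognised becomes
// NodeModeUnknown rather than an omitted field.
func parseNodeMode(raw string) NodeModeType {
	switch strings.ToLower(strings.TrimSpace(raw)) {
	case "active":
		return NodeModeActive
	case "standby":
		return NodeModeStandby
	default:
		return NodeModeUnknown
	}
}

func main() {
	fmt.Println(parseNodeMode("standby"))
	fmt.Println(parseNodeMode(""))
}
```

With this shape the field can be marked required, and a consumer distinguishes "the reporter could not determine the mode" from "the reporter never wrote the field".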

Author

Does this mean we should default to required for fields wherever possible?

Comment on lines +362 to +369
// mode indicates if the node is in active or standby mode
// NodeModeType can be one of the following values:
// - Active - the node is in active mode
// - Standby - the node is in standby mode
// When present, it must be a valid NodeModeType.
// When not present, the node mode is unknown. This likely indicates that there is an error parsing the raw XML output.
// +optional
Mode NodeModeType `json:"mode,omitempty"`
Contributor

Also better as a condition
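
As a rough illustration of what "as a condition" could look like, the sketch below models node mode with a simplified local Condition type that stands in for metav1.Condition. The condition type "Standby", the reason strings, and the setCondition and at helpers are all assumptions made for this example; the real API would use the upstream type and helpers.

```go
package main

import (
	"fmt"
	"time"
)

// Condition is a simplified local stand-in for metav1.Condition so the
// example stays self-contained.
type Condition struct {
	Type               string
	Status             string // "True", "False", or "Unknown"
	Reason             string
	Message            string
	LastTransitionTime time.Time
}

// setCondition inserts or updates a condition by Type. The transition
// time only moves when Status actually changes.
func setCondition(conds []Condition, c Condition, now time.Time) []Condition {
	for i := range conds {
		if conds[i].Type != c.Type {
			continue
		}
		if conds[i].Status != c.Status {
			conds[i].Status = c.Status
			conds[i].LastTransitionTime = now
		}
		conds[i].Reason = c.Reason
		conds[i].Message = c.Message
		return conds
	}
	c.LastTransitionTime = now
	return append(conds, c)
}

// at returns a fixed demo timestamp offset by the given number of minutes.
func at(minutes int) time.Time {
	return time.Date(2025, 11, 5, 0, 0, 0, 0, time.UTC).Add(time.Duration(minutes) * time.Minute)
}

func main() {
	conds := setCondition(nil, Condition{Type: "Standby", Status: "False", Reason: "NodeActive"}, at(0))
	conds = setCondition(conds, Condition{Type: "Standby", Status: "True", Reason: "NodeInStandby"}, at(10))
	fmt.Printf("%+v\n", conds[0])
}
```

A condition also carries an "Unknown" status and a transition time, which covers the parse-failure case that the optional enum field expressed by omission.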

// - Started - the resource is started
// - Stopped - the resource is stopped
// We don't use promoted and unpromoted, so resources in those roles would omit the role field.
// When present, it must be a valid ResourceRoleType.
Contributor

What is a valid ResourceRoleType?

Comment on lines +408 to +414
// node is the node where the resource is running
// When present, it must be a valid string between 1 and 256 characters long.
// When not present, the resource is not assigned to a node. This typically indicates a stopped or unscheduled resource. It could also imply an error parsing the raw XML output.
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=256
// +optional
Node string `json:"node,omitempty"`
Contributor

Why not put the resources under the node status so it's clear which node they are running on?
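
One way to picture this suggestion: instead of each resource carrying an optional node field, each node status owns the list of resources running on it. The sketch below converts a flat list into that nested shape. All type names, field names, and the sample resource names in main are hypothetical, not the PR's schema.

```go
package main

import "fmt"

// FlatResource mirrors the shape under review: a resource with an
// optional node reference (empty when unassigned).
type FlatResource struct {
	Name string
	Role string
	Node string
}

// ResourceStatus is a resource without a node field, because the parent
// NodeStatus already says where it runs.
type ResourceStatus struct {
	Name string
	Role string
}

// NodeStatus owns the resources that run on the node.
type NodeStatus struct {
	Name      string
	Resources []ResourceStatus
}

// groupByNode nests resources under their node, preserving input order.
// Resources with no node are returned separately.
func groupByNode(flat []FlatResource) (nodes []NodeStatus, unassigned []ResourceStatus) {
	index := map[string]int{}
	for _, r := range flat {
		rs := ResourceStatus{Name: r.Name, Role: r.Role}
		if r.Node == "" {
			unassigned = append(unassigned, rs)
			continue
		}
		i, ok := index[r.Node]
		if !ok {
			i = len(nodes)
			index[r.Node] = i
			nodes = append(nodes, NodeStatus{Name: r.Node})
		}
		nodes[i].Resources = append(nodes[i].Resources, rs)
	}
	return nodes, unassigned
}

func main() {
	nodes, unassigned := groupByNode([]FlatResource{
		{Name: "etcd", Role: "Started", Node: "master-0"},
		{Name: "kubelet", Role: "Started", Node: "master-0"},
		{Name: "etcd", Role: "Stopped"},
	})
	fmt.Println(len(nodes), len(unassigned))
}
```

The trade-off is that stopped or unscheduled resources need a home of their own, which the sketch handles with a separate unassigned list.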

Comment on lines +417 to +463
// PacemakerNodeHistoryEntry represents a single operation history entry from node_history
type PacemakerNodeHistoryEntry struct {
	// node is the node where the operation occurred
	// It must be a valid string between 1 and 256 characters long and cannot be empty.
	// +kubebuilder:validation:MinLength=1
	// +kubebuilder:validation:MaxLength=256
	// +required
	Node string `json:"node,omitempty"`

	// resource is the resource that was operated on
	// It must be a valid string between 1 and 256 characters long and cannot be empty.
	// +kubebuilder:validation:MinLength=1
	// +kubebuilder:validation:MaxLength=256
	// +required
	Resource string `json:"resource,omitempty"`

	// operation is the operation that was performed (e.g., "monitor", "start", "stop")
	// Unlike other fields, this is not an enum because while "monitor", "start" and "stop"
	// are the most common, resource agents can define their own operations.
	// It must be a valid string between 1 and 32 characters long and cannot be empty.
	// +kubebuilder:validation:MinLength=1
	// +kubebuilder:validation:MaxLength=32
	// +required
	Operation string `json:"operation,omitempty"`

	// rc is the return code from the operation
	// When present, it must be a valid integer between 0 and 2147483647 (max 32-bit int) inclusive.
	// When not present, the return code is unknown. This likely indicates that there is an error parsing the raw XML output.
	// +kubebuilder:validation:Minimum=0
	// +kubebuilder:validation:Maximum=2147483647
	// +optional
	RC *int32 `json:"rc,omitempty"`

	// rcText is the human-readable return code text (e.g., "ok", "error", "not running")
	// When present, it must be a valid string between 1 and 32 characters long. This is a human-readable string and is not validated against any specific format.
	// When not present, the return code text is unknown. This likely indicates that there is an error parsing the raw XML output.
	// +kubebuilder:validation:MinLength=1
	// +kubebuilder:validation:MaxLength=32
	// +optional
	RCText string `json:"rcText,omitempty"`

	// lastRCChange is the timestamp when the RC last changed
	// It must be a valid timestamp in RFC3339 format and cannot be empty.
	// +kubebuilder:validation:Format=RFC3339
	// +required
	LastRCChange metav1.Time `json:"lastRCChange,omitempty"`
}
Contributor

This feels like it would be better represented as an emitted corev1.Event

Author

That's what we're using this for :)
One thing that might not be clear - why do we even need this? Can't CEO collect the status updates, produce the events, update its conditions, etc. without introducing a CRD for this?

It could for sure. But there are 2 reasons for this:

  1. We need some of this status to persist (e.g. node IPs for node replacement events)
  2. The source of pacemakercluster status updates could be external to the cluster entirely. (In a future update, we'd like to do pacemaker alert-agent based reporting.) We could set up some kind of service account on the nodes to create multiple internal types - events, pacemakercluster, etc. - but it felt cleaner to have pacemaker just give CEO its relevant updates as a single status and have the operator decide whether any of them were noteworthy enough to emit as events.
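
The "operator decides what is noteworthy" step could be sketched as a simple filter over the reported history. The criterion used here (a known, non-zero return code) is purely an assumption for illustration, as are the HistoryEntry type and the noteworthy and rc helpers; the thread does not say how CEO actually selects entries.

```go
package main

import "fmt"

// HistoryEntry is a trimmed, hypothetical stand-in for the PR's
// PacemakerNodeHistoryEntry.
type HistoryEntry struct {
	Node      string
	Resource  string
	Operation string
	RC        *int32 // nil when the return code is unknown
	RCText    string
}

// noteworthy returns the entries that would be turned into events under
// the assumed rule: the return code is known and non-zero.
func noteworthy(entries []HistoryEntry) []HistoryEntry {
	var out []HistoryEntry
	for _, e := range entries {
		if e.RC != nil && *e.RC != 0 {
			out = append(out, e)
		}
	}
	return out
}

// rc is a small helper for building *int32 values in examples.
func rc(v int32) *int32 { return &v }

func main() {
	entries := []HistoryEntry{
		{Node: "master-0", Resource: "etcd", Operation: "monitor", RC: rc(0), RCText: "ok"},
		{Node: "master-1", Resource: "etcd", Operation: "start", RC: rc(1), RCText: "error"},
	}
	for _, e := range noteworthy(entries) {
		fmt.Printf("%s on %s: %s failed (%s)\n", e.Resource, e.Node, e.Operation, e.RCText)
	}
}
```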

Comment on lines +465 to +499
// PacemakerFencingEvent represents a single fencing event from fence history
type PacemakerFencingEvent struct {
	// target is the node that was fenced
	// It must be a valid string between 1 and 256 characters long and cannot be empty.
	// +kubebuilder:validation:MinLength=1
	// +kubebuilder:validation:MaxLength=256
	// +required
	Target string `json:"target,omitempty"`

	// action is the fencing action performed
	// FencingActionType can be one of the following values:
	// - reboot - the node was rebooted
	// - off - the node was turned off
	// - on - the node was turned on
	// When present, it must be a valid FencingActionType.
	// When not present, the fencing action is unknown. This likely indicates that there is an error parsing the raw XML output.
	// +optional
	Action FencingActionType `json:"action,omitempty"`

	// status is the status of the fencing operation
	// FencingStatusType can be one of the following values:
	// - success - the fencing event was successful
	// - failed - the fencing event failed
	// - pending - the fencing event is pending
	// When present, it must be a valid FencingStatusType.
	// When not present, the fencing status is unknown. This likely indicates that there is an error parsing the raw XML output.
	// +optional
	Status FencingStatusType `json:"status,omitempty"`

	// completed is the timestamp when the fencing event was completed
	// It must be a valid timestamp in RFC3339 format and cannot be empty.
	// +kubebuilder:validation:Format=RFC3339
	// +required
	Completed metav1.Time `json:"completed,omitempty"`
}
Contributor

Again, why not use corev1.Event to represent these events in time?
