api: types: introduce NodeGroupStatus
So far, the node groups' statuses have been tracked via the collective operator status, which contains a list of all affected MCPs and the list of their matching RTE daemonsets.

This change allows the status of each node group to be populated in a dedicated node group status wrapper instead of in the accumulative operator status. To do so, we need to link each MCP to a node group, and we use the MCP name for that. The new NodeGroupStatus consists of a DaemonSet, a NodeGroupConfig and a PoolName. There is one daemonset per MCP, not per NodeGroup (see https://github.com/openshift-kni/numaresources-operator/pull/1020/files), so each NodeGroupStatus tracks the status of a single matching MCP group config. This keeps backward compatibility and lays a base for future plans.
We keep populating the statuses in the NUMAResourcesOperatorStatus fields to retain API backward compatibility and, additionally, start reflecting the status per node pool (MCP now, NodePool later in HCP). The mapping between the current node group MCPs and daemonsets and the new representation is 1:1, and there is no change in functionality.

Signed-off-by: Shereen Haj <[email protected]>
shajmakh committed Oct 2, 2024
1 parent 80c9174 commit 9a7529f
Showing 13 changed files with 432 additions and 27 deletions.
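As a rough illustration of the 1:1 mapping described in the commit message, the Go sketch below fills both the legacy accumulative daemonset list and the new per-pool `NodeGroups` list from the same set of MCP/daemonset pairs. The types are trimmed-down stand-ins for the API types in this commit; `poolDaemonSet`, `OperatorStatus`, `populateStatus`, and the pool and daemonset names are hypothetical and not part of the operator's code.

```go
package main

import "fmt"

// NamespacedName is a trimmed-down stand-in for the API type of the same name.
type NamespacedName struct {
	Namespace string
	Name      string
}

// NodeGroupConfig is a stand-in; the real type has more fields.
type NodeGroupConfig struct {
	InfoRefreshMode *string
}

// NodeGroupStatus mirrors the fields introduced by this commit.
type NodeGroupStatus struct {
	DaemonSet *NamespacedName
	Config    *NodeGroupConfig
	PoolName  string
}

// OperatorStatus is hypothetical and keeps only the two status fields relevant here.
type OperatorStatus struct {
	DaemonSets []NamespacedName  // legacy accumulative list, still populated
	NodeGroups []NodeGroupStatus // new per-pool view
}

// poolDaemonSet is hypothetical: one resolved MCP and the RTE daemonset
// rendered for it (there is one daemonset per MCP).
type poolDaemonSet struct {
	PoolName  string
	DaemonSet NamespacedName
	Config    *NodeGroupConfig
}

// populateStatus is a hypothetical helper: each MCP yields exactly one
// NodeGroupStatus, while the legacy field keeps receiving the same daemonset.
func populateStatus(pools []poolDaemonSet) OperatorStatus {
	var st OperatorStatus
	for _, p := range pools {
		ds := p.DaemonSet // per-iteration copy, so each status owns its pointer target
		st.DaemonSets = append(st.DaemonSets, ds)
		st.NodeGroups = append(st.NodeGroups, NodeGroupStatus{
			DaemonSet: &ds,
			Config:    p.Config,
			PoolName:  p.PoolName,
		})
	}
	return st
}

func main() {
	st := populateStatus([]poolDaemonSet{
		{PoolName: "worker-cnf", DaemonSet: NamespacedName{Namespace: "openshift-numaresources", Name: "rte-worker-cnf"}},
		{PoolName: "worker-rt", DaemonSet: NamespacedName{Namespace: "openshift-numaresources", Name: "rte-worker-rt"}},
	})
	for _, ng := range st.NodeGroups {
		fmt.Printf("%s -> %s/%s\n", ng.PoolName, ng.DaemonSet.Namespace, ng.DaemonSet.Name)
	}
	fmt.Println(len(st.DaemonSets), len(st.NodeGroups))
}
```

Because both lists are derived from the same loop, the legacy and per-pool views cannot drift apart, which is the backward-compatibility property the commit aims for. The real status also carries `MachineConfigPools`, omitted here for brevity.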
26 changes: 26 additions & 0 deletions api/numaresourcesoperator/v1/numaresourcesoperator_types.go
@@ -125,6 +125,28 @@ type NodeGroup struct {
PoolName *string `json:"poolName,omitempty"`
}

// NodeGroupStatus reports the status of a NodeGroup once it matches an actual set of nodes and it is correctly processed
// by the system. In other words, it is not possible to have a NodeGroupStatus which does not represent a valid NodeGroup
// which in turn unambiguously references a set of nodes in the cluster.
// Hence, if a NodeGroupStatus is published, its `Name` must be present, because it refers back to a NodeGroup whose
// config was correctly processed in the Spec. And its DaemonSet will be nonempty, because it correctly matches a set
// of nodes in the cluster. The Config is always represented on a best-effort basis, possibly reflecting the system defaults.
// If the system cannot process a NodeGroup correctly from the Spec, it will report a Degraded state in the top-level
// conditions, and will provide details in those conditions.
type NodeGroupStatus struct {
// DaemonSet of the configured RTEs, for this node group
//+operator-sdk:csv:customresourcedefinitions:type=status,displayName="RTE DaemonSets"
DaemonSet *NamespacedName `json:"daemonsets,omitempty"`
// NodeGroupConfig represents the latest available configuration applied to this NodeGroup
// +optional
//+operator-sdk:csv:customresourcedefinitions:type=status,displayName="Optional configuration enforced on this NodeGroup"
Config *NodeGroupConfig `json:"config,omitempty"`
// PoolName represents the name of the pool of nodes to which the config of this node group is applied
// +optional
//+operator-sdk:csv:customresourcedefinitions:type=status,displayName="Pool name of nodes in this node group"
PoolName string `json:"selector,omitempty"`
}

// NUMAResourcesOperatorStatus defines the observed state of NUMAResourcesOperator
type NUMAResourcesOperatorStatus struct {
// DaemonSets of the configured RTEs, one per node group
@@ -133,6 +155,10 @@ type NUMAResourcesOperatorStatus struct {
// MachineConfigPools resolved from configured node groups
//+operator-sdk:csv:customresourcedefinitions:type=status,displayName="RTE MCPs from node groups"
MachineConfigPools []MachineConfigPool `json:"machineconfigpools,omitempty"`
// NodeGroups report the observed status of the configured NodeGroups, matching by their name
// +optional
//+operator-sdk:csv:customresourcedefinitions:type=status,displayName="Node groups observed status"
NodeGroups []NodeGroupStatus `json:"nodeGroups,omitempty"`
// Conditions show the current state of the NUMAResourcesOperator Operator
//+operator-sdk:csv:customresourcedefinitions:type=status,displayName="Condition reported"
Conditions []metav1.Condition `json:"conditions,omitempty"`
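To make the wire format of the new type concrete, the following Go sketch marshals a `NodeGroupStatus` using the same JSON tags as the diff above. Note that `PoolName` is serialized under the `selector` key and `DaemonSet` under `daemonsets`, matching the generated CRD schema. `NamespacedName` and `NodeGroupConfig` are reduced stand-ins whose tags are assumed from the CRD properties; `encodeStatus` and all values are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Stand-in for the API NamespacedName; tags assumed from the CRD schema.
type NamespacedName struct {
	Namespace string `json:"namespace,omitempty"`
	Name      string `json:"name,omitempty"`
}

// Stand-in for NodeGroupConfig, reduced to a single field.
type NodeGroupConfig struct {
	InfoRefreshMode *string `json:"infoRefreshMode,omitempty"`
}

// NodeGroupStatus with the JSON tags introduced by this commit.
type NodeGroupStatus struct {
	DaemonSet *NamespacedName  `json:"daemonsets,omitempty"`
	Config    *NodeGroupConfig `json:"config,omitempty"`
	PoolName  string           `json:"selector,omitempty"`
}

// encodeStatus (hypothetical helper) returns the JSON encoding of a NodeGroupStatus.
func encodeStatus(st NodeGroupStatus) (string, error) {
	out, err := json.Marshal(st)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	mode := "Periodic"
	st := NodeGroupStatus{
		DaemonSet: &NamespacedName{Namespace: "openshift-numaresources", Name: "rte-worker-cnf"},
		Config:    &NodeGroupConfig{InfoRefreshMode: &mode},
		PoolName:  "worker-cnf",
	}
	s, err := encodeStatus(st)
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
	// {"daemonsets":{"namespace":"openshift-numaresources","name":"rte-worker-cnf"},"config":{"infoRefreshMode":"Periodic"},"selector":"worker-cnf"}

	// With every field unset, omitempty drops all keys.
	empty, _ := encodeStatus(NodeGroupStatus{})
	fmt.Println(empty) // {}
}
```

Since every field is `omitempty`, a partially filled status simply omits the missing keys rather than emitting `null` values.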
32 changes: 32 additions & 0 deletions api/numaresourcesoperator/v1/zz_generated.deepcopy.go


104 changes: 104 additions & 0 deletions bundle/manifests/nodetopology.openshift.io_numaresourcesoperators.yaml
@@ -406,6 +406,110 @@ spec:
- name
type: object
type: array
nodeGroups:
description: NodeGroups report the observed status of the configured
NodeGroups, matching by their name
items:
description: |-
NodeGroupStatus reports the status of a NodeGroup once matches an actual set of nodes and it is correctly processed
by the system. In other words, is not possible to have a NodeGroupStatus which does not represent a valid NodeGroup
which in turn correctly references unambiguously a set of nodes in the cluster.
Hence, if a NodeGroupStatus is published, its `Name` must be present, because it refers back to a NodeGroup whose
config was correctly processed in the Spec. And its DaemonSet will be nonempty, because matches correctly a set
of nodes in the cluster. The Config is best-effort always represented, possibly reflecting the system defaults.
If the system cannot process a NodeGroup correctly from the Spec, it will report Degraded state in the top-level
condition, and will provide details using the aforementioned conditions.
properties:
config:
description: NodeGroupConfig represents the latest available
configuration applied to this NodeGroup
properties:
infoRefreshMode:
description: InfoRefreshMode sets the mechanism which will
be used to refresh the topology info.
enum:
- Periodic
- Events
- PeriodicAndEvents
type: string
infoRefreshPause:
description: InfoRefreshPause defines if updates to NRTs
are paused for the machines belonging to this group
enum:
- Disabled
- Enabled
type: string
infoRefreshPeriod:
description: InfoRefreshPeriod sets the topology info refresh
period. Use explicit 0 to disable.
type: string
podsFingerprinting:
description: PodsFingerprinting defines if pod fingerprint
should be reported for the machines belonging to this
group
enum:
- Disabled
- Enabled
- EnabledExclusiveResources
type: string
tolerations:
description: |-
Tolerations overrides tolerations to be set into RTE daemonsets for this NodeGroup. If not empty, the tolerations will be the one set here.
Leave empty to make the system use the default tolerations.
items:
description: |-
The pod this Toleration is attached to tolerates any taint that matches
the triple <key,value,effect> using the matching operator <operator>.
properties:
effect:
description: |-
Effect indicates the taint effect to match. Empty means match all taint effects.
When specified, allowed values are NoSchedule, PreferNoSchedule and NoExecute.
type: string
key:
description: |-
Key is the taint key that the toleration applies to. Empty means match all taint keys.
If the key is empty, operator must be Exists; this combination means to match all values and all keys.
type: string
operator:
description: |-
Operator represents a key's relationship to the value.
Valid operators are Exists and Equal. Defaults to Equal.
Exists is equivalent to wildcard for value, so that a pod can
tolerate all taints of a particular category.
type: string
tolerationSeconds:
description: |-
TolerationSeconds represents the period of time the toleration (which must be
of effect NoExecute, otherwise this field is ignored) tolerates the taint. By default,
it is not set, which means tolerate the taint forever (do not evict). Zero and
negative values will be treated as 0 (evict immediately) by the system.
format: int64
type: integer
value:
description: |-
Value is the taint value the toleration matches to.
If the operator is Exists, the value should be empty, otherwise just a regular string.
type: string
type: object
type: array
type: object
daemonsets:
description: DaemonSet of the configured RTEs, for this node
group
properties:
name:
type: string
namespace:
type: string
type: object
selector:
description: PoolName represents the name of the pool of nodes
to which the config of this node group is applied
type: string
type: object
type: array
relatedObjects:
description: RelatedObjects list of objects of interest for this operator
items:
@@ -62,7 +62,7 @@ metadata:
}
]
capabilities: Basic Install
createdAt: "2024-09-27T09:17:54Z"
createdAt: "2024-09-27T10:25:54Z"
olm.skipRange: '>=4.17.0 <4.18.0'
operators.operatorframework.io/builder: operator-sdk-v1.36.1
operators.operatorframework.io/project_layout: go.kubebuilder.io/v3
@@ -149,6 +149,21 @@ spec:
applied to this MachineConfigPool
displayName: Optional configuration enforced on this NodeGroup
path: machineconfigpools[0].config
- description: NodeGroups report the observed status of the configured NodeGroups,
matching by their name
displayName: Node groups observed status
path: nodeGroups
- description: NodeGroupConfig represents the latest available configuration
applied to this NodeGroup
displayName: Optional configuration enforced on this NodeGroup
path: nodeGroups[0].config
- description: DaemonSet of the configured RTEs, for this node group
displayName: RTE DaemonSets
path: nodeGroups[0].daemonsets
- description: PoolName represents the name of the pool of nodes to which the
config of this node group is applied
displayName: Pool name of nodes in this node group
path: nodeGroups[0].selector
- description: RelatedObjects list of objects of interest for this operator
displayName: Related Objects
path: relatedObjects
104 changes: 104 additions & 0 deletions config/crd/bases/nodetopology.openshift.io_numaresourcesoperators.yaml
@@ -406,6 +406,110 @@ spec:
- name
type: object
type: array
nodeGroups:
description: NodeGroups report the observed status of the configured
NodeGroups, matching by their name
items:
description: |-
NodeGroupStatus reports the status of a NodeGroup once matches an actual set of nodes and it is correctly processed
by the system. In other words, is not possible to have a NodeGroupStatus which does not represent a valid NodeGroup
which in turn correctly references unambiguously a set of nodes in the cluster.
Hence, if a NodeGroupStatus is published, its `Name` must be present, because it refers back to a NodeGroup whose
config was correctly processed in the Spec. And its DaemonSet will be nonempty, because matches correctly a set
of nodes in the cluster. The Config is best-effort always represented, possibly reflecting the system defaults.
If the system cannot process a NodeGroup correctly from the Spec, it will report Degraded state in the top-level
condition, and will provide details using the aforementioned conditions.
properties:
config:
description: NodeGroupConfig represents the latest available
configuration applied to this NodeGroup
properties:
infoRefreshMode:
description: InfoRefreshMode sets the mechanism which will
be used to refresh the topology info.
enum:
- Periodic
- Events
- PeriodicAndEvents
type: string
infoRefreshPause:
description: InfoRefreshPause defines if updates to NRTs
are paused for the machines belonging to this group
enum:
- Disabled
- Enabled
type: string
infoRefreshPeriod:
description: InfoRefreshPeriod sets the topology info refresh
period. Use explicit 0 to disable.
type: string
podsFingerprinting:
description: PodsFingerprinting defines if pod fingerprint
should be reported for the machines belonging to this
group
enum:
- Disabled
- Enabled
- EnabledExclusiveResources
type: string
tolerations:
description: |-
Tolerations overrides tolerations to be set into RTE daemonsets for this NodeGroup. If not empty, the tolerations will be the one set here.
Leave empty to make the system use the default tolerations.
items:
description: |-
The pod this Toleration is attached to tolerates any taint that matches
the triple <key,value,effect> using the matching operator <operator>.
properties:
effect:
description: |-
Effect indicates the taint effect to match. Empty means match all taint effects.
When specified, allowed values are NoSchedule, PreferNoSchedule and NoExecute.
type: string
key:
description: |-
Key is the taint key that the toleration applies to. Empty means match all taint keys.
If the key is empty, operator must be Exists; this combination means to match all values and all keys.
type: string
operator:
description: |-
Operator represents a key's relationship to the value.
Valid operators are Exists and Equal. Defaults to Equal.
Exists is equivalent to wildcard for value, so that a pod can
tolerate all taints of a particular category.
type: string
tolerationSeconds:
description: |-
TolerationSeconds represents the period of time the toleration (which must be
of effect NoExecute, otherwise this field is ignored) tolerates the taint. By default,
it is not set, which means tolerate the taint forever (do not evict). Zero and
negative values will be treated as 0 (evict immediately) by the system.
format: int64
type: integer
value:
description: |-
Value is the taint value the toleration matches to.
If the operator is Exists, the value should be empty, otherwise just a regular string.
type: string
type: object
type: array
type: object
daemonsets:
description: DaemonSet of the configured RTEs, for this node
group
properties:
name:
type: string
namespace:
type: string
type: object
selector:
description: PoolName represents the name of the pool of nodes
to which the config of this node group is applied
type: string
type: object
type: array
relatedObjects:
description: RelatedObjects list of objects of interest for this operator
items: