Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SPLAT-1733: Update vSphere and include host/vm group #1677

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 5, 2024
Copy link
Contributor

openshift-ci bot commented Sep 5, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@jcpowermac jcpowermac changed the title Update vSphere and include host/vm group SPLAT-1733: Update vSphere and include host/vm group Sep 5, 2024
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Sep 5, 2024
@openshift-ci-robot
Copy link

openshift-ci-robot commented Sep 5, 2024

@jcpowermac: This pull request references SPLAT-1733 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.18.0" version, but no target version was set.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@jcpowermac jcpowermac marked this pull request as ready for review September 23, 2024 17:09
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 23, 2024
@openshift-ci-robot
Copy link

openshift-ci-robot commented Sep 23, 2024

@jcpowermac: This pull request references SPLAT-1733 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.18.0" version, but no target version was set.

In response to this:

Update the vSphere installer IPI zonal enhancement to include:

  • current state of zonal
  • future state of zonal with vm-host group additions

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot
Copy link

openshift-ci-robot commented Sep 23, 2024

@jcpowermac: This pull request references SPLAT-1733 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.18.0" version, but no target version was set.

In response to this:

Changes

Update the vSphere installer IPI zonal enhancement to include:

  • current state of zonal
  • future state of zonal with vm-host group additions

Additional PRs

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@jcpowermac
Copy link
Contributor Author

/assign @rvanderp3 @vr4manta

@jcpowermac
Copy link
Contributor Author

/cc @gnufied @JoelSpeed

)

// VSpherePlatformFailureDomainSpec holds the region and zone failure domain and the vCenter topology of that failure domain.
// +kubebuilder:validation:XValidation:rule="has(self.zoneType) && self.zoneType == 'HostGroup' ? self.topology.affinity.vmGroup != '' && self.topology.affinity.hostGroup != '' && self.topology.affinity.vmHostRule != '' : true",message="when zoneType is HostGroup, failuredomain topology affinity vmGroup, hostGroup and vmHostRule fields must be defined"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Topology does not appear in the spec here, I think it may have been meant to be there though?

This validation apppears that you want an optional struct, with required fields in for the zoneType, is this just a discriminated union? It would then have the standard that hostGroup is allowed if and only if type == HostGroup

region:
  type: ABC
zone:
  type: HostGroup
  hostGroup:
    vmGroup: ...
    hostGroup: ...
    vmHostRule: ...

Copy link
Contributor Author

@jcpowermac jcpowermac Sep 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I broke out what is current (as of 9/9/24) and what is specifically changing for vm-host zonal. Rather than duplicating these are snipits of what would change. Would you rather have that as a single section?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

personally - it makes sense to put new changes into existing structs, rather than breaking them out. This way, we can still see what changed via diff, but we can also see the full datatype if we view complete file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JoelSpeed does this look like the direction you are thinking? 677ce07

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or something like this: fdb3dde

Since there is a name collision with region and zone, wondering if it should be moved to the topology struct

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latter of the two commits is where I was thinking to go

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whew thats what I ended up with

openshift/api@906bb28

The enhancement has been updated for that

ea2162a

// This configuration within vCenter creates the required association between a failure domain, virtual machines
// and ESXi hosts to create a vm-host based zone.
type VSphereFailureDomainAffinity struct {
// VMGroupName is the name of the vm-host group of type virtual machine within vCenter for this failure domain.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be vmGroup at the beginning of this godoc


// Host defines host VMs to generate as part of the installation.
type Host struct {
// FailureDomain refers to the name of a FailureDomain as described in https://github.com/openshift/enhancements/blob/master/enhancements/installer/vsphere-ipi-zonal.md
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably want to explain this in more depth in situ rather than referring out to a doc

Copy link
Contributor Author

@jcpowermac jcpowermac Sep 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JoelSpeed first time updating a enhancement. This is what is current in the infra installer's platform spec, so are you expecting this to be updated? I suppose that should really be a bug right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't realise this was the current state, we should probably look to update these where they are non-breaking, can you create a bug?

// NetworkDeviceSpec to be applied to the host
// +kubebuilder:validation:Required
NetworkDevice *NetworkDeviceSpec `json:"networkDevice"`
// Role defines the role of the node
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please expand the godoc here, what do each of these values mean and what effect does it have when I set them?

Copy link
Contributor Author

@jcpowermac jcpowermac Sep 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, this is what is current.

Comment on lines +756 to +757
// +kubebuilder:validation:Format=ipv4
// +kubebuilder:validation:Format=ipv6
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// +kubebuilder:validation:Format=ipv4
// +kubebuilder:validation:Format=ipv6
// +kubebuilder:validation:XValidation:rule="isIP(self)",message="gateway must be a valid IPv4 or IPv6 address"

// +kubebuilder:example=`192.168.1.100/24`
// +kubebuilder:example=`2001:DB8:0000:0000:244:17FF:FEB6:D37D/64`
// +kubebuilder:validation:Required
IPAddrs []string `json:"ipAddrs"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll have to define a maximal length for this

Comment on lines +760 to +761
// ipAddrs is a list of one or more IPv4 and/or IPv6 addresses and CIDR to assign to
// this device, for example, 192.168.1.100/24. IP addresses provided via ipAddrs are
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which device?

Comment on lines +773 to +774
// +kubebuilder:validation:Format=ipv4
// +kubebuilder:validation:Format=ipv6
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// +kubebuilder:validation:Format=ipv4
// +kubebuilder:validation:Format=ipv6
// +kubebuilder:validation:XValidation:rule="isIP(self)",message="gateway must be a valid IPv4 or IPv6 address"

// +kubebuilder:validation:Format=ipv4
// +kubebuilder:validation:Format=ipv6
// +kubebuilder:example=`8.8.8.8`
Nameservers []string `json:"nameservers,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should add a maximal length to this list as well

}


type Topology struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this struct included?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the infra spec changes, just showing snipits of what would actually change. The current topology struct is above.

Copy link
Contributor

openshift-ci bot commented Sep 24, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from rvanderp3. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

type VSpherePlatformSpec struct {
// vcenters holds the connection details for services to communicate with vCenter.
// Currently, only a single vCenter is supported.
// Currently, only a single vCenter is supported, but in tech preview 3 vCenters are supported.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought one of the epics in 4.18 is to remove techpreview status from multi-vcenter support?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just copied and pasted what the current infra spec is as of 9/9 . afaik it is supposed to be ga'ed in 4.18 but maybe those updates have not been added yet.

cc: @vr4manta

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense to merge the logic in an overall doc, but while in flux, I do see the complication of multi vcenter not being here as well. Currently 4.17 it is feature gated. Once its all finalized, should be GA. Goal for GA is 4.18.

// must use URN-notation instead of display names. A maximum of 10 tag IDs may be specified.
// +kubebuilder:example=`urn:vmomi:InventoryServiceTag:5736bf56-49f5-4667-b38c-b97e09dc9578:GLOBAL`
// +optional
TagIDs []string `json:"tagIDs,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we intend to use this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was added in 4.16 I believe. It allows a compute machine to be tagged. I think it was specifically for backup processes that customers wanted it for.

// +optional
LoadBalancer *configv1.VSpherePlatformLoadBalancer `json:"loadBalancer,omitempty"`
// Hosts defines network configurations to be applied by the installer. Hosts is available in TechPreview.
Hosts []*Host `json:"hosts,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intended to support hosts as zones? What is the use case I wonder.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty sure the answer is yes, afaik it for static ip.

@vr4manta @rvanderp3 do you want to take that one?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this is not reflected in Infrastructure type above. So information will be lost after cluster installation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh I think I misunderstood, Hosts is for OCP hosts and static ip, that isn't ESXi hosts.

// +openshift:validation:featureGate=VSphereHostVMGroupZonal
// +kubebuilder:validation:MaxLength=80
// +optional
VMGroup string `json:"vmGroup,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this be created if it doesn't exist already?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the installer will create the vmGroup, one per failure domain

// +openshift:validation:featureGate=VSphereHostVMGroupZonal
// +kubebuilder:validation:MaxLength=80
// +optional
HostGroup string `json:"hostGroup,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The installer will not create the hostGroup, that would be kinda hard since we would need to know each esxi host per zone.

// and the vm-host affinity rule that together creates a affinity configuration for vm-host based zonal.
// This configuration within vCenter creates the required association between a failure domain, virtual machines
// and ESXi hosts to create a vm-host based zone.
type VSphereFailureDomainAffinity struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the thing - we will have to see is, if a customer specifies computeCluster as region and then specifies two different hostGroup as zones and assuming those hostGroups don't share the datastore - will CSI driver work...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that scenario was just asked in slack the other day so yeah should be a concern.

@jcpowermac
Copy link
Contributor Author

The enhancement has been updated to include current and future states for both the infrastructure and installer's platform spec. Hopefully that will make it more clear.

@@ -180,6 +219,57 @@ type VSpherePlatformTopology struct {
// +kubebuilder:validation:Pattern=`^/.*?/vm/.*?`
// +optional
Folder string `json:"folder,omitempty"`

// template is the full inventory path of the virtual machine or template
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If hosts are being used as zones, how will that information reflected in Infrastructure type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The infrastructure spec will have the name of the vm-host group of type hosts

We will not know the ESXi hosts directly that are in each zone. I don't think we should either since stretched clusters presumably should already have this defined.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then, it sounds like customer must create a vm-host group for each host? I must be misunderstanding something. The CSI driver doesn't support hostgroup as a failure-domain fwiw.

Copy link
Contributor Author

@jcpowermac jcpowermac Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, and maybe that illustrates something that I am missing.

  • I expect the customer to create a vm-host group (host type) for each zone prior to installation. This can be 1 to N number of ESXi hosts.
  • Just like dc, cluster, region and zone topology the openshift-region and openshift-zone based tags are required. I don't expect the CCM or CSI to know anything about a virtual machine or host type vm-host group - and based in testing it doesn't, the region and zone labels are still populated on a control plane or compute node.

the vm-host groups and rules are required to keep rhcos virtual machines on the correct ESXi hosts for the failure domain zone.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated with 8c3f02c

@JoelSpeed
Copy link
Contributor

@jcpowermac At the moment, this PR updates the existing state of the EP to what was actually implemented, and then also updates to add an extension to the feature itself. Is there a way to separate the two changes? I feel like it would probably be beneficial to get the EP updated to the current state of the world first, and then look at the updates for the new features

@gnufied
Copy link
Member

gnufied commented Nov 8, 2024

This looks good to me from storage side. Storage features work - https://issues.redhat.com/browse/STOR-2035

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 8, 2024
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Nov 8, 2024
Copy link
Contributor

openshift-ci bot commented Nov 8, 2024

New changes are detected. LGTM label has been removed.

Copy link
Contributor

openshift-ci bot commented Nov 8, 2024

@jcpowermac: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants