Created vsphere multi disk enhancement.

vr4manta committed Nov 7, 2024 (commit bf459c1, parent 383f9d3)

Changed file: enhancements/machine-api/vsphere-data-disk.md (35 additions, 77 deletions)

### Workflow Description

**_Installation with data disks_**

1. The user creates `install-config.yaml` with machine pools that contain data disk configuration.
2. The user runs the `openshift-install` program to start creating the new cluster.
3. The installer generates the CAPI configs used to create the control plane machines.
4. The installer generates the configs for the cluster's ControlPlaneMachineSet (CPMS), control plane machines, and compute machine sets with the machine pool information applied, including the new data disk configuration.
5. CAPI/CAPV creates the control plane VMs in vSphere.
6. MAPI creates any compute machines that were configured to be created at install time.
7. Cluster creation completes with all desired VMs/nodes operational and all Cluster Operators reporting `Available` with no errors.
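
Step 4 copies a machine pool's data disk list onto the provider spec generated for each pool. The sketch below illustrates that step. `DataDisk`, `MachinePool`, `ProviderSpec`, `apply_pool`, and `render_specs` are hypothetical names, and their fields are simplified assumptions rather than the installer's real types.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List


# Hypothetical, simplified types; not the installer's actual install-config schema.
@dataclass
class DataDisk:
    name: str
    size_gib: int


@dataclass
class MachinePool:
    name: str
    replicas: int
    data_disks: List[DataDisk] = field(default_factory=list)


@dataclass
class ProviderSpec:
    template: str
    disk_gib: int
    data_disks: List[DataDisk] = field(default_factory=list)


def apply_pool(pool: MachinePool, base: ProviderSpec) -> ProviderSpec:
    """Return a copy of `base` carrying the pool's data disk configuration."""
    # replace() builds a new object, so the shared base spec is left untouched.
    return replace(base, data_disks=list(pool.data_disks))


def render_specs(pools: List[MachinePool], base: ProviderSpec) -> Dict[str, ProviderSpec]:
    """Produce one provider spec per pool (e.g. control plane and compute)."""
    return {pool.name: apply_pool(pool, base) for pool in pools}


# Usage: "rhcos-template" is a hypothetical template name.
base = ProviderSpec(template="rhcos-template", disk_gib=120)
pools = [
    MachinePool("master", 3),
    MachinePool("worker", 3, [DataDisk("images", 100), DataDisk("logs", 50)]),
]
specs = render_specs(pools, base)
```

Each pool produces a fresh spec. This prevents one pool's data disks from leaking into another pool's machine set.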

**_Machine Set Creation_**

1. The user creates a new machine set configuration whose vSphere machine provider spec contains data disks.
2. The user runs `oc create -f <filename>` to create the new machine set.
3. The user scales up the machine set.
4. MAPI creates a new VM in vSphere and requests that the new disks be added dynamically to the VM during the cloning process.
5. The Machine state progresses to `Provisioned`.
6. The VM is started after the cloning process completes.
7. The new VM provisions successfully and the node is created in OpenShift.
8. The Machine state progresses to `Running`.
9. The MachineSet reports the correct desired, current, ready, and available counts.
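
Step 4 can be pictured as attaching extra device changes to the clone request. The sketch below only builds plain dictionaries. Their keys loosely follow vSphere's `VirtualDeviceConfigSpec`: `operation`, `fileOperation`, and a device with `unitNumber` and `capacityInBytes`. The following are all illustrative assumptions, not MAPI's implementation:

- the function name;
- the premise that the primary disk occupies unit 0 on the same SCSI controller;
- the sequential unit allocation.

```python
from typing import Any, Dict, List

GIB = 1024 ** 3
# On a vSphere SCSI controller, unit number 7 is reserved for the controller itself.
SCSI_CONTROLLER_UNIT = 7


def data_disk_device_changes(sizes_gib: List[int], first_unit: int = 1) -> List[Dict[str, Any]]:
    """Build one simplified "add a new disk" device change per data disk.

    Keys loosely mirror vSphere's VirtualDeviceConfigSpec; the dict shape itself
    is illustrative only. Unit 0 is assumed to hold the primary (template) disk.
    """
    changes = []
    unit = first_unit
    for size_gib in sizes_gib:
        if unit == SCSI_CONTROLLER_UNIT:
            unit += 1  # skip the slot reserved for the controller
        changes.append({
            "operation": "add",         # attach a new device to the clone
            "fileOperation": "create",  # and create its backing VMDK file
            "device": {"unitNumber": unit, "capacityInBytes": size_gib * GIB},
        })
        unit += 1
    return changes


# Usage: two data disks of 100 GiB and 50 GiB.
changes = data_disk_device_changes([100, 50])
```

Requesting the disks as part of the clone, instead of reconfiguring the VM afterwards, means the disks already exist when the VM is first started.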

**_Machine Creation (No MachineSet)_**

1. The user creates a new machine configuration whose vSphere machine provider spec contains data disks.
2. The user runs `oc create -f <filename>` to create the new machine.
3. MAPI creates a new VM in vSphere and requests that the new disks be added dynamically to the VM during the cloning process.
4. The Machine state transitions to `Provisioned`.
5. The VM is started.
6. The new VM starts successfully and the node is created in OpenShift.
7. The Machine state transitions to `Running`.
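
A test harness that polls a Machine could check the transitions in steps 4 and 7 with something like the sketch below. The function name is hypothetical. The observed phases are assumed to come from repeatedly reading the Machine's `status.phase`.

```python
from typing import List

# Machine phases named in the workflows above, in the order they should appear.
EXPECTED_PHASES = ["Provisioned", "Running"]


def reached_running_via_provisioned(observed: List[str]) -> bool:
    """Check that observed phases pass through Provisioned and then Running.

    Phases not listed in EXPECTED_PHASES are ignored, and repeated observations
    of the same phase (e.g. from polling) are collapsed into one.
    """
    relevant: List[str] = []
    for phase in observed:
        if phase in EXPECTED_PHASES and (not relevant or relevant[-1] != phase):
            relevant.append(phase)
    return relevant == EXPECTED_PHASES
```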

### API Extensions

This enhancement extends the installer's CRD/type used for `install-config.yaml`, as well as the vSphere machine provider spec type and all CRDs that depend on it.
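
Because the change extends existing types instead of adding new CRDs, one natural shape is an optional list field on the provider spec. The sketch makes these assumptions, which are illustrative rather than the final API:

- the field serializes as `dataDisks`, with `name` and `sizeGiB` entries;
- the Python classes stand in for the real Go types.

The `template` and `diskGiB` keys mirror existing provider spec fields.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class DataDiskSketch:
    name: str
    size_gib: int


@dataclass
class ProviderSpecSketch:
    template: str
    disk_gib: int
    data_disks: List[DataDiskSketch] = field(default_factory=list)

    def to_manifest(self) -> Dict[str, Any]:
        """Serialize to a manifest-like dict with camelCase keys."""
        out: Dict[str, Any] = {"template": self.template, "diskGiB": self.disk_gib}
        # The new field is optional and omitted when empty, so specs written
        # before this enhancement serialize exactly as they did before.
        if self.data_disks:
            out["dataDisks"] = [
                {"name": d.name, "sizeGiB": d.size_gib} for d in self.data_disks
            ]
        return out
```

Omitting the empty field keeps existing manifests byte-for-byte unchanged. This is the behaviour an `omitempty` JSON tag would give on the Go side.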

#### Installer

...
Downgrade expectations:
- If the upgrade succeeded and new machines were configured to use data disks, that configuration must be undone before downgrading because of CRD incompatibility.
- Deleting any CVO-managed resources added by the new version. The CVO does not currently delete resources that no longer exist in the target version.
- Rolling back the installation converts the CRDs back to a supported state. No CRDs need to be removed manually, because this enhancement introduces no new CRDs.
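
Before downgrading, an administrator needs to find the Machines and MachineSets that still carry data disk configuration. Below is a minimal sketch of such a check, under these assumptions:

- the objects have been loaded as dictionaries, for example from the `items` of `oc get machines,machinesets -n openshift-machine-api -o json`;
- the new field is named `dataDisks` under `providerSpec.value`.

The field name and the helper are assumptions, and ControlPlaneMachineSets are left out for brevity.

```python
from typing import Any, Dict, List


def resources_blocking_downgrade(resources: List[Dict[str, Any]]) -> List[str]:
    """Return "Kind/name" for every Machine or MachineSet that still sets data disks."""
    blocking = []
    for obj in resources:
        kind = obj.get("kind", "")
        spec = obj.get("spec", {})
        # A MachineSet nests its Machine spec under spec.template.spec.
        if kind == "MachineSet":
            spec = spec.get("template", {}).get("spec", {})
        value = spec.get("providerSpec", {}).get("value", {})
        # "dataDisks" is an assumed field name; an empty or missing list is fine.
        if value.get("dataDisks"):
            blocking.append(f"{kind}/{obj['metadata']['name']}")
    return blocking
```

Every resource this returns needs its data disk configuration undone before the downgrade proceeds.
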

## Version Skew Strategy

N/A

## Operational Aspects of API Extensions

- Adding data disks adds a marginal amount of time to the creation (clone) of each virtual machine. This is negligible compared to the cloning process as a whole.
- New VMDK files are created for each machine and are stored in the VM's folder in vCenter. The new disk files follow the naming of the primary disk: normally the VM's name with `_#` appended, where `#` is the index at which the disk is configured.

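
The naming rule above can be expressed as a small helper. It assumes the suffix starts at 1 for the first data disk, following the usual vSphere convention where the primary disk has no suffix. If the configured index turns out to be zero-based, set `start` to 0. The helper name is hypothetical.

```python
from typing import List


def data_disk_file_names(vm_name: str, count: int, start: int = 1) -> List[str]:
    """Return the expected VMDK file names for `count` data disks of a VM.

    The primary disk is assumed to be "<vm>.vmdk"; each data disk appends
    "_<index>" to the VM name.
    """
    return [f"{vm_name}_{index}.vmdk" for index in range(start, start + count)]
```
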
## Support Procedures