
✨ Allow clusters without explicit availability zones #1253

Closed
mkjpryor wants to merge 4 commits

Conversation

mkjpryor
Contributor

@mkjpryor mkjpryor commented Jun 1, 2022

What this PR does / why we need it:

This PR adds the ability to create clusters without explicitly setting availability zones. The use case is discussed in detail in #1252.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #1252

Special notes for your reviewer:

Adds an additional, backwards-compatible flag to the OpenStack cluster spec.
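
(For illustration only: a minimal sketch of what such an optional, backwards-compatible flag might look like on the API type. The ignoreAvailabilityZones name is taken from the discussion further down this thread; the package name and surrounding details are assumptions, not the exact code in this PR.)

  // Sketch only; field and package names are illustrative.
  package v1alpha5

  type OpenStackClusterSpec struct {
      // ... existing fields are unchanged ...

      // IgnoreAvailabilityZones, when true, stops the cluster controller from
      // discovering compute availability zones and publishing them as failure
      // domains. It defaults to false, so existing clusters keep today's
      // behaviour.
      // +optional
      IgnoreAvailabilityZones bool `json:"ignoreAvailabilityZones,omitempty"`
  }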

TODOs:

  • squashed commits
  • if necessary:
    • includes documentation
    • adds unit tests

/hold

@netlify

netlify bot commented Jun 1, 2022

Deploy Preview for kubernetes-sigs-cluster-api-openstack ready!

🔨 Latest commit: b2e18ea
🔍 Latest deploy log: https://app.netlify.com/sites/kubernetes-sigs-cluster-api-openstack/deploys/629f0cf44f715b000aa83001
😎 Deploy Preview: https://deploy-preview-1253--kubernetes-sigs-cluster-api-openstack.netlify.app

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 1, 2022
@k8s-ci-robot
Contributor

Hi @mkjpryor. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mkjpryor
To complete the pull request process, please assign seanschneeweiss after the PR has been reviewed.
You can assign the PR to them by writing /assign @seanschneeweiss in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested review from apricote and mdbooth June 1, 2022 15:29
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jun 1, 2022
@mkjpryor
Contributor Author

mkjpryor commented Jun 1, 2022

@mdbooth

Turns out it was basically as easy as I thought. This works like a dream for me. Can I get an /ok-to-test please?

@apricote
Member

apricote commented Jun 1, 2022

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 1, 2022
@mkjpryor
Contributor Author

mkjpryor commented Jun 6, 2022

/retest

@mkjpryor
Contributor Author

mkjpryor commented Jun 7, 2022

@jichenjc

I added some docs for the new option - can you review and suggest changes if required?

Contributor

@mdbooth mdbooth left a comment


It might be an idea to merge the workers part of this fix separately. It's self-contained and very simple.

For the control plane, I worry that options like IgnoreFoo are at risk of polluting the API. I don't yet have a better suggestion, but I would like to fully consider options before adding to the API. I'm very much in favour of the effect of the change, btw, and not necessarily against the proposed API, but I'd like to think it all the way through first.

I have two threads of thought:

  1. We're working round behaviour which is defined by CAPI. We should discuss this with CAPI before making an API change in case they have any better ideas/imminent plans.

  2. We should write down the various ways Failure Domains might be implemented in an OpenStack cloud which are not AZ. What would an API look like which explicitly represented a failure domain in each of these models? Would it be compatible with CAPI? If not, what changes could we make to CAPI to represent more failure domain models?

On that second point, I have in mind something like:

  failureDomainModel: (AvailabilityZone|ServerGroup|None)

instead of IgnoreFailureDomain. This is barely a half-baked thought so read nothing into the detail of it, but the critical difference is that it defines what it is rather than what it is not.
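
(To make the shape of that suggestion concrete, a non-authoritative sketch of an enum-style type; nothing here exists in the codebase and every name is a placeholder.)

  // Placeholder sketch of an enum-valued failure domain model.
  package v1alpha5

  // FailureDomainModel names how failure domains are represented for a cluster.
  // +kubebuilder:validation:Enum=AvailabilityZone;ServerGroup;None
  type FailureDomainModel string

  const (
      // FailureDomainModelAvailabilityZone keeps the current behaviour: each
      // compute availability zone is published as a failure domain.
      FailureDomainModelAvailabilityZone FailureDomainModel = "AvailabilityZone"
      // FailureDomainModelServerGroup would use a Nova server group with an
      // anti-affinity policy instead of availability zones.
      FailureDomainModelServerGroup FailureDomainModel = "ServerGroup"
      // FailureDomainModelNone publishes no failure domains at all.
      FailureDomainModelNone FailureDomainModel = "None"
  )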

@mkjpryor
Contributor Author

@mdbooth

It might be an idea to merge the workers part of this fix separately. It's self-contained and very simple.

Happy to do this.

  1. We're working round behaviour which is defined by CAPI. We should discuss this with CAPI before making an API change in case they have any better ideas/imminent plans.

I'm not actually sure that we are. The InfraCluster.status.failureDomains field is explicitly optional in the spec (see https://cluster-api.sigs.k8s.io/developer/providers/cluster-infrastructure.html#infracluster-resources) and all this flag does is explicitly say that we don't care about AZs.
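
(For reference, the contract field in question is a map of named failure domains, roughly as defined upstream in sigs.k8s.io/cluster-api/api/v1beta1; reproduced here from memory and trimmed, so treat the details as approximate.)

  // Approximate reproduction of the upstream Cluster API types referenced above.
  package v1beta1

  // FailureDomains is a map of failure domain names to their properties.
  type FailureDomains map[string]FailureDomainSpec

  // FailureDomainSpec describes a single failure domain.
  type FailureDomainSpec struct {
      // ControlPlane determines if this failure domain is suitable for use by
      // control plane machines.
      ControlPlane bool `json:"controlPlane,omitempty"`
      // Attributes is a free-form map of attributes an infrastructure provider
      // might use or require.
      Attributes map[string]string `json:"attributes,omitempty"`
  }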

However I don't disagree with your comment that there might be a better approach.

On that second point, I have in mind something like:

  failureDomainModel: (AvailabilityZone|ServerGroup|None)

instead of IgnoreFailureDomain. This is barely a half-baked thought so read nothing into the detail of it, but the critical difference is that it defines what it is rather than what it is not.

This could actually work quite well - the only other thing I can think of is host aggregates.

I guess for my specific case I would use failureDomainModel: ServerGroup, which would put the control plane nodes in a server group with either a soft-anti-affinity or anti-affinity policy (this could be configurable). The way this could work in code is:

  1. OpenStackCluster reconciliation in CAPO creates a server group
  2. The ID of the server group is reported using OpenStackCluster.status.failureDomains with the flag that identifies it as suitable for control plane nodes
  3. This will cause CAPI to create control plane nodes with the server group ID as the failureDomain
  4. CAPO knows to use the failureDomain as the server group when creating the server

What do you think?
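
(A rough sketch of steps 2 and 4 in that flow, just to show the data path; the helper names and the reduced status type are invented for illustration, and only the clusterv1.FailureDomains shape comes from upstream Cluster API.)

  // Illustrative only: invented helpers showing how a server group ID could
  // travel through status.failureDomains and back into server creation.
  package sketch

  import (
      clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
  )

  // OpenStackClusterStatus is reduced here to the one field the sketch needs.
  type OpenStackClusterStatus struct {
      FailureDomains clusterv1.FailureDomains `json:"failureDomains,omitempty"`
  }

  // Step 2: after reconciliation has created the server group, publish its ID
  // as a control-plane-capable failure domain so that CAPI hands it back as
  // the Machine's failureDomain.
  func publishServerGroupFailureDomain(status *OpenStackClusterStatus, serverGroupID string) {
      status.FailureDomains = clusterv1.FailureDomains{
          serverGroupID: clusterv1.FailureDomainSpec{ControlPlane: true},
      }
  }

  // Step 4: when building the Nova server, interpret the machine's failure
  // domain as a server group UUID rather than an availability zone.
  func serverGroupFromFailureDomain(failureDomain *string) string {
      if failureDomain == nil {
          return ""
      }
      return *failureDomain
  }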

@mkjpryor
Contributor Author

And I guess failureDomainModel: None would be essentially what I have implemented here as ignoreAvailabilityZones: true.

@mkjpryor
Contributor Author

@mdbooth

What if I change this PR to have failureDomainModel: AvailabilityZone | None instead of the flag, leaving us open for additional modes in the future?

Then submit another PR for #1256 that implements failureDomainModel: ServerGroup.

How does that sound as a plan?
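
(Sketched concretely, and only as an assumption of what the revised API might look like, that plan would swap the boolean flag for an enum-valued spec field that defaults to today's behaviour.)

  // Placeholder sketch of the revised field; names and markers are assumptions.
  package v1alpha5

  type OpenStackClusterSpec struct {
      // ... existing fields are unchanged ...

      // FailureDomainModel selects how failure domains are derived.
      // "AvailabilityZone" preserves current behaviour and "None" publishes no
      // failure domains (the effect of this PR); "ServerGroup" would be added
      // by the follow-up work in #1256.
      // +kubebuilder:validation:Enum=AvailabilityZone;None
      // +kubebuilder:default=AvailabilityZone
      // +optional
      FailureDomainModel string `json:"failureDomainModel,omitempty"`
  }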

@k8s-ci-robot
Contributor

@mkjpryor: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 15, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 13, 2022
@jichenjc
Contributor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 14, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 13, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 12, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closed this PR.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels
  • cncf-cla: yes - Indicates the PR's author has signed the CNCF CLA.
  • do-not-merge/hold - Indicates that a PR should not merge because someone has issued a /hold command.
  • lifecycle/rotten - Denotes an issue or PR that has aged beyond stale and will be auto-closed.
  • needs-rebase - Indicates a PR cannot be merged because it has merge conflicts with HEAD.
  • ok-to-test - Indicates a non-member PR verified by an org member that is safe to test.
  • size/M - Denotes a PR that changes 30-99 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Allow clusters without explicit availability zones
7 participants