Multi-cell adoption #517

Open
wants to merge 4 commits into
base: main

bogdando (Contributor) commented Jul 3, 2024

Split EDPM nodes into compute cells, mapping each cell 1:1 to a
dataplane nodeset.

Use the edpm_nodes var to describe the computes for each cell,
instead of the static host and IP vars that only worked for
single-cell standalone or multi-node single-cell cases.
Also explain the EDPM net config requirements in vars.sample for
when it is used outside of ci-framework (local deployments).
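
As a rough illustration of the 1:1 mapping between cells and dataplane nodesets, the Python sketch below groups compute hosts by cell and derives one nodeset per cell. The shape of `edpm_nodes` shown here, the host names, the addresses, the `nodesets_from_cells` helper, and the `openstack-<cell>` naming are all assumptions made for this example; the real format is the one described in tests/vars.sample.yaml.

```python
# Hypothetical layout: cell name -> {compute host name -> IP address}.
edpm_nodes = {
    "cell1": {"compute-0": "192.168.122.100"},
    "cell2": {"compute-1": "192.168.122.101", "compute-2": "192.168.122.102"},
}


def nodesets_from_cells(nodes_by_cell):
    """Build one nodeset description per cell (1:1 mapping)."""
    return [
        # The "openstack-<cell>" name is illustrative, not taken from the PR.
        {"name": f"openstack-{cell}", "cell": cell, "hosts": sorted(hosts)}
        for cell, hosts in nodes_by_cell.items()
    ]


for nodeset in nodesets_from_cells(edpm_nodes):
    print(nodeset["name"], nodeset["hosts"])
```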

Remove the edpm_computes vars, which are no longer used now that stopping
the control-plane TripleO services has moved into edpm-ansible.

Simplify ENV headers management by collecting them in a single place.

Provide a variable to define the source cloud Ironic topology,
for any cells with Ironic services.

Align the ordering of nova/libvirt and related services in the
lists of services defined in multiple places with the ordering
specified in VA.

Align the names in the tests with the documented steps
to make the corresponding code easy to discover.

Adjust storage/storageRequests values to better fit
multi-cell test scenarios. Also provide values in the docs and
add a comment to adjust them as needed.

Stop OVN services only if they are present and active (they may be
missing, for example on the cell controllers).

Retain EDPM host IPs on the internalapi network. Without that, edpm-ansible's os-net-config
changes the IPs on internalapi and also breaks Ansible connectivity to the EDPM hosts
(connectivity is restored after a node reboot).

Add edpmRoleServiceName value for tlsCerts.

Depends-On: https://review.rdoproject.org/r/c/rdo-jobs/+/55910

Jira: #OSPRH-6548

@bogdando bogdando changed the title Multi-cell adoption [WIP] Multi-cell adoption Jul 3, 2024
bogdando (Contributor Author) commented Jul 8, 2024

The recent revision gives an overview of the approach taken, PTAL.
As long as we need to maintain the docs-as-code here, I'm afraid there is no cleaner solution than this.
For the ci-framework and rdo-jobs side of things, which should template all of that in, I have a WIP as well...
@jistr @SeanMooney

This change depends on a change that failed to merge.

Change openstack-k8s-operators/install_yamls#826 is needed.

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/data-plane-adoption for 517,18b084c576712d289411bfab3a4bfee4b60a3fbf

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/data-plane-adoption for 517,4687df731d7a30007950c91ac21ee931ebfebf8c

bogdando (Contributor Author) commented

Based on feedback from @SeanMooney, we should not shift cell names as I proposed here. Instead, we want one of the following:

  • A single-cell adoption (only the default cell exists): rename default to cell1.
  • A multi-cell adoption (default, cell1, etc. exist): skip importing the default cell, as a multi-cell OSP does not support compute hosts there, hence there is nothing to adopt from it.
  • Or, a multi-cell adoption (default, cell1, etc. exist): do not rename the default cell, and import it as is.
  • Or, a multi-cell adoption (default, cell1, etc. exist): rename the default cell to the highest cell number + 1:
default -> cell4
cell1 -> cell1
cell2 -> cell2
cell3 -> cell3

Implementing any of these is quite challenging given the local requirement to keep the test code in the same form as it is documented (meaning shell commands). This sophisticated logic would bring even more loop and array handling into the already overcomplicated code proposed in this PR draft.
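
The last option in the list above can be sketched as a small mapping function. This is only an illustration in Python, not the shell code from this PR; the function name `plan_cell_renames` and the assumption that every non-default cell is named `cell<N>` are hypothetical.

```python
import re


def plan_cell_renames(source_cells):
    """Map source cell names to target names.

    The default cell is renamed to cell<highest existing number + 1>;
    with no numbered cells present this yields cell1.
    All other cells keep their names.
    """
    numbers = [
        int(m.group(1))
        for name in source_cells
        if (m := re.fullmatch(r"cell(\d+)", name))
    ]
    next_number = max(numbers, default=0) + 1
    return {
        name: f"cell{next_number}" if name == "default" else name
        for name in source_cells
    }


print(plan_cell_renames(["default"]))
print(plan_cell_renames(["default", "cell1", "cell2", "cell3"]))
```

With a single default cell the function returns the single-cell rename (default to cell1), so one rule covers both the single-cell and the multi-cell case.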

@jistr @gibizer looking for your ideas on that

gibizer (Contributor) commented Jul 11, 2024

As nova-operator allows a cell to be named "default", the simplest solution would be your second proposal: just import the cells as is. This also has the benefit that it will work even if a given customer has wrongly attached computes to the default cell.
After GA, nova-operator will get the ability to delete cells, so that feature can be used later to delete the "default" cell and thereby make the deployment structurally the same as a greenfield 18 deployment.

bogdando (Contributor Author) commented Jul 11, 2024

I now lean toward implementing the last option: for a multi-cell adoption (default, cell1, etc. exist), rename the default cell to the highest cell number + 1. This keeps it consistent between single-cell and multi-cell...

Update: see the combined option, which allows either renaming or importing as is.
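
One way to picture the combined option is a single switch that selects between renaming and importing as is. The sketch below is an assumption for illustration only: the `rename_default` flag and the `target_cell_names` helper are hypothetical names, not variables from this PR.

```python
def target_cell_names(source_cells, rename_default=True):
    """Return {source_name: target_name} for the cells to import.

    rename_default=False imports every cell under its existing name.
    rename_default=True moves "default" to cell<highest number + 1>.
    """
    mapping = {name: name for name in source_cells}
    if not rename_default or "default" not in mapping:
        return mapping

    highest = 0
    for name in source_cells:
        suffix = name[len("cell"):]
        # Only names of the form cell<N> count toward the highest number.
        if name.startswith("cell") and suffix.isdecimal():
            highest = max(highest, int(suffix))

    mapping["default"] = f"cell{highest + 1}"
    return mapping


print(target_cell_names(["default", "cell1", "cell2"]))
print(target_cell_names(["default", "cell1", "cell2"], rename_default=False))
```

Keeping the import-as-is path as the identity mapping means the rest of an adoption procedure could iterate over the same structure in both modes.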

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/data-plane-adoption for 517,9541ce7f013b9b35b2cbd681cb30259da1a85157

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/data-plane-adoption for 517,54d110489b8215e014580b8b77b05ce107fd1e04

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/data-plane-adoption for 517,9954245ae2addd169cc80deab137024b7046f30e

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Unable to update github.com/openstack-k8s-operators/install_yamls

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/data-plane-adoption for 517,b946bca930ed67ffe94465e36e742abd9ba55d95

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/5f19276989c847a58099005ebe196943

✔️ noop SUCCESS in 0s
✔️ adoption-standalone-to-crc-ceph SUCCESS in 2h 56m 48s
adoption-standalone-to-crc-no-ceph FAILURE in 2h 08m 20s
✔️ adoption-docs-preview SUCCESS in 1m 25s

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/0c5d3970bc60481e80426cb7653406a7

✔️ noop SUCCESS in 0s
✔️ adoption-standalone-to-crc-ceph SUCCESS in 3h 06m 12s
adoption-standalone-to-crc-no-ceph FAILURE in 2h 16m 10s
✔️ adoption-docs-preview SUCCESS in 1m 19s

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/950b7f6cdd374e708ed18090743b6fcc

✔️ noop SUCCESS in 0s
adoption-standalone-to-crc-ceph FAILURE in 2h 06m 45s
adoption-standalone-to-crc-no-ceph RETRY_LIMIT in 50m 53s
✔️ adoption-docs-preview SUCCESS in 1m 20s

openshift-ci bot commented Jan 15, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign sathlan for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

This change depends on a change that failed to merge.

Change https://review.rdoproject.org/r/c/rdo-jobs/+/55910 is needed.

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/data-plane-adoption for 517,cb02e9a5ce2813dc368240f8f07a368df7645628

bogdando (Contributor Author) commented Feb 3, 2025

recheck

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/data-plane-adoption for 517,cb02e9a5ce2813dc368240f8f07a368df7645628

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/data-plane-adoption for 517,7496ab7a462b66f3fcff16e84f0503ae6d90f2d3

bogdando (Contributor Author) commented Feb 3, 2025

recheck

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/2e2a7162892845d8b11bb4346cdae120

✔️ noop SUCCESS in 0s
adoption-standalone-to-crc-ceph FAILURE in 1h 35m 32s
adoption-standalone-to-crc-no-ceph FAILURE in 1h 39m 59s
✔️ adoption-docs-preview SUCCESS in 1m 26s

@bogdando bogdando force-pushed the multi-cell branch 2 times, most recently from 961c529 to a5d80b9 Compare February 6, 2025 12:38

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/14ea67b5a0ad4c359372c2188858d373

✔️ noop SUCCESS in 0s
✔️ adoption-standalone-to-crc-ceph SUCCESS in 3h 16m 41s
adoption-standalone-to-crc-no-ceph FAILURE in 2h 15m 41s
✔️ adoption-docs-preview SUCCESS in 1m 18s

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/0fcf4cfb4fa540b0b842dcd2c9726692

✔️ noop SUCCESS in 0s
✔️ adoption-standalone-to-crc-ceph SUCCESS in 3h 22m 24s
adoption-standalone-to-crc-no-ceph FAILURE in 2h 12m 24s
✔️ adoption-docs-preview SUCCESS in 1m 23s

Split EDPM nodes into compute cells, mapping each cell 1:1 to a
dataplane nodeset.

Use the edpm_nodes var to describe the computes for each cell,
instead of the static host and IP vars that only worked for
single-cell standalone or multi-node single-cell cases.
Also explain the EDPM net config requirements in vars.sample for
when it is used outside of ci-framework (local deployments).

Remove the edpm_computes vars, which are no longer used now that stopping
the control-plane TripleO services has moved into edpm-ansible.

Simplify ENV headers management by collecting them in a single place.

Provide a variable to define the source cloud Ironic topology,
for any cells with Ironic services.

Align the ordering of nova/libvirt and related services in the
lists of services defined in multiple places with the ordering
specified in VA.

Align the names in the tests with the documented steps
to make the corresponding code easy to discover.

Adjust storage/storageRequests values to better fit
multi-cell test scenarios. Also provide values in the docs and
add a comment to adjust them as needed.

Stop OVN services only if they are present and active (they may be
missing, for example on the cell controllers).

Signed-off-by: Bohdan Dobrelia <[email protected]>

Retain EDPM host IPs on the internalapi network.
Without that, edpm-ansible's os-net-config changes the IPs on internalapi,
which also breaks Ansible connectivity to the EDPM hosts (it is restored
after a node reboot, though).

Signed-off-by: Bohdan Dobrelia <[email protected]>

This change depends on a change that failed to merge.

Change https://review.rdoproject.org/r/c/rdo-jobs/+/55910 is needed.

Labels: check-before-merge/depends-on, do-not-merge/hold
2 participants