Add InstanceHA Operator #237

Merged: 1 commit, Sep 4, 2024

Contributor

@lmiccini lmiccini commented Jun 27, 2024

This commit adds an operator responsible for managing a deployment/pod in charge of evacuating faulty compute nodes.

This operator reacts to changes of:

  • configmap containing the InstanceHA configuration file
  • configmap containing the OpenStack clouds.yaml
  • secret containing the OpenStack admin user password
  • secret containing the certificate authority (CA) bundle
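
One common way to react to changes in watched ConfigMaps and Secrets is to hash their contents and trigger a rollout when the digest changes. The following is a minimal sketch of that idea only; it is not the operator's code (which is written in Go), and the function name, the object keys other than clouds.yaml, and the placeholder data are all hypothetical.

```python
import hashlib
import json
from typing import Mapping


def inputs_digest(inputs: Mapping[str, Mapping[str, str]]) -> str:
    """Return a stable SHA-256 hex digest over the data of all watched objects."""
    # sort_keys makes the serialization independent of key ordering,
    # so only a real content change produces a different digest.
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Object names come from the sample spec; keys and values are placeholders.
watched = {
    "configmap/instanceha-config-0": {"config-key": "placeholder"},
    "configmap/openstack-config": {"clouds.yaml": "placeholder"},
    "secret/openstack-config-secret": {"secret-key": "placeholder"},
    "secret/combined-ca-bundle": {"bundle-key": "placeholder"},
}

before = inputs_digest(watched)

# Simulate an edit to clouds.yaml and compare digests.
changed = {**watched, "configmap/openstack-config": {"clouds.yaml": "edited"}}
needs_rollout = inputs_digest(changed) != before
print("rollout needed:", needs_rollout)
```

Storing such a digest on the deployment's pod template is a frequent pattern, because any change to it makes Kubernetes replace the pod.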

This operator allows running multiple instances of the InstanceHA service (each enforced to a single replica), each with its own configuration file and spec variables, so it can potentially be deployed in a multi-cloud or multi-region environment.
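
To make the multi-instance idea concrete, here is a small sketch that builds one InstanceHA resource per cloud, using only spec fields that appear in the sample from this description. The helper name, the cloud names, and the per-instance naming scheme (`fencing-secret-N`, `instanceha-config-N`) are assumptions for illustration, not something the operator requires.

```python
import json
from typing import Dict, List


def instanceha_manifest(index: int, cloud: str) -> Dict[str, object]:
    """Build one InstanceHA custom resource as a plain dict."""
    return {
        "apiVersion": "instanceha.openstack.org/v1beta1",
        "kind": "InstanceHA",
        "metadata": {"name": f"instanceha-{index}"},
        "spec": {
            "caBundleSecretName": "combined-ca-bundle",
            # Each instance gets its own fencing secret and config map.
            "fencingSecret": f"fencing-secret-{index}",
            "instanceHAConfigMap": f"instanceha-config-{index}",
            # Entry in clouds.yaml that this instance should talk to.
            "openStackCloud": cloud,
        },
    }


# Hypothetical cloud names; real ones would match entries in clouds.yaml.
clouds: List[str] = ["region-one", "region-two"]
manifests = [instanceha_manifest(i, cloud) for i, cloud in enumerate(clouds)]

for manifest in manifests:
    print(json.dumps(manifest, indent=2))
```

No replica count appears in the spec, which matches the description: the operator itself enforces a single replica per instance.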

The InstanceHA service can be deployed by:

  1. creating a secret (for example fencing-secret-0) containing something like the following (replacing the value of the uuid key):
---
apiVersion: v1
kind: Secret
metadata:
  name: fencing-secret-0
stringData:
  fencing.yaml: |
    FencingConfig:
      compute-0:
        agent: redfish
        ipaddr: 192.168.111.9
        ipport: 8000
        login: admin
        passwd: password
        uuid: REPLACEME-0
      compute-1:
        agent: ipmi
        ipaddr: 192.168.111.10
        ipport: 8001
        login: admin
        passwd: password
  2. applying a YAML manifest like the example provided under config/samples:
apiVersion: instanceha.openstack.org/v1beta1
kind: InstanceHA
metadata:
  name: instanceha-0
spec:
  caBundleSecretName: combined-ca-bundle
  fencingSecret: fencing-secret-0
  #networkAttachments: ['internalapi']
  #openStackCloud: "default"
  #openStackConfigMap: "openstack-config"
  #openStackConfigSecret: "openstack-config-secret"
  #instanceHAConfigMap: "instanceha-config-0"
  #instanceHAKdumpPort: "7410"

The commented-out spec parameters are optional.
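
As an illustration of how the fencing data might be consumed, the sketch below looks up the parameters for one compute host from the fencing.yaml sample above, already parsed into a dict. The function name and the set of required keys are assumptions; the actual instanceha.py may validate differently.

```python
from typing import Dict, Mapping

# Assumed minimum set of keys; the real script may require more or fewer.
REQUIRED_KEYS = ("agent", "ipaddr", "login", "passwd")


def fencing_params(config: Mapping[str, Mapping], host: str) -> Dict[str, object]:
    """Return the fencing parameters for one compute host, or raise."""
    hosts = config.get("FencingConfig", {})
    if host not in hosts:
        raise KeyError(f"no fencing entry for host {host!r}")
    params = dict(hosts[host])
    missing = [key for key in REQUIRED_KEYS if key not in params]
    if missing:
        raise ValueError(f"{host}: missing fencing keys {missing}")
    return params


# Mirrors the fencing.yaml sample, as it would look after YAML parsing.
config = {
    "FencingConfig": {
        "compute-0": {
            "agent": "redfish",
            "ipaddr": "192.168.111.9",
            "ipport": 8000,
            "login": "admin",
            "passwd": "password",
            "uuid": "REPLACEME-0",
        },
        "compute-1": {
            "agent": "ipmi",
            "ipaddr": "192.168.111.10",
            "ipport": 8001,
            "login": "admin",
            "passwd": "password",
        },
    }
}

print(fencing_params(config, "compute-1")["agent"])
```

Failing early on a missing host or key makes a misconfigured secret visible before a fencing attempt is needed.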

The operator will create:

  1. configmap "instanceha-0-sh" containing a copy of the python script (templates/instanceha/bin/instanceha.py)
  2. configmap "instanceha-0-config" containing a copy of the configuration file (templates/instanceha/config/config.yaml)
  3. deployment "instanceha-0"
  4. replicaset "instanceha-0-XXX"
  5. pod "instanceha-0-XXX-YYY"
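
The naming scheme in the list above can be summarized in a few lines. This is only a sketch of the pattern as described here; the function and dictionary keys are hypothetical, and the ReplicaSet and pod suffixes are generated by Kubernetes, so only their prefixes are predictable.

```python
from typing import Dict


def expected_resources(name: str) -> Dict[str, str]:
    """Map an InstanceHA resource name to the object names listed above."""
    return {
        "script_configmap": f"{name}-sh",
        "config_configmap": f"{name}-config",
        "deployment": name,
        # Suffixes (XXX, YYY) are generated, so only the prefix is known.
        "replicaset_prefix": f"{name}-",
        "pod_prefix": f"{name}-",
    }


resources = expected_resources("instanceha-0")
for kind, value in resources.items():
    print(f"{kind}: {value}")
```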

Contributor

@stuggi stuggi left a comment

> secret containing the OpenStack admin user password

which one is the secret which has the admin password? iiuc currently only secrets using openstack client configs are used?

Contributor Author

lmiccini commented Jul 2, 2024

> secret containing the OpenStack admin user password
>
> which one is the secret which has the admin password? iiuc currently only secrets using openstack client configs are used?

By default it uses the same secret as the openstackclient (// +kubebuilder:default=openstack-config-secret). The same goes for the config map (// +kubebuilder:default=openstack-config).
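
For readers unfamiliar with kubebuilder default markers: they end up as defaults in the CRD schema, and the API server fills them in when a field is omitted. The sketch below only imitates that effect for the two defaults mentioned here; the helper name is hypothetical and no other defaults are assumed.

```python
from typing import Dict, Mapping

# The two defaults named in this comment, keyed by their spec field names.
DEFAULTS = {
    "openStackConfigMap": "openstack-config",
    "openStackConfigSecret": "openstack-config-secret",
}


def with_defaults(spec: Mapping[str, object]) -> Dict[str, object]:
    """Return a copy of spec with omitted fields filled from DEFAULTS."""
    merged: Dict[str, object] = dict(DEFAULTS)
    merged.update(spec)  # values set explicitly by the user win
    return merged


minimal_spec = {
    "caBundleSecretName": "combined-ca-bundle",
    "fencingSecret": "fencing-secret-0",
}
effective = with_defaults(minimal_spec)
print(effective["openStackConfigSecret"])
```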

@lmiccini lmiccini marked this pull request as ready for review July 3, 2024 11:47
@openshift-ci openshift-ci bot requested a review from olliewalsh July 3, 2024 11:47
Contributor Author

lmiccini commented Jul 3, 2024

/retest

Contributor Author

lmiccini commented Jul 3, 2024

/test infra-operator-build-deploy-kuttl

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/infra-operator for 237,8d59fdc19965a953c22143a970a7d9e2b386bbf2

Contributor

@stuggi stuggi left a comment

/lgtm let's get it in and do further enhancements in follow-ups

Contributor

openshift-ci bot commented Sep 3, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lmiccini, stuggi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Contributor

openshift-ci bot commented Sep 3, 2024

New changes are detected. LGTM label has been removed.

This commit adds an operator responsible for managing a deployment/pod in charge of evacuating faulty compute nodes.

Related: https://issues.redhat.com/browse/OSPRH-3351

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/8c19325e6fb3465c885da7980707dac1

✔️ openstack-k8s-operators-content-provider SUCCESS in 10h 34m 46s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 14m 47s
cifmw-crc-podified-edpm-baremetal RETRY_LIMIT in 13m 42s

Contributor Author

lmiccini commented Sep 4, 2024

recheck

@stuggi stuggi added the lgtm label Sep 4, 2024
@lmiccini lmiccini merged commit 80a34dd into openstack-k8s-operators:main Sep 4, 2024
7 checks passed
Contributor

openshift-ci bot commented Sep 4, 2024

@lmiccini: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/infra-operator-build-deploy-kuttl | 677ad27 | link | unknown | /test infra-operator-build-deploy-kuttl |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
