[BGP] Fix FRRConfiguration cleanup #340

Merged

Conversation

@stuggi (Contributor) commented Jan 24, 2025

Currently, the FRRConfigurations are processed as follows:

  • get a list of all pods in the ctlplane namespace
  • filter that list down to the pods which have a secondary IF configured
  • iterate over that list of pods with an IF attached and check whether the node each pod runs on has changed

While running through that nested loop to check whether an update is needed, we also check whether an FRRConfiguration still exists for an already deleted pod.

This works as long as there is at least one pod with a secondary IF; if there is none, the cleanup never runs, which is exactly the situation for the FRRConfiguration of the last deleted pod.
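A minimal, self-contained Go sketch of that problematic shape (every name below is invented for illustration and does not mirror the operator's actual code):

```go
package main

import "fmt"

type pod struct{ name string }
type frrConfig struct{ ownerPod string }

// ownerExists reports whether any pod in the list still owns the config.
func ownerExists(cfg frrConfig, pods []pod) bool {
	for _, p := range pods {
		if p.name == cfg.ownerPod {
			return true
		}
	}
	return false
}

// cleanupNested models the buggy shape: the stale-config check sits
// inside the loop over pods with a secondary IF, so it never executes
// once that list is empty.
func cleanupNested(podsWithIF []pod, cfgs []frrConfig) []frrConfig {
	kept := cfgs
	for range podsWithIF { // never entered after the last pod is gone
		var next []frrConfig
		for _, c := range kept {
			if ownerExists(c, podsWithIF) {
				next = append(next, c) // keep configs with a live owner
			}
		}
		kept = next
	}
	return kept
}

func main() {
	stale := []frrConfig{{ownerPod: "deleted-pod"}}
	// No pods with a secondary IF remain, yet the stale config survives:
	fmt.Println(cleanupNested(nil, stale)) // prints [{deleted-pod}]
}
```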

This issue was only seen intermittently because the functional test did not wait for the pod and its FRRConfiguration to exist before it started the delete test.
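A hedged sketch of the test-side ordering fix, assuming a Ginkgo/Gomega envtest setup; frrConfigName, pod, timeout, interval and the frrk8sv1/k8serrors imports are placeholders, not the repo's actual helpers:

```go
It("removes the FRRConfiguration of a deleted pod", func() {
	// Wait until the reconciler has actually created the config;
	// without this wait the delete test raced the reconciler and
	// only failed intermittently.
	Eventually(func(g Gomega) {
		cfg := &frrk8sv1.FRRConfiguration{}
		g.Expect(k8sClient.Get(ctx, frrConfigName, cfg)).To(Succeed())
	}, timeout, interval).Should(Succeed())

	// Only now delete the pod and assert the config goes away.
	Expect(k8sClient.Delete(ctx, pod)).To(Succeed())
	Eventually(func(g Gomega) {
		cfg := &frrk8sv1.FRRConfiguration{}
		err := k8sClient.Get(ctx, frrConfigName, cfg)
		g.Expect(k8serrors.IsNotFound(err)).To(BeTrue())
	}, timeout, interval).Should(Succeed())
})
```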

This change fixes the functional test and moves the cleanup out of the mentioned loop, so the check also runs properly when the last pod gets deleted.
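Hoisted out of the per-pod loop, the cleanup then runs on every reconcile regardless of how many pods still have a secondary IF; a sketch using the invented types from the first example:

```go
// cleanupHoisted models the fix: stale configs are pruned before, and
// independently of, the per-pod node-change validation, so the check
// also fires when podsWithIF is empty.
func cleanupHoisted(podsWithIF []pod, cfgs []frrConfig) []frrConfig {
	var kept []frrConfig
	for _, c := range cfgs {
		if ownerExists(c, podsWithIF) {
			kept = append(kept, c)
		}
	}
	for range podsWithIF {
		// the per-pod node-change validation would still happen here
	}
	return kept
}
```

With this shape, cleanupHoisted(nil, stale) returns an empty slice, so the FRRConfiguration of the last deleted pod is cleaned up as well.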

Signed-off-by: Martin Schuppert <[email protected]>
@openshift-ci openshift-ci bot requested review from abays and lewisdenny January 24, 2025 12:58
@stuggi stuggi requested review from lmiccini and removed request for lewisdenny January 24, 2025 13:32

openshift-ci bot commented Jan 24, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lmiccini, stuggi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/163374cbedcc480a8a04255da14ca84b

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 39m 36s
❌ podified-multinode-edpm-deployment-crc FAILURE in 1h 21m 09s
❌ cifmw-crc-podified-edpm-baremetal FAILURE in 51m 38s

@lmiccini (Contributor)

recheck


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/3ee459fefc6c4e0f8deda6e0d370de3d

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 44m 08s
❌ podified-multinode-edpm-deployment-crc FAILURE in 1h 25m 37s
❌ cifmw-crc-podified-edpm-baremetal FAILURE in 51m 44s

@stuggi (Contributor, Author) commented Jan 27, 2025

recheck


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/c6e466a3021f4734991caafeab482523

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 13m 37s
❌ podified-multinode-edpm-deployment-crc FAILURE in 1h 23m 50s
❌ cifmw-crc-podified-edpm-baremetal FAILURE in 1h 21m 48s

@abays (Contributor) commented Jan 27, 2025

Same error here as [1]:

2025-01-27 12:18:11.971 26 ERROR tempest.lib.common.utils.linux.remote_client [-] (TestNetworkBasicOps:test_connectivity_between_vms_on_different_networks) Executing command on 192.168.122.248 failed. Error: Command 'set -eu -o pipefail; PATH=$PATH:/sbin:/usr/sbin; ping -c1 -w1 -s56 10.100.0.20' failed, exit status: 1, stderr:

Looks like we have a cross-repo Zuul Tempest problem.

[1] openstack-k8s-operators/openstack-operator#1278 (comment)

@stuggi (Contributor, Author) commented Feb 4, 2025

recheck

@openshift-merge-bot openshift-merge-bot bot merged commit ae8379c into openstack-k8s-operators:main Feb 4, 2025
7 checks passed