Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove nbctl daemon and fix upgrade tests #2707

Merged
merged 3 commits into from
Apr 13, 2022

Conversation

astoycos
Copy link
Collaborator

Remove the ability/config to run nbctl in daemon mode
in Ovn-Kubernetes. This is due to the fact that 90%
of nbctl calls have been converted to using libovsdb clients
in the ovnkube-master process rendering the daemon obsolete.

Would like this to merge after #2697

@coveralls
Copy link

coveralls commented Dec 10, 2021

Coverage Status

Coverage increased (+0.06%) to 51.149% when pulling ac7ea72 on astoycos:remove-nbctl-daemon into 473fcbc on ovn-org:master.

@astoycos astoycos marked this pull request as ready for review December 10, 2021 21:49
@astoycos astoycos marked this pull request as draft December 13, 2021 14:18
@astoycos astoycos marked this pull request as ready for review December 13, 2021 17:00
@astoycos
Copy link
Collaborator Author

astoycos commented Dec 13, 2021

I expect upgrade tests to fail here since the ovnkube.sh script is not upgraded in these tests, so we still try to check that the ovn-nbctl daemon is running in ovnkube.sh

 echo "=============== ovn-master (wait for ovn-nbctl daemon) ========== MASTER ONLY"
  wait_for_event process_ready ovn-nbctl

Not really sure how/if I should handle this upgrade scenario quite yet

@astoycos astoycos force-pushed the remove-nbctl-daemon branch 4 times, most recently from e2a8baa to e8acf58 Compare December 14, 2021 17:04
@astoycos astoycos changed the title Remove nbctl daemon Remove nbctl daemon and fix upgrade tests Dec 14, 2021
@astoycos
Copy link
Collaborator Author

Had to fix some things around upgrade, specifically,

  • We were using the PR branch's kind scripts with the master branch's OVN image when first setting up kind
  • During upgrade we were only swapping out the ovn image in the ovn resources (ovnkube-master deployment, ovnkube-db and ovnkube-node daemons) Now we regenerate the yamls from the PR branch and properly update them

go-controller/pkg/util/ovs.go Outdated Show resolved Hide resolved
.github/workflows/test.yml Show resolved Hide resolved
go-controller/pkg/util/ovs.go Outdated Show resolved Hide resolved
@astoycos astoycos force-pushed the remove-nbctl-daemon branch 2 times, most recently from 502cf29 to ba216a7 Compare January 6, 2022 21:12
@astoycos
Copy link
Collaborator Author

astoycos commented Jan 6, 2022

/retest

3 similar comments
@astoycos
Copy link
Collaborator Author

astoycos commented Jan 6, 2022

/retest

@astoycos
Copy link
Collaborator Author

astoycos commented Jan 7, 2022

/retest

@astoycos
Copy link
Collaborator Author

astoycos commented Jan 7, 2022

/retest

@astoycos astoycos force-pushed the remove-nbctl-daemon branch 5 times, most recently from 62c5403 to c52894c Compare March 10, 2022 15:04
@astoycos astoycos requested a review from trozet March 14, 2022 13:47
@astoycos
Copy link
Collaborator Author

@jcaamano If you get the chance you could also take a look here, pretty much the wrap up to libovsdb conversion

@astoycos
Copy link
Collaborator Author

astoycos commented Apr 7, 2022

/retest

@astoycos
Copy link
Collaborator Author

astoycos commented Apr 8, 2022

/retest failed

@astoycos
Copy link
Collaborator Author

astoycos commented Apr 8, 2022

/retest-failed

@astoycos
Copy link
Collaborator Author

astoycos commented Apr 8, 2022

/retest-failed

@astoycos
Copy link
Collaborator Author

astoycos commented Apr 8, 2022

/retest

contrib/kind-ovn.yaml Outdated Show resolved Hide resolved
go-controller/go.sum Outdated Show resolved Hide resolved
test/scripts/upgrade-ovn.sh Outdated Show resolved Hide resolved
if [ -f ${CI_IMAGE_CACHE}${CI_IMAGE_MASTER_TAR}.gz ]; then
cp ${CI_IMAGE_CACHE}/${CI_IMAGE_MASTER_TAR}.gz ${CI_IMAGE_MASTER_TAR}.gz
gunzip ${CI_IMAGE_MASTER_TAR}.gz
echo "::set-output name=LOAD_MASTER_IMAGE_TO_CACHE::false"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:
It would have been much easier for me to understand if we had an MASTER_IMAGE_RESTORED_FROM_CACHE instead of LOAD_MASTER_IMAGE_TO_CACHE
MASTER_IMAGE_RESTORED_FROM_CACHE would be set to false by default and to true within this if clause.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack changed the variables to MASTER_IMAGE_RESTORED_FROM_GHCR and MASTER_IMAGE_RESTORED_FROM_CACHE


- name: Cache master image
if: steps.is_master_image_build_needed.outputs.MASTER_IMAGE_RESTORED != 'true' && success()
continue-on-error: true
if: (steps.is_master_image_build_needed.outputs.LOAD_MASTER_IMAGE_TO_CACHE == 'true' || steps.is_master_image_build_needed.outputs.MASTER_IMAGE_RESTORED != 'true') && success()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit continued:
and then here we would check for MASTER_IMAGE_RESTORED==true && MASTER_IMAGE_RESTORED_FROM_CACHE==false
It just all reads more natural this way.

- name: Check out code into the Go module directory
uses: actions/checkout@v2

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙋‍♂️ I could not come up with a reason on why we need this checkout

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 I don't believe we do

@astoycos astoycos force-pushed the remove-nbctl-daemon branch 3 times, most recently from b1a73b3 to e2d9865 Compare April 11, 2022 16:18
@astoycos
Copy link
Collaborator Author

/retest

2 similar comments
@astoycos
Copy link
Collaborator Author

/retest

@astoycos
Copy link
Collaborator Author

/retest

@astoycos
Copy link
Collaborator Author

/retest-failed

1 similar comment
@astoycos
Copy link
Collaborator Author

/retest-failed

@astoycos
Copy link
Collaborator Author

/retest

@jcaamano
Copy link
Contributor

/retest-failed

@astoycos
Copy link
Collaborator Author

@jcaamano CI has been going a bit bonkers and is definitely not retesting as desired....Let me try and manually rebase

Remove the ability/config to run nbctl in daemon mode
in ovn-kubernetes.  This is due to the fact that 90%
of nbctl calls have been converted to using libovsdb clients
in the ovnkube-master process rendering the daemon obsolete.

Also remove the `skipped_nbctl_daemon_total` metric.

Signed-off-by: astoycos <[email protected]>
Fix the GH actions workflow to ensure we use the MASTER kind
scripts and images when bringing up the pre-upgrade kind cluster

And checkout the PR's code to ensure we build/deploy updated manifests
as well as images.

Upgrade the script to properly upgrade the ovnkube
resources, these include the
- ovnkube-master deployment
- ovnkube-db and ovnkube-node daemonsets

rather than simply upgrading the ovn-image used
by the resources

Run go-mod tidy to clean-up some stale artifacts in
go-controller/go.sum

Signed-off-by: astoycos <[email protected]>
Now that ovn-k images are automatically built for
master and pushed to https://github.com/orgs/ovn-org/packages
We don't need to rebuild the master branch based images
everytime CI runs

This PR edits the `build-master` workflow to
1. Look in the cache for the image
2. If not there try and pull from ghcr.io
3. If pull fails build the image

Signed-off-by: astoycos <[email protected]>
@jcaamano
Copy link
Contributor

/retest-failed

@jcaamano
Copy link
Contributor

@jcaamano CI has been going a bit bonkers and is definitely not retesting as desired....Let me try and manually rebase

Yeah, it looks like the latest workflow run is not registered and when we retest-failed it triggers an old run.
Anyway, this LGTM and the e2e-dual-conversion are already known to be for a different reason #2885

@jcaamano
Copy link
Contributor

Also the junit report jobs seems to be crossposting
take for example https://github.com/ovn-org/ovn-kubernetes/runs/5996754268?check_suite_focus=true
ran for a workflow run even on this workflow run : https://api.github.com/repos/ovn-org/ovn-kubernetes/actions/runs/2145835180
The head commit there is b1c5e7e
But in the job log you can see it posted to ac7ea72

@jcaamano jcaamano merged commit 55500c1 into ovn-kubernetes:master Apr 13, 2022
@jcaamano jcaamano mentioned this pull request Apr 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants