ipam: Fix init flow in case there are sticky ips in the system #4823
Force-pushed from 95c2b59 to 4fd51dd
Force-pushed from 202efa6 to 455fed4
Added unit test that shows the current |
Looks good.
Maybe we could go a step further and add another unit test asserting that you can indeed invoke Sync for multiple claims:
|
Addressed comments
I don't see a Sync test, just allocators.
There are two unit tests that have it, those |
Addressed comments
/lgtm
If the ovn control plane pod is restarted (for example, after a node restart, an upgrade, or any other pod restart reason), the ipam claim Sync function collects all the IPAM claims belonging to a specific network into one array and uses the named allocator's AllocateIPs method to allocate them, so that the allocator reflects the current cluster state.

The problem is that AllocateIPs allows only one IP per given subnet in a single call, so calling it once with all the IPs in the network fails, sending the control plane into a crash loop.

Fix it by calling AllocateIPs separately for each claim. The claims were already created correctly, so calling AllocateIPs for each claim on its own is safe.

Add a unit test to show the relevant assertion.

Signed-off-by: Or Shoval <[email protected]>
Rebased
📑 Description
If the ovnkube-control-plane pod is restarted (for example, after a node restart, an upgrade, or any other pod restart reason), the ipam claim Sync function collects all the IPAM claims belonging to a specific network into one array and uses the named allocator's AllocateIPs method to allocate them, so that the allocator reflects the current cluster state.

The problem is that AllocateIPs allows only one IP per given subnet in a single call, so calling it once with all the IPs in the network fails, sending the control plane into a crash loop.

Fix it by calling AllocateIPs separately for each claim. The claims were already created correctly, so calling AllocateIPs for each claim on its own is safe.

Add a unit test to show the relevant assertion.
Fixes #
Additional Information for reviewers
Observed by deleting the pod while two ipam claims with IPs from the same NAD existed in the system.
✅ Checks
How to verify it