Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ipam: Fix init flow in case there are sticky ips in the system #4823

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

oshoval
Copy link
Contributor

@oshoval oshoval commented Nov 6, 2024

📑 Description

In case the ovnkube-control-plane pod is restarted,
(for example, node restart, upgrade, or any pod restart reason),
the ipam claim Sync function creates an array of all the ipams
belonging to a specific network, and used named allocator AllocateIPs
method to allocate them, so the allocator will reflect the current
cluster state.

The problem is AllocateIPs allows to allocate only one IP per given subnet,
so once it is used for all the IPs in the network at once, it will fail,
causing the control plane to have a crash loop.

Fix it by calling AllocateIPs per each claim on its own.
The claims were already created correctly, so each claim on its
own is safe to call AllocateIPs.

Add unit test to show the relevant assertion.

Fixes #

Additional Information for reviewers

Seen by deleting the pod while there were 2 ipam claims with IPs, from the same NAD
in the system.

✅ Checks

  • My code requires changes to the documentation
  • if so, I have updated the documentation as required
  • My code requires tests
  • if so, I have added and/or updated the tests as required
  • All the tests have passed in the CI

How to verify it

@oshoval oshoval changed the title allocator: Fix sync WIP allocator: Fix sync Nov 6, 2024
@oshoval oshoval force-pushed the fix_sync branch 2 times, most recently from 95c2b59 to 4fd51dd Compare November 6, 2024 11:17
@oshoval oshoval changed the title WIP allocator: Fix sync allocator: Fix sync in case there are multiple sticky IPs from same subnet Nov 6, 2024
@oshoval oshoval changed the title allocator: Fix sync in case there are multiple sticky IPs from same subnet WIP allocator: Fix sync in case there are multiple sticky IPs from same subnet Nov 6, 2024
@oshoval oshoval marked this pull request as ready for review November 6, 2024 14:26
@oshoval oshoval force-pushed the fix_sync branch 2 times, most recently from 202efa6 to 455fed4 Compare November 7, 2024 07:27
@oshoval
Copy link
Contributor Author

oshoval commented Nov 7, 2024

Added unit test that shows the current AllocateIPs behavior when trying to allocate several IPs from the same subnet
This behavior is the problematic one for the current Sync logic

@oshoval oshoval changed the title WIP allocator: Fix sync in case there are multiple sticky IPs from same subnet ipam: Fix init flow in case there are sticky ips in the system Nov 7, 2024
maiqueb
maiqueb previously approved these changes Nov 7, 2024
Copy link
Contributor

@maiqueb maiqueb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

go-controller/pkg/allocator/ip/subnet/allocator_test.go Outdated Show resolved Hide resolved
go-controller/pkg/allocator/ip/subnet/allocator_test.go Outdated Show resolved Hide resolved
@maiqueb
Copy link
Contributor

maiqueb commented Nov 7, 2024

Maybe we could go a step further and add more unit test that asserts that you can indeed invoke sync for multiple claims:

@oshoval

@oshoval
Copy link
Contributor Author

oshoval commented Nov 7, 2024

Addressed comments
Thanks

Copy link
Contributor

@qinqon qinqon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a Sync test, just allocators.

go-controller/pkg/persistentips/allocator.go Outdated Show resolved Hide resolved
go-controller/pkg/allocator/ip/subnet/allocator_test.go Outdated Show resolved Hide resolved
go-controller/pkg/allocator/ip/subnet/allocator_test.go Outdated Show resolved Hide resolved
@oshoval
Copy link
Contributor Author

oshoval commented Nov 7, 2024

I don't see a Sync test, just allocators.

There are two unit tests that have it, those
#4823 (comment)

@oshoval
Copy link
Contributor Author

oshoval commented Nov 7, 2024

addressed comments
thanks

@oshoval oshoval requested a review from qinqon November 7, 2024 13:06
Copy link
Contributor

@qinqon qinqon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

In case the ovn control plane pod is restarted,
(for example, node restart, upgrade, or any pod restart reason),
the ipam claim Sync function created an array of all the ipams
belonging to a specific network, and used named allocator AllocateIPs
method to allocate them, so the allocator will reflect the current
cluster state.

The problem is AllocateIPs allows to allocate only one IP per given subnet,
so once it is used for all the IPs in the network at once, it will fail,
causing the control plane to have a crash loop.

Fix it by calling AllocateIPs per each claim on its own.
The claims were already created correctly, so each claim on its
own is safe to call AllocateIPs.

Add unit test to show the relevant assertion.

Signed-off-by: Or Shoval <[email protected]>
@oshoval
Copy link
Contributor Author

oshoval commented Nov 13, 2024

rebased

@github-actions github-actions bot added the area/unit-testing Issues related to adding/updating unit tests label Nov 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/unit-testing Issues related to adding/updating unit tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants