ipam: Fix init flow in case there are sticky ips in the system #4823

oshoval · 2024-11-06T10:12:42Z

📑 Description

When the ovnkube-control-plane pod restarts
(for example, because of a node restart, an upgrade, or any other reason),
the IPAM claim Sync function builds an array of all the IPs from the
IPAMClaims belonging to a specific network and passes it to the named
allocator's AllocateIPs method, so that the allocator reflects the current
cluster state.

The problem is that AllocateIPs can allocate only one IP per subnet in a single call,
so passing all of the network's IPs at once fails,
and the control plane ends up in a crash loop.

Fix it by calling AllocateIPs separately for each claim.
The claims were already created correctly, so calling AllocateIPs
for each claim on its own is safe.

Add a unit test that shows the relevant assertion.

Fixes #

Additional Information for reviewers

Reproduced by deleting the pod while two IPAMClaims with IPs from the same NAD
existed in the system.

✅ Checks

My code requires changes to the documentation
if so, I have updated the documentation as required
My code requires tests
if so, I have added and/or updated the tests as required
All the tests have passed in the CI

How to verify it

oshoval · 2024-11-07T07:28:12Z

Added a unit test that shows the current AllocateIPs behavior when trying to allocate several IPs from the same subnet.
This is the behavior that breaks the current Sync logic.

maiqueb

Looks good.

go-controller/pkg/allocator/ip/subnet/allocator_test.go

maiqueb · 2024-11-07T11:50:44Z

Maybe we could go a step further and add more unit tests that assert that you can indeed invoke sync for multiple claims:

ipam claim reconciler level:

ovn-kubernetes/go-controller/pkg/persistentips/allocator_test.go

Line 110 in cc3c784

DescribeTable("syncing the IP allocator from the IPAMClaims is successful when provided with", func(ipamClaims ...interface{}) {

ovn-kubernetes/go-controller/pkg/clustermanager/secondary_network_unit_test.go

Line 469 in e492123

ipamClaimWithIPAddr(claimName, namespace, networkName, subnetIP),

(I think this would reproduce if you simply pre-provision one more claim in the fake claim client.)

@oshoval

oshoval · 2024-11-07T12:14:51Z

Addressed comments
Thanks

qinqon

I don't see a Sync test, just allocators.

go-controller/pkg/persistentips/allocator.go

go-controller/pkg/allocator/ip/subnet/allocator_test.go

go-controller/pkg/clustermanager/secondary_network_unit_test.go

oshoval · 2024-11-07T12:53:12Z

I don't see a Sync test, just allocators.

There are two unit tests that cover it; see
#4823 (comment)

go-controller/pkg/clustermanager/secondary_network_unit_test.go

oshoval · 2024-11-07T13:03:27Z

addressed comments
thanks

qinqon

/lgtm

When the ovn control plane pod is restarted (for example, because of a node restart, an upgrade, or any other reason), the ipam claim Sync function created an array of all the IPs from the IPAMClaims belonging to a specific network and used the named allocator's AllocateIPs method to allocate them, so that the allocator reflects the current cluster state.

The problem is that AllocateIPs can allocate only one IP per subnet in a single call, so using it for all the IPs in the network at once fails, causing the control plane to crash loop.

Fix it by calling AllocateIPs separately for each claim. The claims were already created correctly, so calling AllocateIPs for each claim on its own is safe.

Add a unit test that shows the relevant assertion.

Signed-off-by: Or Shoval <[email protected]>

oshoval · 2024-11-13T12:12:54Z

rebased

oshoval changed the title ~~allocator: Fix sync~~ WIP allocator: Fix sync Nov 6, 2024

oshoval force-pushed the fix_sync branch 2 times, most recently from 95c2b59 to 4fd51dd Compare November 6, 2024 11:17

oshoval changed the title ~~WIP allocator: Fix sync~~ allocator: Fix sync in case there are multiple sticky IPs from same subnet Nov 6, 2024

oshoval changed the title ~~allocator: Fix sync in case there are multiple sticky IPs from same subnet~~ WIP allocator: Fix sync in case there are multiple sticky IPs from same subnet Nov 6, 2024

oshoval marked this pull request as ready for review November 6, 2024 14:26

oshoval force-pushed the fix_sync branch 2 times, most recently from 202efa6 to 455fed4 Compare November 7, 2024 07:27

oshoval force-pushed the fix_sync branch from 455fed4 to 852b7b4 Compare November 7, 2024 10:37

oshoval changed the title ~~WIP allocator: Fix sync in case there are multiple sticky IPs from same subnet~~ ipam: Fix init flow in case there are sticky ips in the system Nov 7, 2024

maiqueb previously approved these changes Nov 7, 2024

go-controller/pkg/allocator/ip/subnet/allocator_test.go (outdated, resolved)

go-controller/pkg/allocator/ip/subnet/allocator_test.go (outdated, resolved)

oshoval dismissed maiqueb’s stale review via 4ee9210 November 7, 2024 12:14

oshoval force-pushed the fix_sync branch from 852b7b4 to 4ee9210 Compare November 7, 2024 12:14

qinqon suggested changes Nov 7, 2024

go-controller/pkg/persistentips/allocator.go (outdated, resolved)

go-controller/pkg/allocator/ip/subnet/allocator_test.go (outdated, resolved)

go-controller/pkg/allocator/ip/subnet/allocator_test.go (outdated, resolved)

maiqueb reviewed Nov 7, 2024

go-controller/pkg/clustermanager/secondary_network_unit_test.go (resolved)

qinqon reviewed Nov 7, 2024

go-controller/pkg/clustermanager/secondary_network_unit_test.go (resolved)

oshoval force-pushed the fix_sync branch from 4ee9210 to 6fb37be Compare November 7, 2024 13:03

oshoval requested a review from qinqon November 7, 2024 13:06

maiqueb approved these changes Nov 7, 2024

qinqon approved these changes Nov 7, 2024

oshoval force-pushed the fix_sync branch from 6fb37be to bc6bb3e Compare November 13, 2024 12:12

oshoval requested a review from a team as a code owner November 13, 2024 12:12

oshoval requested a review from girishmg November 13, 2024 12:12

github-actions bot added the area/unit-testing Issues related to adding/updating unit tests label Nov 13, 2024
