Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove stale LRSRs from default and network scopped cluster router #4679

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

arghosh93
Copy link
Contributor

@arghosh93 arghosh93 commented Aug 30, 2024

This PR removes logical router static routes with stale IPs when join or transit switch IP gets changed as part of day 2 operation.

ip4.dst == 10.244.0.0/24 , nexthop = 100.88.0.2 dst-ip
ip4.dst == 100.64.0.2/16 , nexthop = 100.88.0.2 dst-ip
ip4.dst == 10.244.0.0/16 , nexthop = 100.64.0.4 src-ip
ip4.dst == 100.64.0.4 , nexthop = 100.64.0.4 dst-ip
ip4.dst == 10.244.2.0/24 , nexthop = 100.64.0.4 src-ip

What this PR does and why is it needed

Which issue(s) this PR fixes

Jira: https://issues.redhat.com/browse/OCPBUGS-36177
Fixes #4693 additionally.

Special notes for reviewers

How to verify it

Details to documentation updates

Description for the changelog

Does this PR introduce a user-facing change?


@arghosh93 arghosh93 requested a review from a team as a code owner August 30, 2024 17:57
@github-actions github-actions bot added the area/unit-testing Issues related to adding/updating unit tests label Aug 30, 2024
@tssurya tssurya added the area/core-kubernetes Issues related to core kubernetes like pods, services, endpoints, endpointslices label Aug 31, 2024
This commit is to effectively filter LRSR created for default
cluster router and prevent deletion of any other required routes.

Previous predicate was causing 2 LRSRs to be selected while execution
of CreateOrReplaceLogicalRouterStaticRouteWithPredicate function and
was causing one of the routes to be deleted upon ovnkube-controller
restart.

10.244.0.0/16            100.64.0.4 src-ip
10.244.2.0/24            100.64.0.4 src-ip

Upstream issue: ovn-kubernetes#4693

Signed-off-by: Arnab Ghosh <[email protected]>
This PR removes logical router static routes with stale IPs when
join or transit switch IP gets changed as part of day 2 operation.

ip4.dst == 10.244.0.0/24 , nexthop = 100.88.0.2 dst-ip
ip4.dst == 100.64.0.2/16 , nexthop = 100.88.0.2 dst-ip
ip4.dst == 10.244.0.0/16 , nexthop = 100.64.0.4 src-ip
ip4.dst == 100.64.0.4 , nexthop = 100.64.0.4 dst-ip

Signed-off-by: Arnab Ghosh <[email protected]>
@tssurya tssurya requested review from tssurya and removed request for jcaamano October 1, 2024 15:51

nodeTransitSwitchPortIPs, err := util.ParseNodeTransitSwitchPortAddrs(node)
if err != nil || len(nodeTransitSwitchPortIPs) == 0 {
return fmt.Errorf("failed to get the node transit switch port Ips : %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be a better idea to aggregate the errors here instead of doing the early exit. There is still a chance to cleanup the join addr routes.

if util.IsAnnotationNotSetError(err) {
var err1 error
// this is for backwards compatibility
nodeGRPIPs, err1 = util.ParseNodeGatewayRouterLRPAddrs(node)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure we still need that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure. I checked other places where we are getting join IP from annotation and in all those places we are running ParseNodeGatewayRouterLRPAddrs() if IsAnnotationNotSetError() is true.

return lrsr.ExternalIDs["ic-node"] == node.Name && lrsr.ExternalIDs[types.NetworkExternalID] == zic.GetNetworkName() &&
(lrsr.Nexthop != v6TransitSwitchIP.IP.String() || (nodeSubnet != nil && lrsr.IPPrefix != nodeSubnet.String() &&
lrsr.IPPrefix != v6JoinIP.IP.String()+util.GetIPFullMaskString(v6JoinIP.IP.String())))
} else if !utilnet.IsIPv6String(lrsr.Nexthop) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: Use utilnet.IsIPv4String instead

return lrsr.ExternalIDs["ic-node"] == node.Name && lrsr.ExternalIDs[types.NetworkExternalID] == zic.GetNetworkName() &&
(lrsr.Nexthop != v6TransitSwitchIP.IP.String() || (nodeSubnet != nil && lrsr.IPPrefix != nodeSubnet.String() &&
lrsr.IPPrefix != v6JoinIP.IP.String()+util.GetIPFullMaskString(v6JoinIP.IP.String())))
} else if !utilnet.IsIPv6String(lrsr.Nexthop) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need for the else if since the first if returns.

return false
}
err = libovsdbops.DeleteLogicalRouterStaticRoutesWithPredicate(zic.nbClient, zic.networkClusterRouterName, p)
if err != nil && !errors.Is(err, libovsdbclient.ErrNotFound) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that DeleteLogicalRouterStaticRoutesWithPredicate returns libovsdbclient.ErrNotFound if there was nothing to delete.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yaa, you are right , ErrNotFound is set to false at operationModel.

if !config.OVNKubernetesFeature.EnableInterconnect {
lrsr.ExternalIDs["nodeName"] = nodeName
}
} else if !config.OVNKubernetesFeature.EnableInterconnect {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was it intentional that you override all ExternalIDs in case of non-secondary networks?
Otherwise, You could just extract the if and do it once:

if gw.netInfo.IsSecondary() {
...
}
...
if !config.OVNKubernetesFeature.EnableInterconnect {
}

@@ -633,7 +636,8 @@ func (zic *ZoneInterconnectHandler) addRemoteNodeStaticRoutes(node *corev1.Node,
addRoute := func(prefix, nexthop string) error {
logicalRouterStaticRoute := nbdb.LogicalRouterStaticRoute{
ExternalIDs: map[string]string{
"ic-node": node.Name,
"ic-node": node.Name,
types.NetworkExternalID: zic.GetNetworkName(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC we usually also add the types.TopologyExternalID, it wouldn't hurt to have it here also, especially if we allow same network names on different topologies.

}
p := func(lrsr *nbdb.LogicalRouterStaticRoute) bool {
if utilnet.IsIPv6String(lrsr.Nexthop) {
nodeSubnet, _ := util.MatchFirstIPNetFamily(true, nodeSubnets)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should do the the check if err != nil

if lrsr.Nexthop != gwLRPIP && utilnet.IPFamilyOfString(lrsr.Nexthop) == utilnet.IPFamilyOfString(gwLRPIP) &&
lrsr.Nexthop == lrsr.IPPrefix {
networkName, isSecondaryNetwork := lrsr.ExternalIDs[types.NetworkExternalID]
if isSecondaryNetwork && networkName == gw.netInfo.GetNetworkName() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This nested if/elseif block could be simplified.
I wonder if there is any benefit in not including the nodeName when EnableInterconnect is true, if we had it would simplify this check.

}

err := libovsdbops.DeleteLogicalRouterStaticRoutesWithPredicate(gw.nbClient, gw.clusterRouterName, p)
if err != nil && !errors.Is(err, libovsdbclient.ErrNotFound) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that DeleteLogicalRouterStaticRoutesWithPredicate returns libovsdbclient.ErrNotFound

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/core-kubernetes Issues related to core kubernetes like pods, services, endpoints, endpointslices area/unit-testing Issues related to adding/updating unit tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 LRSR from default cluster router being deleted upon ovnkube-controller restart
3 participants