Merge pull request spidernet-io#3686 from cyclinder/coordinator/overlay_policy_routing

fix: fail to access NodePort when a pod owns multiple network cards
weizhoublue committed Jul 17, 2024
2 parents 0f4724d + 4b166d9 commit 93f2665
Showing 8 changed files with 334 additions and 74 deletions.
24 changes: 24 additions & 0 deletions cmd/coordinator/cmd/utils.go
@@ -486,6 +486,30 @@ func (c *coordinator) tunePodRoutes(logger *zap.Logger, configDefaultRouteNIC st
return err
}
}

if c.tuneMode == ModeOverlay && c.firstInvoke {
// Copy the Calico or Cilium default route to table 500 to fix inconsistent
// routing: the request packet comes in through eth0 (Calico), but the pod
// sends the response packet out through net1 (macvlan).
// See https://github.com/spidernet-io/spiderpool/issues/3683

// Add a source-based rule (from <address> lookup table 500) for each address of the default interface.
podOverlayDefaultRouteRuleTable := c.hostRuleTable
for idx := range defaultInterfaceAddress {
ipNet := networking.ConvertMaxMaskIPNet(defaultInterfaceAddress[idx].IP)
err = networking.AddFromRuleTable(ipNet, podOverlayDefaultRouteRuleTable)
if err != nil {
logger.Error("failed to AddFromRuleTable", zap.Error(err))
return err
}
}

// Copy the default route of the overlay interface from the main table to table 500.
if err = networking.CopyDefaultRoute(logger, defaultOverlayVethName, unix.RT_TABLE_MAIN, podOverlayDefaultRouteRuleTable, c.ipFamily); err != nil {
return err
}
}

}
// move all routes of the specified interface to a new route table
if err = networking.MoveRouteTable(logger, moveRouteInterface, unix.RT_TABLE_MAIN, c.currentRuleTable, c.ipFamily); err != nil {
5 changes: 5 additions & 0 deletions docs/usage/install/overlay/get-started-calico-zh_cn.md
@@ -269,6 +269,7 @@ nginx-4653bc4f24-aswpm net1 10-6-v4 10.6.212.148/16
/# ip rule
0: from all lookup local
32760: from 10.6.212.132 lookup 100
32762: from 10.233.73.210 lookup 500
32766: from all lookup main
32767: from all lookup default
/# ip route
@@ -284,6 +285,8 @@ default via 10.6.0.1 dev net1
10.6.212.132 dev eth0 scope link
10.233.0.0/18 via 10.6.212.132 dev eth0
10.233.64.0/18 via 10.6.212.132 dev eth0
/ # ip route show table 500
default via 169.254.1.1 dev eth0
```

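Explanation of the above entries:
@@ -297,6 +300,8 @@ default via 10.6.0.1 dev net1
> This series of routes ensures that the Pod forwards traffic through eth0 when accessing targets within the cluster and through net1 when accessing external targets.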
>
> By default, the Pod's default route stays on eth0. To keep it on another NIC (such as net1), add the annotation "ipam.spidernet.io/default-route-nic: net1" to the Pod's annotations.
>
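> When the default route is on eth0, the Pod has a policy route that uses table 500. This route ensures that traffic received on eth0 is also sent out through eth0, preventing packet loss caused by inconsistent forward and return paths.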

Next, test the Pod's basic network connectivity, using access to the CoreDNS Pod and Service as an example:

7 changes: 6 additions & 1 deletion docs/usage/install/overlay/get-started-calico.md
@@ -263,7 +263,8 @@ Enter the Pod and use the command `ip` to view information such as IP addresses
valid_lft forever preferred_lft forever
/# ip rule
0: from all lookup local
32760: from 10.6.212.145 lookup 100
32760: from 10.6.212.132 lookup 100
32762: from 10.233.73.210 lookup 500
32766: from all lookup main
32767: from all lookup default
/# ip route
@@ -279,6 +280,8 @@ default via 10.6.0.1 dev net1
10.6.212.132 dev eth0 scope link
10.233.0.0/18 via 10.6.212.132 dev eth0
10.233.64.0/18 via 10.6.212.132 dev eth0
/ # ip route show table 500
default via 169.254.1.1 dev eth0
```

Explanation of the above:
@@ -292,6 +295,8 @@ Explanation of the above:
> This series of routing rules guarantees that the Pod will forward traffic through eth0 when accessing targets within the cluster and through net1 for external targets.
>
> By default, the Pod's default route stays on eth0. To keep it on net1, add the following annotation to the Pod's metadata: "ipam.spidernet.io/default-route-nic: net1".
>
> If the default route is on eth0, the Pod has a policy route that uses table 500. This route ensures that traffic received on eth0 is also sent out through eth0, preventing packet loss caused by inconsistent forward and return paths.

To test the basic network connectivity of the Pod, we will use the example of accessing the CoreDNS Pod and Service:
5 changes: 5 additions & 0 deletions docs/usage/install/overlay/get-started-cilium-zh_cn.md
@@ -267,6 +267,7 @@ nginx-4653bc4f24-ougjk net1 10-6-v4 10.6.212.230/16
/ # ip rule
0: from all lookup local
32760: from 10.6.212.131 lookup 100
32762: from 10.233.120.101 lookup 500
32766: from all lookup main
32767: from all lookup default
/ # ip route
@@ -283,6 +284,8 @@ default via 10.6.0.1 dev net1
10.6.212.131 dev eth0 scope link
10.233.0.0/18 via 10.6.212.132 dev eth0
10.233.64.0/18 via 10.6.212.132 dev eth0
/ # ip route show table 500
default via 10.233.65.96 dev eth0
```

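Explanation of the above information:
@@ -296,6 +299,8 @@ default via 10.6.0.1 dev net1
> This series of routes ensures that the Pod forwards traffic through eth0 when accessing targets within the cluster and through net1 when accessing external targets.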
>
> By default, the Pod's default route stays on eth0. To keep it on net1, add the annotation "ipam.spidernet.io/default-route-nic: net1" to the Pod's annotations.
>
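> When the default route is on eth0, the Pod has a policy route that uses table 500. This route ensures that traffic received on eth0 is also sent out through eth0, preventing packet loss caused by inconsistent forward and return paths.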

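The annotation rule in the note above can be expressed as a small decision. This is a hypothetical helper written only to restate the documented behavior (it is not Spiderpool code): it reads `ipam.spidernet.io/default-route-nic`, falls back to eth0 when the annotation is absent or empty, and reports whether, according to this document, a table 500 policy route is expected in the Pod.

```go
package main

import "fmt"

const (
	defaultRouteNICAnnotation = "ipam.spidernet.io/default-route-nic"
	overlayNIC                = "eth0" // interface created by the overlay CNI (Calico or Cilium)
)

// defaultRouteNIC returns the interface that keeps the Pod's default route:
// the annotation value when it is set, otherwise eth0.
func defaultRouteNIC(annotations map[string]string) string {
	if nic := annotations[defaultRouteNICAnnotation]; nic != "" {
		return nic
	}
	return overlayNIC
}

// expectsTable500Rule restates the note above: when the default route stays
// on eth0, a "from <eth0 IP> lookup 500" policy route should exist in the Pod.
func expectsTable500Rule(annotations map[string]string) bool {
	return defaultRouteNIC(annotations) == overlayNIC
}

func main() {
	for _, annotations := range []map[string]string{
		nil, // no annotation
		{defaultRouteNICAnnotation: "net1"},
	} {
		fmt.Printf("default route on %s, table 500 rule expected: %t\n",
			defaultRouteNIC(annotations), expectsTable500Rule(annotations))
	}
}
```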
Test the connectivity of the Pod's east-west traffic within the cluster, using access to the CoreDNS Pod and Service as an example:

5 changes: 5 additions & 0 deletions docs/usage/install/overlay/get-started-cilium.md
@@ -263,6 +263,7 @@ Use the command `ip` to view the Pod's information such as routes:
/ # ip rule
0: from all lookup local
32760: from 10.6.212.131 lookup 100
32762: from 10.233.120.101 lookup 500
32766: from all lookup main
32767: from all lookup default
/ # ip route
@@ -279,6 +280,8 @@ default via 10.6.0.1 dev net1
10.6.212.131 dev eth0 scope link
10.233.0.0/18 via 10.6.212.132 dev eth0
10.233.64.0/18 via 10.6.212.132 dev eth0
/ # ip route show table 500
default via 10.233.65.96 dev eth0
```
Explanation of the above:
@@ -292,6 +295,8 @@ Explanation of the above:
> This series of routing rules guarantees that the Pod will forward traffic through eth0 when accessing targets within the cluster and through net1 for external targets.
>
> By default, the Pod's default route stays on eth0. To keep it on net1, add the following annotation to the Pod's metadata: "ipam.spidernet.io/default-route-nic: net1".
>
> If the default route is on eth0, the Pod has a policy route that uses table 500. This route ensures that traffic received on eth0 is also sent out through eth0, preventing packet loss caused by inconsistent forward and return paths.

To test the east-west connectivity of the Pod, we will use the example of accessing the CoreDNS Pod and Service:

187 changes: 128 additions & 59 deletions pkg/networking/networking/route.go
@@ -149,18 +149,57 @@ func AddRoute(logger *zap.Logger, ruleTable, ipFamily int, scope netlink.Scope,
return nil
}

// MoveRouteTable moves all routes of the specified interface to a new route table.
// Equivalent: `ip route del <route>` and `ip route add <route> table <table>`
func MoveRouteTable(logger *zap.Logger, iface string, srcRuleTable, dstRuleTable, ipfamily int) error {
logger.Debug("Debug MoveRouteTable", zap.String("interface", iface),
zap.Int("srcRuleTable", srcRuleTable), zap.Int("dstRuleTable", dstRuleTable))
func GetLinkIndexAndRoutes(iface string, ipfamily int) (int, []netlink.Route, error) {
link, err := netlink.LinkByName(iface)
if err != nil {
return -1, nil, err
}

routes, err := netlink.RouteList(nil, ipfamily)
if err != nil {
return -1, nil, err
}

return link.Attrs().Index, routes, nil
}

// CopyDefaultRoute finds the default route of the pod's eth0 NIC and copies it
// to podOverlayDefaultRouteRuleTable.
func CopyDefaultRoute(logger *zap.Logger, iface string, srcRuleTable, podOverlayDefaultRouteRuleTable, ipfamily int) error {
logger.Debug("Debug CopyDefaultRoute", zap.String("interface", iface),
zap.Int("srcRuleTable", srcRuleTable), zap.Int("dstRuleTable", podOverlayDefaultRouteRuleTable))

linkIndex, routes, err := GetLinkIndexAndRoutes(iface, ipfamily)
if err != nil {
logger.Error(err.Error())
return err
}

routes, err := netlink.RouteList(nil, ipfamily)
for _, route := range routes {
// only handle routes from srcRuleTable (the main table)
if route.Table != srcRuleTable {
continue
}

// ignore the IPv6 link-local route
if route.Dst.String() == "fe80::/64" {
continue
}

if err = moveRouteTable(linkIndex, srcRuleTable, podOverlayDefaultRouteRuleTable, true, route, logger); err != nil {
return err
}

}
return nil
}

// MoveRouteTable moves all routes of the specified interface to a new route table.
// Equivalent: `ip route del <route>` and `ip route add <route> table <table>`
func MoveRouteTable(logger *zap.Logger, iface string, srcRuleTable, dstRuleTable, ipfamily int) error {
logger.Debug("Debug MoveRouteTable", zap.String("interface", iface),
zap.Int("srcRuleTable", srcRuleTable), zap.Int("dstRuleTable", dstRuleTable))
linkIndex, routes, err := GetLinkIndexAndRoutes(iface, ipfamily)
if err != nil {
logger.Error(err.Error())
return err
Expand All @@ -177,72 +216,102 @@ func MoveRouteTable(logger *zap.Logger, iface string, srcRuleTable, dstRuleTable
continue
}

if route.LinkIndex == link.Attrs().Index {
// only delete default route
if route.Dst == nil || route.Dst.IP.Equal(net.IPv4zero) || route.Dst.IP.Equal(net.IPv6zero) {
if err = netlink.RouteDel(&route); err != nil {
logger.Error("failed to RouteDel in main", zap.String("route", route.String()), zap.Error(err))
return fmt.Errorf("failed to RouteDel %s in main table: %+v", route.String(), err)
}
logger.Debug("Del the default route from main successfully", zap.String("Route", route.String()))
}
if err = moveRouteTable(linkIndex, srcRuleTable, dstRuleTable, false, route, logger); err != nil {
return err
}

}
return nil
}

// We need to copy all routes of the podDefaultRouteNic in the main table to dstRuleTable.
// Otherwise, the reply packet doesn't know where to go.
// moveRouteTable moves a route from srcRuleTable to dstRuleTable. NOTE: if onlyCopyOverlayDefaultRoute is true,
// only a default route is added to dstRuleTable (it is not deleted from srcRuleTable) and the function returns early.
func moveRouteTable(linkIndex, srcRuleTable, dstRuleTable int, onlyCopyOverlayDefaultRoute bool, route netlink.Route, logger *zap.Logger) error {
var err error
if route.LinkIndex == linkIndex {
if route.Dst == nil || route.Dst.IP.Equal(net.IPv4zero) || route.Dst.IP.Equal(net.IPv6zero) {
route.Table = dstRuleTable
if err = netlink.RouteAdd(&route); err != nil && !os.IsExist(err) {
logger.Error("failed to RouteAdd in new table ", zap.String("route", route.String()), zap.Error(err))
logger.Error("failed to copy overlay default route to hostRuleTable", zap.String("route", route.String()), zap.Error(err))
return fmt.Errorf("failed to RouteAdd (%+v) to new table: %+v", route, err)
}
logger.Debug("MoveRoute to new table successfully", zap.String("Route", route.String()))
} else {
// On newer kernels, if the pod has multiple IPv6 default routes, all of them
// are put in MultiPath.
/*
{
Gw: [{Ifindex: 3 Weight: 1 Gw: fd00:10:7::103 Flags: []} {Ifindex: 5 Weight: 1 Gw: fd00:10:6::100 Flags: []}]}"
}
*/
if len(route.MultiPath) == 0 {
continue
}
logger.Debug("Copy the overlay default route to hostRuleTable successfully", zap.String("Route", route.String()))

var generatedRoute, deletedRoute *netlink.Route
// get generated default Route for new table
for _, v := range route.MultiPath {
logger.Debug("Found IPv6 Default Route", zap.String("Route", route.String()),
zap.Int("v.LinkIndex", v.LinkIndex), zap.Int("link.Attrs().Index", link.Attrs().Index))
if v.LinkIndex == link.Attrs().Index {
generatedRoute = &netlink.Route{
LinkIndex: v.LinkIndex,
Gw: v.Gw,
Table: dstRuleTable,
MTU: route.MTU,
}
deletedRoute = &netlink.Route{
LinkIndex: v.LinkIndex,
Gw: v.Gw,
Table: srcRuleTable,
}
break
}
}
if generatedRoute == nil {
continue
if onlyCopyOverlayDefaultRoute {
// only copy the overlay default route; there is no need to delete the default route
return nil
}

logger.Debug("Deleting IPv6 DefaultRoute", zap.String("deletedRoute", deletedRoute.String()))
if err := netlink.RouteDel(deletedRoute); err != nil {
logger.Error("failed to RouteDel for IPv6", zap.String("Route", route.String()), zap.Error(err))
return fmt.Errorf("failed to RouteDel %v for IPv6: %+v", route.String(), err)
// Del the default route from main
route.Table = srcRuleTable
if err = netlink.RouteDel(&route); err != nil {
logger.Error("failed to RouteDel in main", zap.String("route", route.String()), zap.Error(err))
return fmt.Errorf("failed to RouteDel %s in main table: %+v", route.String(), err)
}
logger.Debug("Del the default route from main successfully", zap.String("Route", route.String()))
}

if onlyCopyOverlayDefaultRoute {
// only copy the overlay default route; there is no need to add non-default routes
return nil
}

// We need to copy all routes of the podDefaultRouteNic in the main table to dstRuleTable.
// Otherwise, the reply packet doesn't know where to go.
if err = netlink.RouteAdd(&route); err != nil && !os.IsExist(err) {
logger.Error("failed to RouteAdd in new table ", zap.String("route", route.String()), zap.Error(err))
return fmt.Errorf("failed to RouteAdd (%+v) to new table: %+v", route, err)
}
logger.Debug("MoveRoute to new table successfully", zap.String("Route", route.String()))
return nil
}

if err = netlink.RouteAdd(generatedRoute); err != nil && !os.IsExist(err) {
logger.Error("failed to RouteAdd for IPv6 to new table", zap.String("route", route.String()), zap.Error(err))
return fmt.Errorf("failed to RouteAdd for IPv6 (%+v) to new table: %+v", route.String(), err)
// On newer kernels, if the pod has multiple IPv6 default routes, all of them
// are put in MultiPath.
/*
{
Gw: [{Ifindex: 3 Weight: 1 Gw: fd00:10:7::103 Flags: []} {Ifindex: 5 Weight: 1 Gw: fd00:10:6::100 Flags: []}]}"
}
*/
if len(route.MultiPath) == 0 {
return nil
}

var generatedRoute, deletedRoute *netlink.Route
// get generated default Route for new table
for _, v := range route.MultiPath {
logger.Debug("Found IPv6 Default Route", zap.String("Route", route.String()),
zap.Int("v.LinkIndex", v.LinkIndex), zap.Int("linkIndex", linkIndex))
if v.LinkIndex == linkIndex {
generatedRoute = &netlink.Route{
LinkIndex: v.LinkIndex,
Gw: v.Gw,
Table: dstRuleTable,
MTU: route.MTU,
}
deletedRoute = &netlink.Route{
LinkIndex: v.LinkIndex,
Gw: v.Gw,
Table: srcRuleTable,
}
break
}
}

if generatedRoute == nil || onlyCopyOverlayDefaultRoute {
return nil
}

logger.Debug("Deleting IPv6 DefaultRoute", zap.String("deletedRoute", deletedRoute.String()))
if err := netlink.RouteDel(deletedRoute); err != nil {
logger.Error("failed to RouteDel for IPv6", zap.String("Route", route.String()), zap.Error(err))
return fmt.Errorf("failed to RouteDel %v for IPv6: %+v", route.String(), err)
}

if err = netlink.RouteAdd(generatedRoute); err != nil && !os.IsExist(err) {
logger.Error("failed to RouteAdd for IPv6 to new table", zap.String("route", route.String()), zap.Error(err))
return fmt.Errorf("failed to RouteAdd for IPv6 (%+v) to new table: %+v", route.String(), err)
}
return nil
}

1 change: 1 addition & 0 deletions test/doc/coordinator.md
@@ -21,3 +21,4 @@
| C00017 | TunePodRoutes If false, no routing will be coordinated | p3 | | done | |
| C00018 | The conflict IPs for stateless Pod should be released | p3 | | done | |
| C00019 | The conflict IPs for stateful Pod should not be released | p3 | | done | |
| C00020 | kdoctor connectivity should succeed with the annotation ipam.spidernet.io/default-route-nic: net1 | p3 | | done | |