missing /32 FIB entries #1614

Open
gilesheron opened this issue Jul 19, 2019 · 5 comments

@gilesheron

It seems like Contiv-VPP adds a /32 FIB entry in VRF1 on a new node for each existing node in the cluster (by default those are in 192.168.30.0/24).

But it also seems that the existing nodes don't get updated. So, for example, the master node in my cluster only has the /24 plus a local /32, the first worker I start also has a /32 for the master, and the next worker gets that plus a /32 for the first worker - and so on.

So I'm seeing pings drop when they're destined for 192.168.30.0/24 addresses with no matching /32, but stuff behind those addresses (e.g. pod IPs) seems to be OK (it looks like the FIB entry for the /24 resolves to ARP, whereas the FIB entry for the IPAM network resolves to the correct next-hop including MAC addresses etc.).
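
For reference, this is roughly how I'm comparing the FIBs on each node (the addresses are just the defaults from my setup's vxlanCIDR, adjust for yours):

    # dump VRF 1 (where the vxlanCIDR routes live) on each node
    vppctl show ip fib table 1
    # or look up the entry matching a specific node's BVI address -
    # if the /32 is missing, the lookup falls back to the covering /24
    vppctl show ip fib table 1 192.168.30.1/32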

@rastislavs

Hey Giles,
we are talking about the vxlanCIDR and the IP addresses applied to the VXLAN BVI interfaces, right?

Contiv-VPP is actually NOT installing any static /32 routes for them. They seem to be installed by VPP itself. I guess the /32 for the local interface is always there, and the rest are installed whenever there is some pod-to-pod communication between the particular nodes (since there is almost always something talking to the master node, the /32 for the master's BVI would end up installed on each node).

From where are you trying to ping the remote BVI interfaces? From VPP? Do you use some specific source interface? Apart from ping not working, do you see any other issues with communication between the nodes? Note that the ping utility on VPP has many issues...

@gilesheron

Interesting, but not sure why e.g. worker2 gets a route for worker1.

and yeah - was trying to ping from VPP using the loop0 as the source.
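
for reference, roughly this from the VPP CLI (the address is just one of the default vxlanCIDR ones in my setup):

    # ping another node's BVI address, sourcing the packets from the local loop0 BVI
    ping 192.168.30.1 source loop0 repeat 5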

The reason I started looking at this was that we had a broken cluster (workers unable to reach etcd IIRC) and that was the only thing I could see different in the FIBs.

will dig some more...

@gilesheron

oh yes, so it was the vxlanCIDR addresses.

@rastislavs

Well, if the issue was that the workers were unable to reach Contiv-ETCD, you may need to look at a different place. Contiv-ETCD is exposed as a NodePort service, so it relies on kube-proxy to do the NAT, and the traffic goes from the workers to the master node's management IP over the nodes' management inter-connection rather than via VPP. The reason for that is that the agents need to be able to connect to ETCD even before the CNI starts working (i.e. before VPP is running & configured).
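
A quick way to check that path (the service name below is what the default Contiv-VPP manifests use, so treat it as an assumption and adjust if yours differ):

    # the NodePort that kube-proxy exposes for contiv-etcd
    kubectl get svc -n kube-system contiv-etcd
    # from a worker, check the master's mgmt IP answers on that port
    # (<master-mgmt-ip> and <node-port> are placeholders for your values)
    nc -zv <master-mgmt-ip> <node-port>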

Maybe you are hitting this issue? #1430

@rastislavs

BTW, I am confused about why, when and how VPP installs those /32 routes for the other nodes' BVIs as well. It should not need them at all - there is a /24 route covering them - but I guess it is some runtime optimization in the VPP FIB logic? Anyway, I tried to "force" VPP to create them by generating some pod-to-pod traffic, but wasn't successful. Pod-to-pod traffic between the nodes worked, but the /32 route towards the other node's BVI was not added.
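
For the record, this is how I was checking after generating the traffic (the /32 below is just an example address from the default range):

    # look for a host route towards the other node's BVI in VRF 1
    vppctl show ip fib table 1 192.168.30.2/32
    # and check whether at least an ARP entry was learned for it
    vppctl show ip arp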
