
Unified transit agent with a linux geneve/vxlan interface #93

Open
zasherif opened this issue Apr 1, 2020 · 1 comment
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@zasherif
Contributor

zasherif commented Apr 1, 2020

The current simple endpoint wiring architecture, while effective in meeting Mizar's initial goals, has several drawbacks. The following diagram shows the current wiring:

We thought that changes to the veth driver mode would be sufficient to work around these drawbacks. However, the current veth XDP driver implementation still has several limitations, including:

  • The XDP_REDIRECT action is not as fast as executing the XDP program directly on the main interface. The best improvement we could achieve over the generic XDP driver is about 30%, which still lags behind our expectations.
  • The driver restricts the MTU size to be less than 4KB, which is a function of the default memory page size. The MTU limitation is problematic for Mizar since we need to support jumbo frames to be at parity with Neutron at least.
  • The driver mandates that we load a dummy XDP program on the veth pair in the container/VM network namespace. While this seems okay, it is still problematic since we want all Mizar functionality to be transparent to containers and VMs, so that a user of the container cannot alter Mizar's behavior in any way (even by removing an XDP program on the veth interface).
  • Lack of TSO/GSO support and other hardware offloads on that egress path.

Finally, since Mizar loads a transit agent program on each veth peer in the root namespace, memory usage grows linearly with the number of endpoints. While this could be negligible in most scenarios, it may be a concern for hosts where we create a large number of containers.

Proposed Changes

In the new architecture, we shall use one Geneve interface for tunneling all outgoing packets. The Geneve interface is common to all endpoints. Inside the container/VM namespace, we will create MACVLAN interfaces and connect them in private mode to the Geneve tunnel interface. A single instance of the transit agent will be attached to the egress hook (tc clsact) of the Geneve interface and reused by all the endpoints. The endpoints table will be a global eBPF map.
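For illustration, the host-side setup could look like the following sketch. The interface name, destination port, and the object/section names are assumptions for the example, not Mizar's actual artifacts:

# Create the shared Geneve interface in external (collect-metadata) mode so
# one device can tunnel to any remote host; name and port are illustrative.
ip link add tunnel0 type geneve external dstport 6081
ip link set tunnel0 up
# Attach a single transit agent instance to the egress hook via tc clsact.
tc qdisc add dev tunnel0 clsact
tc filter add dev tunnel0 egress bpf da obj transit_agent.o sec cls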

All egress packets will trigger the transit agent in their normal packet processing path (fast path). The transit agent will encapsulate the packet in Geneve as expected and will rewrite the destination IP address to be either the endpoint's transit switch or the local host. In the local case, the packet will be picked up immediately by the destination endpoint.

Unlike the current architecture, ingress packets need not be redirected to the Geneve tunnel interface. It is sufficient to XDP_PASS the packet; the kernel will deliver it normally to the tunnel interface, and the corresponding MACVLAN interface will immediately pick up the decapsulated packet. Since we are not using bridge mode, the only overhead incurred is the encapsulation/decapsulation, which we have to account for anyway (even in the current architecture).
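One quick way to sanity-check this ingress path (using the illustrative namespace and interface names from the provisioning snippet below):

# Decapsulated packets should show up directly on the endpoint's MACVLAN
# interface, with no redirect step in between.
ip netns exec ns1 tcpdump -ni veth0 -c 5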

This approach has several benefits:

  • There is no need to redirect packets at all. The only actions used in the main XDP program are XDP_PASS and XDP_TX. Not using XDP_REDIRECT has performance benefits, as we have shown - at least for the moment - that redirects incur a performance penalty in veth driver mode (even with redirect_map).
  • Since we will have one single shared transit agent, memory consumption is constant and independent of the number of endpoints on the host.
  • The control-plane does not need to maintain or discover the MAC address of the host since the kernel ARP mechanism will be in effect.
  • VMs shall be treated similarly by using MACVTAP (requires testing; see the sketch after the snippet below).
  • We shall have a further simplified control-plane workflow. The network control-agent (NCA) does not need to load a transit agent anymore when provisioning an endpoint, and the endpoint creation steps will be minimal. The following snippet shows an example of the expected NCA steps:
# Create a MACVLAN interface in private mode on top of the shared Geneve tunnel
ip link add veth0 link tunnel0 type macvlan mode private
# Move it into the endpoint's network namespace and configure it there
ip link set veth0 netns ns1
ip netns exec ns1 ip addr add 10.0.0.5/24 dev veth0
ip netns exec ns1 ip link set dev veth0 address 0e:73:ae:c8:87:01
ip netns exec ns1 ip link set dev veth0 up
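For VMs, the MACVTAP variant mentioned above could be provisioned along the same lines. This is an untested sketch with illustrative names:

# Hypothetical VM counterpart: a MACVTAP interface in private mode on the same
# tunnel device; the resulting tap device can be handed to the hypervisor.
ip link add link tunnel0 name macvtap0 type macvtap mode private
ip link set macvtap0 up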

Performance gain

With this proposal we shall have:

  • At least a 35% improvement over XDP redirect_map (with the generic XDP driver)
  • A 9-10% improvement over an OVS-based setup, even though in that setup OVS directly tunnels the packets to the end hosts.
@zasherif zasherif added enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed labels May 13, 2020
@zasherif zasherif pinned this issue May 13, 2020
@zasherif
Contributor Author

@clu2xlu clu2xlu removed the help wanted Extra attention is needed label Jan 28, 2021