From 9da138d9c5fd92f3fbf6c34a5331a70cb6e1b7c0 Mon Sep 17 00:00:00 2001
From: Marc Sune
Date: Wed, 28 Aug 2024 13:27:01 +0200
Subject: [PATCH] docs/use-cases: add nfacctd original use-case

Add original nfacctd use-case writeup.
---
 docs/use-cases/network-telemetry-nfacctd.md | 110 ++++++++++++++++++++
 1 file changed, 110 insertions(+)
 create mode 100644 docs/use-cases/network-telemetry-nfacctd.md

diff --git a/docs/use-cases/network-telemetry-nfacctd.md b/docs/use-cases/network-telemetry-nfacctd.md
new file mode 100644
index 0000000..c0b1558
--- /dev/null
+++ b/docs/use-cases/network-telemetry-nfacctd.md
@@ -0,0 +1,110 @@
+# Multi-flow K8s affinity for BGP and IPFIX/Netflow: deploying `nfacctd` (`pmacct` network telemetry) in K8s
+
+This is the original use-case that motivated the creation of `sfunnel`.
+
+## Context
+
+[pmacct](https://github.com/pmacct/pmacct) is probably _the_ most widely
+used Open Source project for passive monitoring of networks. `nfacctd`, the
+Network Flow ACCounting Daemon, collects flowlogs ([IPFIX](https://en.wikipedia.org/wiki/IP_Flow_Information_Export)/
+[Netflow](https://en.wikipedia.org/wiki/NetFlow)/[sFlow](https://en.wikipedia.org/wiki/SFlow)),
+enriches them, normalizes values etc., and later exports them (e.g. to a DB
+or a bus).
+
+One of the main features of `nfacctd` is to enrich flowlogs with [BGP](https://en.wikipedia.org/wiki/Border_Gateway_Protocol)
+information, e.g. `AS_PATH`, `DST_AS`. To do so, `nfacctd` acts both as a
+collector of flowlogs _and_ as a passive BGP peer. Routers, therefore,
+connect to `nfacctd` on `TCP/179` and send e.g. IPFIX datagrams to
+`UDP/4739`:
+
+XXX: diagram router sending 2 flows to nfacctd
+
+[datahangar](https://github.com/datahangar/) originally XXX
+
+### Requirements
+
+* Use a cloud-native approach to deploy `nfacctd`, to be able to easily
+  auto-scale, failover etc.
+* Preserve the source IP addresses of both the BGP and the flowlogs traffic.
+* Make sure **both** the BGP and the flowlogs traffic of a given router end
+  up in the same Pod at any given point in time (in the absence of
+  reschedulings).
+
+## First attempt: `sessionAffinity: ClientIP` and `externalTrafficPolicy: Local`
+
+The initial attempt was a `LoadBalancer` service combining both options:
+
+```
+kind: Service
+apiVersion: v1
+metadata:
+  name: nfacctd
+spec:
+  selector:
+    app: nfacctd
+  ports:
+    - name: netflow
+      protocol: UDP
+      port: 2055
+      targetPort: 2055
+    - name: bgp
+      protocol: TCP
+      port: 179
+      targetPort: 179
+  type: LoadBalancer
+  sessionAffinity: ClientIP
+  externalTrafficPolicy: Local # Do not SNAT; preserve the original source IP!
+```
+
+The mockup test quickly showed that IP preservation worked in all cases, but
+that affinity didn't work with multiple replicas or multiple worker nodes:
+the NLB balances the BGP session and the flowlogs independently, so they can
+land on different nodes (and therefore different Pods).
+
+Time to go back to the drawing board...
+
+## :honeybee: eBPF to the rescue!
+
+Adding a new feature to Kubernetes and to all the NLBs and CNIs in the world
+wasn't quite an option :worried:, so it was obvious that the solution ought
+to be a bit of a hack.
+
+### Funneling traffic through a single protocol and port
+
+What if we were able to modify the traffic _before_ it hits the Network Load
+Balancer, pretend it's BGP, a.k.a. `TCP/179`, and then "undo" that in the
+Pod, right before the traffic is delivered to `nfacctd`?
+
+Let's call it [funneling](../funneling.md), to not confuse it with a real
+tunnel.
+
+The diagram would show:
+
+XXXX
+
+Netflow/IPFIX traffic (only) would have to be intercepted, mangled, and then
+sent to the NLB. This could be done either by modifying the Netflow/IPFIX
+traffic _while_ routing it through intermediate nodes, e.g.:
+
+
+or by pointing routers to one or more "funnelers" that mangle the packets
+and DNAT them to the NLB. E.g.:
+
+XXX
+
+### Time to eBPF it :honeybee:!
+
+The original prototype was as easy as this:
+
+#### The code
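+
+What follows is a deliberately simplified sketch of the idea, *not* the
+actual `sfunnel` program. It assumes an NLB that only inspects the 5-tuple,
+and exploits the fact that TCP and UDP lay out their source and destination
+ports identically in the first 4 bytes of the L4 header: for IPFIX packets,
+it flips the IP protocol byte from UDP to TCP and rewrites the destination
+port to `179`. How funneled packets are told apart from real BGP traffic,
+IPv6 support and L4 checksum fix-ups are all elided:
+
+```
+// funnel.bpf.c: illustrative sketch only -- NOT the actual sfunnel code.
+//
+// Build and attach on the funneler, e.g.:
+//   clang -O2 -g -target bpf -c funnel.bpf.c -o funnel.bpf.o
+//   tc qdisc add dev eth0 clsact
+//   tc filter add dev eth0 egress bpf da obj funnel.bpf.o sec tc
+
+#include <stddef.h>
+#include <linux/bpf.h>
+#include <linux/if_ether.h>
+#include <linux/in.h>
+#include <linux/ip.h>
+#include <linux/udp.h>
+#include <linux/pkt_cls.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+#define IPFIX_PORT 4739
+#define BGP_PORT   179
+
+SEC("tc")
+int funnel(struct __sk_buff *skb)
+{
+    void *data     = (void *)(long)skb->data;
+    void *data_end = (void *)(long)skb->data_end;
+
+    struct ethhdr *eth = data;
+    if ((void *)(eth + 1) > data_end || eth->h_proto != bpf_htons(ETH_P_IP))
+        return TC_ACT_OK;
+
+    struct iphdr *ip = (void *)(eth + 1);
+    if ((void *)(ip + 1) > data_end)
+        return TC_ACT_OK;
+    if (ip->protocol != IPPROTO_UDP || ip->ihl != 5) /* skip IP options */
+        return TC_ACT_OK;
+
+    struct udphdr *udp = (void *)(ip + 1);
+    if ((void *)(udp + 1) > data_end || udp->dest != bpf_htons(IPFIX_PORT))
+        return TC_ACT_OK;
+
+    /* 1. Flip the IP protocol byte UDP (17) -> TCP (6), patching the IP
+     *    header checksum over the 16-bit word holding ttl+protocol. */
+    __u16 old_word  = bpf_htons((ip->ttl << 8) | IPPROTO_UDP);
+    __u16 new_word  = bpf_htons((ip->ttl << 8) | IPPROTO_TCP);
+    __u8  new_proto = IPPROTO_TCP;
+    bpf_l3_csum_replace(skb, ETH_HLEN + offsetof(struct iphdr, check),
+                        old_word, new_word, sizeof(old_word));
+    bpf_skb_store_bytes(skb, ETH_HLEN + offsetof(struct iphdr, protocol),
+                        &new_proto, sizeof(new_proto), 0);
+
+    /* 2. Rewrite dst port 4739 -> 179 so the NLB sees "TCP/179".
+     *    (L4 checksum fix-up elided; the Pod side restores the original
+     *    values, which makes the original UDP checksum valid again.) */
+    __be16 new_port = bpf_htons(BGP_PORT);
+    bpf_skb_store_bytes(skb,
+                        ETH_HLEN + sizeof(struct iphdr) +
+                            offsetof(struct udphdr, dest),
+                        &new_port, sizeof(new_port), 0);
+
+    return TC_ACT_OK;
+}
+
+char _license[] SEC("license") = "GPL";
+```
+
+A mirror program attached inside the Pod performs the inverse rewrite, so
+`nfacctd` receives the original, untouched IPFIX datagrams.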
+This initial prototype was enough to validate the approach end to end.
+
+#### Using an initContainer()
+
+To avoid modifying the `nfacctd` container image itself, the unfunneling
+eBPF program can be loaded by an `initContainer` that runs (privileged)
+before `nfacctd` starts; see the manifest sketch in the appendix at the
+end of this page.
+
+## Conclusion and limitations
+
+The approach works. The main limitations are MTU considerations for the
+mangled traffic, and the need to deploy and operate dedicated funnelers
+in front of the NLB.
+
+## Acknowledgments
+
+Thank you to martynas, daniel, nlb
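+
+## Appendix: `initContainer` manifest sketch
+
+A minimal, hypothetical example of the Deployment described above; the
+`unfunnel` image name is a placeholder, not a real `sfunnel` artifact:
+
+```
+kind: Deployment
+apiVersion: apps/v1
+metadata:
+  name: nfacctd
+spec:
+  selector:
+    matchLabels:
+      app: nfacctd
+  template:
+    metadata:
+      labels:
+        app: nfacctd
+    spec:
+      initContainers:
+        # Loads the unfunneling eBPF program on the Pod's interface and
+        # exits; the nfacctd image itself stays untouched.
+        - name: unfunnel
+          image: example.org/sfunnel:latest # placeholder image
+          securityContext:
+            privileged: true # required to attach TC eBPF programs
+      containers:
+        - name: nfacctd
+          image: pmacct/nfacctd:latest
+          ports:
+            - name: netflow
+              containerPort: 2055
+              protocol: UDP
+            - name: bgp
+              containerPort: 179
+              protocol: TCP
+```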