docs/use-cases: add nfacctd original use-case
Add original nfacctd use-case writeup.
msune committed Aug 28, 2024
1 parent 89e52f9 commit 5afaeed
docs/use-cases/network-telemetry-nfacctd.md
# Multi-flow K8s affinity: deploying `nfacctd` (BGP and IPFIX/Netflow) in K8s for network telemetry with `pmacct`

This is the original use-case that motivated this project. The original
discussion can be found in this [Cilium's #general slack thread](https://cilium.slack.com/archives/C1MATJ5U5/p1723579808788789).

## Context

[pmacct](https://github.com/pmacct/pmacct) is probably _the_ most widely
used Open Source project for passive monitoring of networks. `nfacctd`, the
Network Flow ACCounting Daemon, collects flowlogs ([IPFIX](https://en.wikipedia.org/wiki/IP_Flow_Information_Export)/
[Netflow](https://en.wikipedia.org/wiki/NetFlow)/[sFlow](https://en.wikipedia.org/wiki/SFlow)),
enriches them, normalizes their values, etc., and later exports the result
(e.g. to a DB or a bus).

One of the main features of `nfacctd` is enriching flowlogs with [BGP](https://en.wikipedia.org/wiki/Border_Gateway_Protocol)
information, e.g. `AS_PATH` or `DST_AS`. To do so, `nfacctd` acts both as
a collector of flowlogs _and_ as a passive BGP peer. Routers, therefore,
connect to `nfacctd` on (typically) `TCP/179` and send e.g. IPFIX datagrams
to `UDP/4739`:

<p width="100%" style="text-align:center">
<img src="single_router_nfacctd.svg" width="40%" alt="A router connecting to nfacctd">
</p>

[datahangar](https://github.com/datahangar/) originally XXX

### Requirements

* Use a cloud-native approach to deploy `nfacctd`, so that it can auto-scale,
  fail over easily, etc.
* Preserve source IP addresses for both BGP and flowlogs traffic.
* Make sure **both** BGP and flowlogs traffic end up in the same Pod at any
  given point in time (barring Pod reschedulings).

## First attempt: `sessionAffinity: ClientIP` and `externalTrafficPolicy: Local`

The initial attempt was a plain `LoadBalancer` service:

```yaml
kind: Service
apiVersion: v1
metadata:
  name: nfacctd
spec:
  selector:
    app: nfacctd
  ports:
    - name: netflow
      protocol: UDP
      port: 2055
      targetPort: 2055
    - name: bgp
      protocol: TCP
      port: 179
      targetPort: 179
  type: LoadBalancer
  sessionAffinity: ClientIP
  externalTrafficPolicy: Local # Do not SNAT to the service!
```

A quick mock-up test showed that IP preservation worked in all cases, but
that session affinity did not hold with multiple replicas or multiple worker nodes.

Time to go back to the drawing board...

## :honeybee: eBPF to the rescue!

Adding a new feature to Kubernetes and to every NLB and CNI out there wasn't quite an option
:worried:, so it was obvious that the solution ought to be a bit of a hack.

### Funneling traffic through a single protocol and port

What if we could modify the traffic _before_ it hits the Network Load
Balancer, pretend it's BGP, a.k.a. `TCP/179`, and then "undo" that in the
Pod, right before the traffic is delivered to `nfacctd`?

Let's call it [funneling](../funneling.md), so as not to confuse it with a real tunnel.

The diagram would show:

XXXX

Netflow/IPFIX traffic (only) would have to be intercepted, mangled, and then sent
to the NLB. This could be done either by modifying Netflow/IPFIX traffic _while_
routing it through intermediate nodes, e.g.:


or by pointing routers to one or more "funnelers" that mangle the packet and
DNAT it to the NLB. E.g.:

XXX

### Time to eBPF it :honeybee:!

The original prototype was surprisingly small.


#### The code

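The original listing is not reproduced here. As a rough illustration of the packet surgery such a TC eBPF program performs, here is the same idea in plain userspace C (a sketch only, not the actual prototype: IPv4 without options is assumed, checksum and IP total-length fixups are omitted, and all names are made up):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative sketch of the funneling rewrite. Simplifications:
 * IPv4 without options; checksum/IP-length fixups omitted. */
enum { IP_HDR = 20, UDP_HDR = 8, TCP_HDR = 20, GROWTH = TCP_HDR - UDP_HDR };

/* Funneler side: make an IPv4/UDP flowlog packet look like TCP/179.
 * The original UDP dst port is stashed in the (otherwise unused here)
 * TCP sequence-number field so the Pod side can undo the change.
 * `out` must have room for in_len + GROWTH bytes; returns new length. */
static size_t funnel(const uint8_t *in, size_t in_len, uint8_t *out)
{
    memcpy(out, in, IP_HDR);                      /* copy IPv4 header         */
    out[9] = 6;                                   /* proto: UDP(17) -> TCP(6) */
    memset(out + IP_HDR, 0, TCP_HDR);             /* blank minimal TCP header */
    memcpy(out + IP_HDR, in + IP_HDR, 2);         /* keep src port            */
    out[IP_HDR + 2] = 0;
    out[IP_HDR + 3] = 179;                        /* dst port -> BGP (179)    */
    memcpy(out + IP_HDR + 4, in + IP_HDR + 2, 2); /* stash original dst port  */
    out[IP_HDR + 12] = 5 << 4;                    /* TCP data offset = 5      */
    memcpy(out + IP_HDR + TCP_HDR, in + IP_HDR + UDP_HDR,
           in_len - IP_HDR - UDP_HDR);            /* payload unchanged        */
    return in_len + GROWTH;
}

/* Pod side: restore the original IPv4/UDP packet. */
static size_t unfunnel(const uint8_t *in, size_t in_len, uint8_t *out)
{
    size_t udp_len = in_len - GROWTH - IP_HDR;      /* UDP hdr + payload      */
    memcpy(out, in, IP_HDR);
    out[9] = 17;                                    /* proto back to UDP      */
    memcpy(out + IP_HDR, in + IP_HDR, 2);           /* src port               */
    memcpy(out + IP_HDR + 2, in + IP_HDR + 4, 2);   /* restore dst port       */
    out[IP_HDR + 4] = (uint8_t)(udp_len >> 8);      /* UDP length             */
    out[IP_HDR + 5] = (uint8_t)(udp_len & 0xff);
    out[IP_HDR + 6] = out[IP_HDR + 7] = 0;          /* checksum: omitted      */
    memcpy(out + IP_HDR + UDP_HDR, in + IP_HDR + TCP_HDR,
           in_len - IP_HDR - TCP_HDR);              /* payload unchanged      */
    return in_len - GROWTH;
}
```

The NLB then sees a single `TCP/179` flow per router for both kinds of traffic, so its consistent hashing lands them on the same backend.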

#### Using an `initContainer`
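One way to attach the Pod-side "unfunnel" program before `nfacctd` starts receiving traffic is an `initContainer`. A sketch only: the loader image name and object file path below are made up for illustration, and the exact interface name depends on the CNI:

```yaml
spec:
  template:
    spec:
      initContainers:
        - name: load-unfunnel                     # illustrative name
          image: example.org/funnel-loader:latest # hypothetical loader image
          securityContext:
            privileged: true # simplest; NET_ADMIN + BPF would be tighter
          command: ["sh", "-c"]
          args:
            - |
              tc qdisc add dev eth0 clsact
              tc filter add dev eth0 ingress bpf direct-action obj /unfunnel.o sec tc
      containers:
        - name: nfacctd
          image: pmacct/nfacctd:latest
```

Because the TC program stays attached to the Pod's interface after the `initContainer` exits, the main container needs no special privileges.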

## Conclusion and limitations

XXX
The approach works, but with caveats: the funneling rewrite eats into the MTU
budget, and one or more dedicated funnelers are needed in front of the NLB.
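On the MTU point: swapping the 8-byte UDP header for a minimal 20-byte TCP header grows every funneled datagram, so the effective path MTU seen by exporters shrinks. A back-of-the-envelope check (header sizes are fixed by the UDP and TCP specs; the 1500-byte figure assumes plain Ethernet):

```c
#include <assert.h>

/* Fixed header sizes, no options (RFC 768, RFC 9293). */
enum { UDP_HDR = 8, TCP_HDR_MIN = 20 };

/* Bytes added to each flowlog datagram by the funnel rewrite. */
static int funnel_overhead(void) { return TCP_HDR_MIN - UDP_HDR; }
```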

## Acknowledgments

Thank you to Martynas, Daniel and nlb.