docs/use-cases: add nfacctd original use-case
Add original nfacctd use-case writeup.
msune committed Aug 28, 2024
1 parent 89e52f9 commit deb4e4b
Showing 3 changed files with 135 additions and 0 deletions.
4 changes: 4 additions & 0 deletions docs/use-cases/lb_traffic_no_affinity.svg
127 changes: 127 additions & 0 deletions docs/use-cases/network-telemetry-nfacctd.md
# Multi-flow K8s affinity: deploying `nfacctd` (BGP and IPFIX/Netflow) for network telemetry with `pmacct`

This is the original use-case that motivated this project. The original
discussion can be found in this [Cilium's #general slack thread](https://cilium.slack.com/archives/C1MATJ5U5/p1723579808788789).

## Context

### pmacct and datahangar
[pmacct](https://github.com/pmacct/pmacct) is probably _the_ most widely
used Open Source project for passive network monitoring. `nfacctd`, the
Network Flow ACCounting Daemon, collects flowlogs ([IPFIX](https://en.wikipedia.org/wiki/IP_Flow_Information_Export)/
[Netflow](https://en.wikipedia.org/wiki/NetFlow)/[sFlow](https://en.wikipedia.org/wiki/SFlow)),
enriches them, normalizes values, etc., and later exports them (e.g. to a
database or a message bus).

One of the main features of `nfacctd` is to enrich flowlogs with [BGP](https://en.wikipedia.org/wiki/Border_Gateway_Protocol)
information, e.g. `AS_PATH` or `DST_AS`. To do so, `nfacctd` acts as both
a collector of flowlogs _and_ a passive BGP peer. Routers therefore connect to
`nfacctd` on (typically) `TCP/179` and send, e.g., IPFIX datagrams to
`UDP/4739`:

![A network router connecting to nfacctd](single_router_nfacctd.svg)
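
For illustration, a minimal `nfacctd` configuration enabling both roles could
look roughly like the sketch below. These are generic pmacct directives with
example values, not datahangar's actual configuration.

```
! Illustrative sketch only, not datahangar's actual configuration.
! Collect IPFIX/NetFlow on UDP/4739...
nfacctd_port: 4739
! ...and act as a passive BGP peer at the same time.
bgp_daemon: true
bgp_daemon_port: 179
bgp_daemon_max_peers: 100
! Enrich flowlogs with BGP-derived fields before exporting them.
aggregate: peer_src_ip, src_as, dst_as, as_path
plugins: print
```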

[datahangar](https://github.com/datahangar/) was originally designed as an E2E
test framework for pmacct in the context of Kubernetes. While it still serves
this purpose, `datahangar` has evolved into a reference architecture for a
network data pipeline using off-the-shelf OSS components.

While `nfacctd` usually runs outside of K8s, close to the routers, the
objective was to make it _as cloud native as possible_, allowing rescheduling
on failure, autoscaling, etc.

### Connectivity requirements

BGP and flowlog traffic must:

* Preserve the source IP address, which is used to deduce the router's identity.
* End up in the same Pod.

## First attempt: `sessionAffinity: ClientIP` and `externalTrafficPolicy: Local`

The initial attempt was to define a `LoadBalancer` service:

```yaml
kind: Service
apiVersion: v1
metadata:
  name: nfacctd
spec:
  selector:
    app: nfacctd
  ports:
    - name: netflow
      protocol: UDP
      port: 2055
      targetPort: 2055
    - name: bgp
      protocol: TCP
      port: 179
      targetPort: 179
  type: LoadBalancer
  sessionAffinity: ClientIP
  externalTrafficPolicy: Local # Do not SNAT to the service!
```

The mockup test quickly showed that IP preservation worked in all cases, but
that affinity didn't work with multiple replicas or multiple worker nodes...
:disappointed:. Flows were hitting different Pods, or even different worker
nodes.

![BGP and flowlog traffic end up in different Pods](lb_traffic_no_affinity.svg)

## :bulb: What if...

But what if we were able to modify the traffic _before_ it hits the Network
Load Balancer (NLB), pretending it is BGP (`TCP/179`) so that `sessionAffinity: ClientIP`
works, and then "undo" that trick in the Pod, right before the traffic is
delivered to `nfacctd`? Hmm, that _might_ work.

Adding a new feature to Kubernetes and to every NLB and CNI in the world wasn't
quite an option :sweat_smile:, so it was obvious that the solution would have
to be a bit of a hack.

Time to go back to the drawing board...

## :honeybee: eBPF to the rescue!

### Funneling traffic through a single protocol and port

Let's call it [funneling](../funneling.md), so as not to confuse it with a real tunnel.

The diagram would show:

XXXX

Only the Netflow/IPFIX traffic would have to be intercepted, mangled, and then
sent to the NLB. This could either be done by modifying the Netflow/IPFIX traffic
_while_ routing it through intermediate nodes, e.g.:


or by pointing routers to one or more "funnelers" that mangle the packet and
DNAT it to the NLB. E.g.:

XXX

### Time to eBPF it :honeybee:!

The original prototype was surprisingly simple.


#### The code

The initial prototype consisted of a small eBPF program that rewrites the
flowlog packets so that they look like BGP (`TCP/179`) before reaching the NLB,
and a mirror program that undoes the change inside the Pod, right before the
traffic is delivered to `nfacctd`.
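
Below is a minimal sketch of the funnel direction, reconstructed from the
description above rather than taken from the actual prototype: the TC attach
point, the section name, the flowlog port (`UDP/2055`, as in the Service above)
and the omission of the `tot_len`/checksum fixups are all simplifications and
assumptions.

```c
// SPDX-License-Identifier: GPL-2.0
/* Sketch of the "funnel" direction: make flowlog (UDP/2055) packets look like
 * BGP (TCP/179) before they hit the NLB, so that ClientIP affinity keeps them
 * together with the real BGP session. Illustrative only: IPv6, IP options,
 * ip->tot_len and L3/L4 checksum fixups are intentionally omitted. */
#include <stddef.h>
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/in.h>
#include <linux/ip.h>
#include <linux/udp.h>
#include <linux/tcp.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

#define FLOWLOG_PORT 2055 /* NetFlow; use 4739 for IPFIX */
#define BGP_PORT     179
/* A TCP header (20B) is 12 bytes larger than a UDP header (8B). */
#define GROW_BYTES ((int)(sizeof(struct tcphdr) - sizeof(struct udphdr)))

SEC("tc")
int funnel(struct __sk_buff *skb)
{
    void *data = (void *)(long)skb->data;
    void *data_end = (void *)(long)skb->data_end;
    const __u32 l4_off = ETH_HLEN + sizeof(struct iphdr);

    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)
        return TC_ACT_OK;
    if (eth->h_proto != bpf_htons(ETH_P_IP))
        return TC_ACT_OK;

    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end)
        return TC_ACT_OK;
    if (ip->protocol != IPPROTO_UDP || ip->ihl != 5) /* no IP options */
        return TC_ACT_OK;

    struct udphdr *udp = (void *)(ip + 1);
    if ((void *)(udp + 1) > data_end)
        return TC_ACT_OK;
    if (udp->dest != bpf_htons(FLOWLOG_PORT))
        return TC_ACT_OK;

    __be16 sport = udp->source;

    /* Grow the packet right after the IP header to fit a TCP header. */
    if (bpf_skb_adjust_room(skb, GROW_BYTES, BPF_ADJ_ROOM_NET, 0))
        return TC_ACT_SHOT;

    /* Craft a TCP header over the new room plus the old UDP header. The
     * flags are arbitrary here: for this sketch only the 5-tuple matters. */
    struct tcphdr tcp = {
        .source = sport,
        .dest   = bpf_htons(BGP_PORT),
        .doff   = sizeof(struct tcphdr) / 4,
        .psh    = 1,
        .ack    = 1,
        .window = bpf_htons(1024),
    };
    if (bpf_skb_store_bytes(skb, l4_off, &tcp, sizeof(tcp), 0))
        return TC_ACT_SHOT;

    /* Flip the L4 protocol. A real implementation must also update
     * ip->tot_len (+12 bytes) and recompute the IP and TCP checksums. */
    __u8 proto = IPPROTO_TCP;
    if (bpf_skb_store_bytes(skb, ETH_HLEN + offsetof(struct iphdr, protocol),
                            &proto, sizeof(proto), 0))
        return TC_ACT_SHOT;

    return TC_ACT_OK;
}

char LICENSE[] SEC("license") = "GPL";
```

The mirror "unfunnel" program on the Pod's interface would do the inverse:
match funneled `TCP/179` packets, shrink them by the same 12 bytes and restore
the original UDP header, so that `nfacctd` receives plain Netflow/IPFIX
datagrams. How funneled flowlogs are told apart from real BGP traffic (e.g. by
a marker in the crafted header) is deliberately left out of this sketch.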

#### Using an `initContainer`
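
How does the eBPF program end up inside the Pod? One option is a privileged
`initContainer` that attaches the "unfunnel" program to the Pod's interface
before `nfacctd` starts; the `tc` filter keeps the program loaded after the
init container exits. The manifest below is a sketch: the init image, the
object file path and the exact attach commands are hypothetical, not
datahangar's actual artifacts.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfacctd
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nfacctd
  template:
    metadata:
      labels:
        app: nfacctd
    spec:
      initContainers:
        - name: unfunnel
          image: example.org/ebpf-unfunnel:latest # hypothetical loader image
          securityContext:
            privileged: true # needed to attach eBPF programs with tc
          command: ["/bin/sh", "-c"]
          args:
            - >-
              tc qdisc add dev eth0 clsact &&
              tc filter add dev eth0 ingress bpf direct-action obj /unfunnel.o sec tc
      containers:
        - name: nfacctd
          image: pmacct/nfacctd:latest
          ports:
            - name: bgp
              containerPort: 179
              protocol: TCP
            - name: netflow
              containerPort: 2055
              protocol: UDP
```

This keeps the hack self-contained in the `nfacctd` Deployment, at the cost of
requiring a privileged init container.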

## Conclusion and limitations

The approach works: with funneling in place, BGP and flowlogs from a given
router land on the same Pod, with the source IP preserved. It does come with
limitations, though: the mangled packets grow by a few bytes, so MTU and
fragmentation have to be taken into account, and depending on where the
mangling happens, dedicated funnelers are needed in front of the NLB.

## Acknowledgments

Thank you to Martynas Pumputis, Chance Zibolski and Daniel Borkmann for their
support in the Cilium community.
4 changes: 4 additions & 0 deletions docs/use-cases/single_router_nfacctd.svg
