docs/use-cases: add nfacctd original use-case
Add original nfacctd use-case writeup.
Showing 6 changed files with 169 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,149 @@ | ||
# Multi-flow K8s affinity: BGP and IPFIX/Netflow Deploying `nfacctd` in K8s Network telemetry using `pmacct` | ||
|
||
This is the use-case that led to `sfunnel` [[1](https://cilium.slack.com/archives/C1MATJ5U5/p1723579808788789)]. | ||
|
||
## Context | ||
### pmacct and datahangar projects | ||
|
||
[pmacct](https://github.com/pmacct/pmacct) is probably _the_ most widely | ||
used Open Source project for passive monitoring of networks. `nfacctd`, the | ||
Network Flow ACCounting Daemon, collects flowlogs ([IPFIX](https://en.wikipedia.org/wiki/IP_Flow_Information_Export)/ | ||
[Netflow](https://en.wikipedia.org/wiki/NetFlow)/[Sflow](https://en.wikipedia.org/wiki/SFlow)), | ||
enriches them, normalizes values, etc., and later exports them (e.g. to a DB or | ||
a message bus). | ||
|
||
One of the main features of `nfacctd` is to enrich flowlogs with [BGP](https://en.wikipedia.org/wiki/Border_Gateway_Protocol) | ||
information, e.g. `AS_PATH`, `DST_AS`. | ||
|
||
To do so, `nfacctd` acts as both a flowlogs collector _and_ a passive BGP | ||
peer for one or more network routers: | ||
|
||
![A network router connecting to nfacctd](single_router_nfacctd.svg) | ||
|
||
[datahangar](https://github.com/datahangar/) was initially created as an | ||
end-to-end (E2E) testing framework for pmacct, focusing on its containerization | ||
and deployment in Kubernetes. | ||
|
||
While it still fulfills [this role](https://github.com/pmacct/pmacct/blob/master/.github/workflows/e2e_dh.yaml), | ||
datahangar is evolving towards establishing a reference architecture for a | ||
complete network data pipeline using readily available open-source components in | ||
Kubernetes. | ||
|
||
### Connectivity requirements | ||
|
||
BGP and flowlogs traffic must: | ||
|
||
* Preserve source IP address, which is used to deduce the router identity. | ||
* End up in the same Pod (replica). | ||
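
The two requirements above can be phrased as a check over observed traffic. The sketch below is illustrative only: the `Observation` record, its field names, the `check_affinity` helper and the example addresses are all hypothetical, and are not part of `nfacctd` or Kubernetes.

```python
from collections import defaultdict
from typing import NamedTuple


class Observation(NamedTuple):
    """One packet as seen inside a Pod (hypothetical record, for illustration)."""
    router_ip: str    # real IP address of the router
    seen_src_ip: str  # source IP as seen by nfacctd inside the Pod
    proto_port: str   # e.g. "TCP/179" (BGP) or "UDP/2055" (Netflow/IPFIX)
    pod: str          # Pod (replica) that received the packet


def check_affinity(observations):
    """Return the set of router IPs that violate either requirement."""
    violations = set()
    pods_per_router = defaultdict(set)
    for obs in observations:
        # Requirement 1: the source IP must be preserved (no SNAT on the way).
        if obs.seen_src_ip != obs.router_ip:
            violations.add(obs.router_ip)
        pods_per_router[obs.router_ip].add(obs.pod)
    # Requirement 2: all flows of a router must end up in the same Pod.
    for router_ip, pods in pods_per_router.items():
        if len(pods) > 1:
            violations.add(router_ip)
    return violations


# Both flows of the router keep their source IP and land in the same Pod.
good = [
    Observation("192.0.2.1", "192.0.2.1", "TCP/179", "nfacctd-0"),
    Observation("192.0.2.1", "192.0.2.1", "UDP/2055", "nfacctd-0"),
]
# Same router, but BGP and Netflow/IPFIX are delivered to different Pods.
bad = [
    Observation("192.0.2.1", "192.0.2.1", "TCP/179", "nfacctd-0"),
    Observation("192.0.2.1", "192.0.2.1", "UDP/2055", "nfacctd-1"),
]
```

`check_affinity(good)` returns an empty set, while `check_affinity(bad)` reports the router whose flows were split across Pods.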
|
||
![Proper multi-flow affinity working](lb_traffic_affinity_ok.svg) | ||
|
||
### Typical deployment scenarios | ||
|
||
Given the connectivity requirements, most `nfacctd` instances are deployed outside | ||
Kubernetes today. The goal has been to ensure that `nfacctd` can be effectively | ||
deployed and scaled within a Kubernetes environment. | ||
|
||
#### Public cloud | ||
|
||
BGP and flowlogs traffic are typically tunneled via a VPN or a Direct Connect | ||
to the VPC. `nfacctd` instances are either deployed on-prem or in the VPC, | ||
manually managed outside of K8s. | ||
|
||
![Typical deployment on public clouds](deployment1.svg) | ||
|
||
#### On-prem | ||
|
||
Similarly: | ||
|
||
![Typical deployment setup on-prem](deployment2.svg) | ||
|
||
## First attempt: `sessionAffinity: ClientIP` and `externalTrafficPolicy: Local` | ||
|
||
The initial attempt was to define a `LoadBalancer` service: | ||
|
||
``` | ||
kind: Service | ||
apiVersion: v1 | ||
metadata: | ||
  name: nfacctd | ||
spec: | ||
  selector: | ||
    app: nfacctd | ||
  ports: | ||
    - name: netflow | ||
      protocol: UDP | ||
      port: 2055 | ||
      targetPort: 2055 | ||
    - name: bgp | ||
      protocol: TCP | ||
      port: 179 | ||
      targetPort: 179 | ||
  type: LoadBalancer | ||
  sessionAffinity: ClientIP | ||
  externalTrafficPolicy: Local # Do not SNAT to the service! | ||
``` | ||
|
||
The mockup test quickly showed that IP preservation worked in all cases, | ||
but that affinity didn't work with multiple replicas or multiple worker nodes... | ||
:disappointed:. Flows were hitting different Pods, or even different worker | ||
nodes. | ||
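
The writeup does not pin down the exact cause. One plausible mechanism, stated here as an assumption rather than a finding from the test, is that some component on the path picks a backend per flow (for example by hashing the full 5-tuple) instead of per client IP. The toy load balancer below is purely illustrative: `pick_backend`, the backend names and all addresses are hypothetical.

```python
import hashlib


def pick_backend(key: tuple, backends: list) -> str:
    """Deterministically map a hash key to one backend (toy load balancer)."""
    digest = hashlib.sha256(repr(key).encode()).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


backends = ["nfacctd-0", "nfacctd-1", "nfacctd-2"]
router = "192.0.2.1"

# Two flows from the same router: (src IP, src port, dst IP, dst port, proto).
bgp = (router, 40000, "203.0.113.10", 179, "TCP")
netflow = (router, 50000, "203.0.113.10", 2055, "UDP")

# Hashing the full 5-tuple: nothing ties the two flows to the same backend,
# so they may or may not end up together.
per_flow = {pick_backend(flow, backends) for flow in (bgp, netflow)}

# Hashing only the source IP: both flows always map to the same backend.
per_client = {pick_backend((flow[0],), backends) for flow in (bgp, netflow)}
```

With source-IP-only hashing `per_client` always contains exactly one backend. With per-flow hashing `per_flow` may contain one or two, which matches the kind of split observed in the mockup.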
|
||
![BGP and Flowlogs traffic end up in different pods](lb_traffic_no_affinity.svg) | ||
|
||
Implementing a new feature across every Kubernetes instance, NLB, and CNI on the | ||
planet? Yeah, that’s not exactly a weekend project :sweat_smile:. It quickly | ||
became clear that we'd need to whip up a clever workaround to avoid spending | ||
the rest of our lives on this! | ||
|
||
## :bulb: What if... | ||
|
||
What if we could modify the traffic _before_ it hits the Network Load Balancer (NLB), | ||
and disguise it as BGP (`TCP/179`), so that `sessionAffinity: ClientIP` would | ||
do its job, and then "undo" this modification in the Pod, just before the traffic | ||
is delivered to `nfacctd`? Hmm, that _might_ work. | ||
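
The idea can be sketched on raw bytes. This is a minimal illustration, assuming the disguise is done by pushing a fake, option-less TCP header with destination port 179 in front of the original UDP header and stripping it again in the Pod. The function names, the chosen flags and ports, and the zeroed checksum are assumptions made for this example, not the actual eBPF implementation.

```python
import struct

BGP_PORT = 179
TCP_HDR_LEN = 20  # size of a TCP header without options, in bytes


def funnel(udp_segment: bytes, src_port: int = BGP_PORT,
           dst_port: int = BGP_PORT) -> bytes:
    """Prepend a fake TCP header so the packet looks like TCP/179 traffic."""
    offset_and_flags = (5 << 12) | 0x010  # data offset = 5 words, ACK flag set
    fake_tcp = struct.pack(
        "!HHIIHHHH",
        src_port, dst_port,
        0, 0,              # sequence and acknowledgment numbers (unused here)
        offset_and_flags,
        0xFFFF,            # window size
        0,                 # checksum left at zero: this is only a sketch
        0,                 # urgent pointer
    )
    return fake_tcp + udp_segment


def unfunnel(funneled: bytes) -> bytes:
    """Strip the fake TCP header and return the original UDP segment."""
    dst_port = struct.unpack_from("!H", funneled, 2)[0]
    if len(funneled) < TCP_HDR_LEN or dst_port != BGP_PORT:
        raise ValueError("not a funneled packet")
    return funneled[TCP_HDR_LEN:]


# Hypothetical Netflow/IPFIX datagram: UDP header (dst port 2055) + payload.
payload = b"flow-records"
udp_segment = struct.pack("!HHHH", 50000, 2055, 8 + len(payload), 0) + payload

funneled = funnel(udp_segment)
restored = unfunnel(funneled)
```

In this sketch every funneled packet grows by `TCP_HDR_LEN` bytes, so any path carrying it needs that much MTU headroom.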
|
||
Time to go back to the drawing board... | ||
|
||
## :honeybee: eBPF to the rescue! | ||
|
||
### Funneling traffic through a single protocol and port | ||
|
||
Let's call it [funneling](../funneling.md) to not confuse it with a real tunnel. | ||
|
||
The diagram would show: | ||
|
||
XXXX | ||
|
||
Netflow/IPFIX traffic (only) would have to be intercepted, mangled, and then sent | ||
to the NLB. This could be either done by modifying Netflow/IPFIX traffic _while_ | ||
routing it in intermediate nodes, e.g.: | ||
|
||
|
||
or by pointing routers to one or more "funnelers" that mangle the packet and | ||
DNAT it to the NLB. E.g.: | ||
|
||
XXX | ||
|
||
### Time to eBPF it :honeybee:! | ||
|
||
The original prototype was as easy as this: | ||
|
||
|
||
#### The code | ||
|
||
``` | ||
``` | ||
The initial | ||
|
||
#### Using an initContainer() | ||
|
||
## Conclusion and limitations | ||
|
||
XXX | ||
Works, MTU, need for funnelers. | ||
|
||
## Acknowledgments | ||
|
||
Thank you to Martynas Pumputis, Chance Zibolski and Daniel Borkmann for their | ||
support in the Cilium community. |