`README.md`:

# sfunnel: multi-flow K8s pod session affinity

`sfunnel` is an [eBPF](https://ebpf.io/) tool designed to [_funnel_](docs/funneling.md)
multiple traffic flows through a single [Kubernetes service](https://kubernetes.io/docs/concepts/services-networking/service/)
_port_, ensuring - under [certain conditions](#requirements) - consistent `ClientIP`
affinity across all _ports_ within the service.

See the original use-case [here](docs/use-cases/network-telemetry-nfacctd.md).

## At a glance

Example where `TCP/8080` and `TCP/443` traffic is funneled through `TCP/80`.


Remove the extra _ports_ from the K8s Service and, e.g., the Deployment. Add the `sfunnel`
init container along with its [rules](docs/rules.md) in `SFUNNEL_RULESET`:

```diff
--- a/service.yaml
+++ b/service.yaml
@@ -1,18 +1,12 @@
 apiVersion: v1
 kind: Service
 metadata:
   name: my-loadbalancer-service
 spec:
   type: LoadBalancer
   selector:
     app: my-nginx-app
   ports:
     - protocol: TCP
       port: 80
       targetPort: 80
-    - protocol: TCP
-      port: 8080
-      targetPort: 8080
-    - protocol: TCP
-      port: 443
-      targetPort: 443
   sessionAffinity: ClientIP
```

```diff
--- a/nginx.yaml
+++ b/nginx.yaml
@@ -1,21 +1,31 @@
 apiVersion: apps/v1
 kind: Deployment
 metadata:
   name: my-nginx-deployment
 spec:
   replicas: 4
   selector:
     matchLabels:
       app: my-nginx-app
   template:
     metadata:
       labels:
         app: my-nginx-app
     spec:
       containers:
+        - name: sfunnel-init
+          env:
+            - name: SFUNNEL_RULESET
+              value: ip tcp dport 80 sport 540 actions unfunnel tcp
+          image: ghcr.io/datahangar/sfunnel:0.0.3
+          securityContext:
+            privileged: true
+            capabilities:
+              add: [BPF, NET_ADMIN]
+          volumeMounts:
+            - name: bpffs
+              mountPath: /sys/fs/bpf
         - name: nginx
           image: nginx:latest
           ports:
             - containerPort: 80
-            - containerPort: 8080
-            - containerPort: 443

```

On the other end (e.g. a Linux host, a server, etc.), deploy `sfunnel` with the
matching [rules](docs/rules.md):

```shell
SFUNNEL_RULESET="ip daddr <your LB IP1> tcp port 443 actions funnel tcp dport 80 sport 540;\
ip daddr <your LB IP1> tcp port 8080 actions funnel tcp dport 80 sport 540"
docker run --network="host" --privileged -e SFUNNEL_RULESET="$SFUNNEL_RULESET" sfunnel
```

The `sfunnel` container will run and load the eBPF code.
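
To sanity-check that the program is in place (a sketch; the interface name is
illustrative, and the attach direction may differ depending on whether that host
funnels or unfunnels):

```shell
# The clsact qdisc created by sfunnel should show up on the interface...
tc qdisc show dev eth0

# ...and the BPF program should appear as a filter in direct-action mode.
tc filter show dev eth0 ingress
tc filter show dev eth0 egress
```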

##### More use-cases

This is a simple, yet not particularly useful, example. See [use-cases](docs/use-cases/)
for real-world examples.

## Requirements

* In Kubernetes:
* Permissions to spawn containers with `BPF` and `NET_ADMIN` capabilities.
* [eBPF](https://ebpf.io/)-enabled kernel, with support for `clsact` and `direct-action`.
  * Proper [MTU configuration](docs/funneling.md#mtu) (an extra 20 bytes for TCP funneling, 8 for UDP).
* On the funneling side:
* Permissions to spawn `sfunnel`.
  * The ability to route or proxy the traffic to be funneled. More on this [here](docs/funneling.md).

Make sure stateful firewalls and IDS/IDPS are properly configured to allow this
type of traffic.

## More...

* [Use-cases](docs/use-cases/)
* [Funneling?](docs/funneling.md)
* [Rule syntax](docs/rules.md)
* [The `sfunnel` container](docs/container.md)
* [Deploying it in K8s](docs/k8s.md)
* [Next steps](docs/next_steps.md)

Contact
-------

Marc Sune <marcdevel (at) gmail (dot) com>

---

`docs/container.md`:
# `sfunnel` container

The `sfunnel` container is meant to run as a K8s `initContainer` or as a standalone
container (e.g. `docker run --network="host"`).

Upon starting, it will:

1. Recompile the BPF program if a custom ruleset is provided. The ruleset is static
   at compile-time, so no maps are needed. Mind the [ruleset limits](rules.md#scalability).
1. For each interface in `$IFACES`:
   * create a `clsact` qdisc
   * attach the BPF program to it (roughly as sketched below)
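
Assuming the compiled object is called `sfunnel.o` (an illustrative name; the actual
object file, ELF section and attach direction may differ), this is roughly equivalent
to:

```shell
# Create a clsact qdisc on the interface.
tc qdisc add dev eth0 clsact

# Attach the BPF program in direct-action mode (here on ingress, as an example).
tc filter add dev eth0 ingress bpf direct-action obj sfunnel.o sec tc
```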

## Environment variables

Some ENV variables control the behaviour of the container:

* `$SFUNNEL_RULESET`: list of rules. This variable takes precedence over `/opt/sfunnel/src/ruleset`.
* `$IFACES`: interfaces on which to load the BPF program. Default: "" (all).
* `$N_ATTEMPTS`: number of attempts at loading the BPF program on an interface. Default: 6.
* `$RETRY_DELAY`: delay between attempts. Default: 3.

## Loading the ruleset via a file

The ruleset can be loaded via a ConfigMap or Docker volume by creating the file `ruleset`
in `/opt/sfunnel/src`. This file takes precedence over `/opt/sfunnel/src/ruleset.defaults`.
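
With Docker, for example, this could look like the following sketch (the local
`ruleset` file and its contents are illustrative, and one rule per line is assumed;
see [rules](rules.md) for the exact syntax):

```shell
# Write a ruleset file locally (hypothetical content).
cat > ruleset <<'EOF'
ip tcp dport 80 sport 540 actions unfunnel tcp
EOF

# Mount it over /opt/sfunnel/src/ruleset so it takes precedence over ruleset.defaults.
docker run --network="host" --privileged \
  -v "$(pwd)/ruleset:/opt/sfunnel/src/ruleset:ro" \
  ghcr.io/datahangar/sfunnel:0.0.3
```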

## Life-cycle and garbage collection

XXX

---

`docs/funneling.md`:
# _Funneling? Isn't it just tunneling?_

`sfunnel` pushes a new L4 header (TCP or UDP) between the IP header and the existing L4
header. It is a form of pseudo-tunneling, and suffers from the same
[MTU issues](#mtu) as any tunnel.

Tunnels usually have a dedicated L4 protocol+port, and _only_ tunneled traffic is
received on that port. This is not the case when _funneling_: funneled traffic
flows alongside the real traffic, hence the use of a different term to avoid
confusion.

For example, when funneling some UDP traffic on top of TCP port 80, _some_ of the
traffic flowing will still be regular web traffic and will be left untouched, while
the UDP traffic on top will be unfunneled (decapped/demultiplexed) and transparently
delivered as UDP traffic.

## The life of a packet

### Funneling

Using [`scapy`](https://scapy.net/) syntax, with a funneling rule like this:

```
udp dport 4739 actions funnel tcp dport 179 sport 540
```

A(n IPFIX) packet:

```python
Ether()/IP()/UDP(dport=4739)/IPFIX()/...
```

would be converted into:

```python
Ether()/IP()/TCP(dport=179, sport=540)/UDP(dport=4739)/IPFIX()/...
```

> :pencil: Note
>
> For the record, other TCP fields are currently hardcoded to:
> * `flags`: SYN
> * `seq`: `0xCAFEBABE`
> * `ack_seq`: `0xBABECAFE`
> * `window`: `1024`
> * `urg_ptr`: `0x0`
>
> The `funnel` action could be extended to set some of these values (the flags in particular).

### Unfunneling; reversing it!

On the other end, typically a K8s pod, a rule like this would exist:

```
tcp dport 179 sport 540 actions unfunnel udp
```

Therefore, the traffic received by the worker node:

```python
Ether()/IP()/TCP(dport=179, sport=540)/UDP(dport=4739)/IPFIX()/...
```

would be converted back to:

```python
Ether()/IP()/UDP(dport=4739)/IPFIX()/...
```

## MTU

Funneling suffers from the same problems as any encapsulation (tunneling). The
MTU must be big enough to accommodate the extra 20 bytes for TCP funneling
or 8 bytes for UDP funneling.

Make sure you adjust this. An [upcoming feature](next_steps.md) is to detect
funneled packets exceeding the MTU and raise alarms (`printk()`).
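
If the underlying network allows it, one option is to give the interfaces involved
some extra headroom. A minimal sketch (interface name and values are illustrative):

```shell
# Allow 20 extra bytes for the pushed TCP header (8 for UDP).
ip link set dev eth0 mtu 1520

# Verify.
ip link show dev eth0 | grep -o 'mtu [0-9]*'
```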

---

`docs/k8s.md`:
# Deploying `sfunnel` in K8s

Deploying `sfunnel` as an `initContainer` is straightforward (see [1]),
provided that you have the [right privileges](#capabilities).

For `sfunnel` to work, Services must - obviously - be defined with
`sessionAffinity: ClientIP` in the first place. `sfunnel` will attach the eBPF
program to the Pod's `$IFACES`.

> :pencil: **Note**
>
> Make sure to adjust the [MTU](funneling.md#mtu).

## Services

### `LoadBalancer`

Traffic must hit the LB already funneled. Therefore, it must have been either
generated on or routed through a node running `sfunnel` with funneling rules.

`LoadBalancer` services honouring `sessionAffinity: ClientIP` will send traffic
from the tuple {`srcIP`, `protocol`, `srcPort`, `dstPort`} to the same Worker
Node.

In turn, CNIs supporting `sessionAffinity: ClientIP` will send traffic for the
tuple {`srcIP`, `protocol`, `srcPort`, `dstPort`} to the same Pod (until rescheduled).
Traffic entering the Pod's network namespace will be unfunneled/demultiplexed before
being terminated by the kernel and delivered to sockets.

### `NodePort`

Similarly, traffic needs to hit the Worker Node already funneled. You could
theoretically apply funneling rules _before_ the CNI does its magic, but this is
tricky and NOT recommended.

It goes without saying that traffic needs to hit the _right_ `NodePort` for the same
{`srcIP`, `protocol`, `srcPort`, `dstPort`}, otherwise `sessionAffinity: ClientIP`
wouldn't work (even for a single port) in the first place.

The process is then the exact same as with the `LoadBalancer` service.

### `ClusterIP`

> :warning: **Warning**
>
> This hasn't been tested, so take it as a plausible conjecture.

This is an interesting one, and was not anticipated, as the
[original use-case](use-cases/network-telemetry-nfacctd.md) only used
`LoadBalancer` services.

You can funnel multiple `ClusterIP` services - with multiple ports - into a
single protocol+port, provided that they are backed by the same Pod. This effectively
makes all flows from a consumer Pod A go to the same backend Pod B
until there is a rescheduling.

An example:

Pod A (consumer) ruleset:
```
ip daddr <ClusterIP_1> tcp dport 443 actions funnel tcp dport 80 sport 540 # HTTPS
ip daddr <ClusterIP_2> tcp dport 8080 actions funnel tcp dport 80 sport 540 # Proxy HTTP
ip daddr <ClusterIP_3> udp dport 443 actions funnel tcp dport 80 sport 541 # QUIC
```

Pod B (backend) ruleset:
```
tcp dport 80 sport 540 actions unfunnel tcp
tcp dport 80 sport 541 actions unfunnel udp
```

## Supported CNIs

In principle, any CNI and LB honouring `sessionAffinity: ClientIP` should work
out of the box.

`sfunnel` has been tested with Cilium v1.15 and v1.16.

## Security considerations

### Capabilities: `CAP_BPF`, `CAP_NET_ADMIN`

`sfunnel` requires elevated privileges to run and load BPF TC programs.

### Digest

> :heavy_exclamation_mark: **Important**
>
> ALWAYS check `sfunnel`'s image `sha256` when running in production.

E.g.:
```
image: ghcr.io/datahangar/sfunnel:0.0.3@sha256:f4f72e64a93f7543e33000d01807fb66257cc88165b580763726aa4a01302655
```
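
One way to obtain the digest of a tag you have already pulled (a sketch; any OCI
tooling that prints the repo digest works equally well):

```shell
docker pull ghcr.io/datahangar/sfunnel:0.0.3
docker inspect --format='{{index .RepoDigests 0}}' ghcr.io/datahangar/sfunnel:0.0.3
```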

---

##### [1] Example

`lb-service.yaml`:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-loadbalancer-service
spec:
  type: LoadBalancer
  selector:
    app: my-nginx-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
  sessionAffinity: ClientIP
```
`nginx.yaml`:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx-deployment
spec:
  replicas: 4
  selector:
    matchLabels:
      app: my-nginx-app
  template:
    metadata:
      labels:
        app: my-nginx-app
    spec:
      containers:
        - name: sfunnel-init
          env:
            - name: SFUNNEL_RULESET
              value: ip tcp dport 80 sport 540 actions unfunnel tcp
          image: ghcr.io/datahangar/sfunnel:0.0.3
          securityContext:
            privileged: true
            capabilities:
              add: [BPF, NET_ADMIN]
          volumeMounts:
            - name: bpffs
              mountPath: /sys/fs/bpf
        - name: nginx
          image: nginx:latest
          ports:
            - containerPort: 80
```

---

`docs/next_steps.md`:
# Next steps

TODO list:

* IPv6 support
* Support for fwmark with mask
* Finalise support for dnat
* Detect packets exceeding MTU (when possible)
* [VPP](https://fd.io/docs/vpp/master) plugin?