Showing 7 changed files with 628 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,124 @@ | ||
# sfunnel: multi-flow K8s pod session affinity | ||
|
||
`sfunnel` is an [eBPF](https://ebpf.io/) tool designed to [_funnel_](docs/funneling.md)
multiple traffic flows through a single [Kubernetes service](https://kubernetes.io/docs/concepts/services-networking/service/)
_port_, ensuring - under [certain conditions](#requirements) - consistent `ClientIP`
affinity across all _ports_ within the service.
|
||
See the original use-case [here](docs/use-cases/network-telemetry-nfacctd.md). | ||
|
||
## At a glance | ||
|
||
Example where `TCP/8080` and `TCP/443` traffic is funneled through `TCP/80`. | ||
|
||
|
||
Remove the extra _ports_ from the K8s Service and, e.g., the Deployment. Add the `sfunnel`
container along with the [rules](docs/rules.md) in `SFUNNEL_RULESET`:
|
||
```diff
--- a/service.yaml
+++ b/service.yaml
@@ -1,19 +1,13 @@
 apiVersion: v1
 kind: Service
 metadata:
   name: my-loadbalancer-service
 spec:
   type: LoadBalancer
   selector:
     app: my-nginx-app
   ports:
     - protocol: TCP
       port: 80
       targetPort: 80
-    - protocol: TCP
-      port: 8080
-      targetPort: 8080
-    - protocol: TCP
-      port: 443
-      targetPort: 443
   sessionAffinity: ClientIP
```
|
||
```diff
--- a/nginx.yaml
+++ b/nginx.yaml
@@ -1,21 +1,32 @@
 apiVersion: apps/v1
 kind: Deployment
 metadata:
   name: my-nginx-deployment
 spec:
   replicas: 4
   selector:
     matchLabels:
       app: my-nginx-app
   template:
     metadata:
       labels:
         app: my-nginx-app
     spec:
+      initContainers:
+        - name: sfunnel-init
+          env:
+            - name: SFUNNEL_RULESET
+              value: ip tcp dport 80 sport 540 actions unfunnel tcp
+          image: ghcr.io/datahangar/sfunnel:0.0.3
+          securityContext:
+            privileged: true
+            capabilities:
+              add: [BPF, NET_ADMIN]
+          volumeMounts:
+            - name: bpffs
+              mountPath: /sys/fs/bpf
       containers:
         - name: nginx
           image: nginx:latest
           ports:
             - containerPort: 80
-            - containerPort: 8080
-            - containerPort: 443
```
|
||
On the other end (e.g. a Linux host or server), deploy it with the
matching [rules](docs/rules.md):
|
||
```shell | ||
SFUNNEL_RULESET="ip daddr <your LB IP1> tcp port 443 actions funnel tcp dport 80 sport 540;\ | ||
ip daddr <your LB IP1> tcp port 8080 actions funnel tcp dport 80 sport 540" | ||
docker run --network="host" --privileged -e SFUNNEL_RULESET="$SFUNNEL_RULESET" sfunnel | ||
``` | ||
|
||
The `sfunnel` container will run and load the eBPF code. | ||
|
||
##### More use-cases | ||
|
||
This is a simple, yet not very useful, example. See [use-cases](docs/use-cases/)
for real-world examples.
|
||
## Requirements | ||
|
||
* In Kubernetes:
  * Permissions to spawn containers with `BPF` and `NET_ADMIN` capabilities.
  * [eBPF](https://ebpf.io/)-enabled kernel, with support for `clsact` and `direct-action`.
  * Proper [MTU configuration](docs/funneling.md#mtu) (room for 20 extra bytes when funneling over TCP, 8 over UDP).
* On the funneling side:
  * Permissions to spawn `sfunnel`.
  * Route or proxy the traffic to be funneled. More on this [here](docs/funneling.md).
|
||
Make sure stateful firewalls and IDS/IDPS are properly configured to allow this | ||
type of traffic. | ||
|
||
## More... | ||
|
||
* [Use-cases](docs/use-cases/) | ||
* [Funneling?](docs/funneling.md) | ||
* [Rule syntax](docs/rules.md) | ||
* [`sfunnel` container](docs/container.md)
* [Deploying it in K8s](docs/k8s.md) | ||
* [Next steps](docs/next_steps.md) | ||
|
||
Contact | ||
------- | ||
|
||
Marc Sune < marcdevel (at) gmail (dot) com> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
# `sfunnel` container | ||
|
||
The `sfunnel` container is meant to run as an `initContainer` or as an ephemeral
container (e.g. with `docker run --network="host"`).
|
||
Upon starting, it will: | ||
|
||
1. Recompile the BPF program if a custom ruleset is provided. The ruleset is static
   at compile-time, so no maps are needed. Mind the [ruleset limits](rules.md#scalability).
1. For each interface in `$IFACES`:
   * create a `clsact` qdisc
   * attach the BPF program to it
|
||
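
The two per-interface steps above roughly correspond to the following `tc` invocations. This is a minimal sketch that only builds the command lines and does not run them; the object file name (`sfunnel.bpf.o`), the ELF section (`funnel`) and the `ingress` hook are assumptions made for illustration and may differ from what the container actually executes.

```python
def tc_commands(iface, obj="sfunnel.bpf.o", section="funnel", hook="ingress"):
    """Return the tc command lines for one interface (nothing is executed).

    `obj`, `section` and `hook` are hypothetical defaults for this sketch.
    """
    return [
        # 1. Create the clsact qdisc, which exposes the ingress/egress hooks.
        ["tc", "qdisc", "add", "dev", iface, "clsact"],
        # 2. Attach the BPF program to the chosen hook in direct-action mode.
        ["tc", "filter", "add", "dev", iface, hook,
         "bpf", "direct-action", "obj", obj, "sec", section],
    ]


for cmd in tc_commands("eth0"):
    print(" ".join(cmd))
```

Building the commands as lists keeps them ready for `subprocess.run(cmd, check=True)` without any shell quoting concerns.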
## Environment variables | ||
|
||
Some ENV variables control the behaviour of the container: | ||
|
||
* `$SFUNNEL_RULESET`: list of rules. This variable takes precedence over `/opt/sfunnel/src/ruleset`.
* `$IFACES`: interfaces on which to load the BPF program. Default: "" (all).
* `$N_ATTEMPTS`: number of attempts to load the BPF program on an interface. Default: 6.
* `$RETRY_DELAY`: delay between attempts. Default: 3.
|
||
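
How `$N_ATTEMPTS` and `$RETRY_DELAY` interact can be sketched as a simple retry loop. This is an illustration only, not the container's actual entrypoint: the helper name `load_with_retries`, the `load` callback and the interpretation of the delay as seconds are all assumptions.

```python
import os
import time


def load_with_retries(load, environ=os.environ, sleep=time.sleep):
    """Call `load()` until it succeeds or the attempts are exhausted.

    Returns the 1-based attempt number that succeeded, or None on failure.
    Defaults mirror the documented ones (6 attempts, delay of 3); treating
    the delay as seconds is an assumption of this sketch.
    """
    attempts = int(environ.get("N_ATTEMPTS", "6"))
    delay = float(environ.get("RETRY_DELAY", "3"))
    for attempt in range(1, attempts + 1):
        if load():
            return attempt
        if attempt < attempts:
            sleep(delay)  # wait before the next attempt
    return None


# Hypothetical loader that only succeeds on its second call.
state = {"calls": 0}


def fake_load():
    state["calls"] += 1
    return state["calls"] >= 2


print(load_with_retries(fake_load, environ={}, sleep=lambda s: None))
```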
## Loading Ruleset via file | ||
|
||
The ruleset can be loaded via configmap/docker volume by creating the file `ruleset` | ||
in `/opt/sfunnel/src`. This file has precedence over `/opt/sfunnel/src/ruleset.defaults`. | ||
|
||
## Life-cycle and garbage collection | ||
|
||
XXX |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,77 @@ | ||
# _Funneling? Isn't it just tunneling?_
|
||
`sfunnel` pushes a new L4 header (TCP or UDP) between the IP header and the existing L4
header. It is a form of pseudo-tunneling, and suffers from the same
[MTU issues](#mtu) as any tunnel.
|
||
Tunnels usually have a dedicated L4 proto+port, and _only_ tunneled traffic is
received on that port. This is not the case when _funneling_, as funneled
traffic flows alongside the real traffic; hence the different term, to avoid
confusion.
|
||
For example, when funneling some UDP traffic on top of TCP port 80, _some_ of the
traffic on that port will still be regular web traffic and will be left untouched,
while the UDP traffic on top will be unfunneled (decapped or demultiplexed) and
delivered transparently as UDP traffic.
|
||
## The life of a packet | ||
|
||
### Funneling | ||
|
||
Using [`scapy`](https://scapy.net/) syntax, with a funneling rule like this:
|
||
``` | ||
udp dport 4739 actions funnel tcp dport 179 sport 540 | ||
``` | ||
|
||
A(n IPFIX) packet: | ||
|
||
```python | ||
Ether()/IP()/UDP(dport=4739)/IPFIX()/... | ||
``` | ||
|
||
would be converted into:
|
||
```python | ||
Ether()/IP()/TCP(dport=179, sport=540)/UDP(dport=4739)/IPFIX()/... | ||
``` | ||
|
||
> :pencil: Note | ||
> | ||
> For the record, other TCP fields are currently hardcoded to: | ||
> * `flags`: SYN | ||
> * `seq`: `0xCAFEBABE` | ||
> * `ack_seq`: `0xBABECAFE` | ||
> * `window`: `1024` | ||
> * `urg_ptr`: `0x0` | ||
> | ||
> The `funnel` action could be extended to set some of these values (flags in particular).

### Unfunneling; reversing it!
|
||
On the other end, typically a K8s pod, a rule like this would exist: | ||
|
||
``` | ||
tcp dport 179 sport 540 actions unfunnel udp | ||
``` | ||
|
||
Therefore, the traffic received by the worker node: | ||
|
||
```python | ||
Ether()/IP()/TCP(dport=179, sport=540)/UDP(dport=4739)/IPFIX()/... | ||
``` | ||
|
||
would be converted back to:
|
||
```python | ||
Ether()/IP()/UDP(dport=4739)/IPFIX()/... | ||
``` | ||
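
The round trip above can be reproduced on raw bytes with nothing but the standard library. The following is a minimal sketch, not the eBPF implementation: it assumes an IPv4 header without options, leaves the TCP checksum at zero, and the helper names (`rewrite_ip`, `funnel`, `unfunnel`) are made up for illustration. The TCP field values are the hardcoded ones listed in the note above.

```python
import struct

IP_HLEN = 20    # IPv4 header without options (assumption of this sketch)
TCP_HLEN = 20   # TCP header without options
PROTO_TCP, PROTO_UDP = 6, 17
IP_FMT = "!BBHHHBBH4s4s"


def ip_checksum(header: bytes) -> int:
    """One's-complement sum over the ten 16-bit words of the IPv4 header."""
    total = sum(struct.unpack("!10H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def rewrite_ip(header: bytes, proto: int, delta: int) -> bytes:
    """Set the protocol, adjust the total length, recompute the checksum."""
    fields = list(struct.unpack(IP_FMT, header))
    fields[2] += delta   # total length
    fields[6] = proto    # protocol
    fields[7] = 0        # checksum must be zero while it is being computed
    fields[7] = ip_checksum(struct.pack(IP_FMT, *fields))
    return struct.pack(IP_FMT, *fields)


def funnel(pkt: bytes, sport: int, dport: int) -> bytes:
    """Push a TCP header between the IP header and the existing L4 header."""
    tcp = struct.pack("!HHIIBBHHH", sport, dport,
                      0xCAFEBABE, 0xBABECAFE,  # seq, ack_seq
                      5 << 4, 0x02,            # data offset, flags=SYN
                      1024, 0, 0)              # window, checksum (unset), urg_ptr
    return rewrite_ip(pkt[:IP_HLEN], PROTO_TCP, TCP_HLEN) + tcp + pkt[IP_HLEN:]


def unfunnel(pkt: bytes, inner_proto: int) -> bytes:
    """Pop the pushed TCP header and restore the inner protocol."""
    return rewrite_ip(pkt[:IP_HLEN], inner_proto, -TCP_HLEN) + pkt[IP_HLEN + TCP_HLEN:]


# Build a small UDP packet to port 4739 and run it through both directions.
payload = b"ipfix"
udp = struct.pack("!HHHH", 50000, 4739, 8 + len(payload), 0) + payload
ip = rewrite_ip(struct.pack(IP_FMT, 0x45, 0, IP_HLEN + len(udp), 0, 0, 64,
                            PROTO_UDP, 0, bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2])),
                PROTO_UDP, 0)
original = ip + udp
funneled = funnel(original, sport=540, dport=179)
restored = unfunnel(funneled, PROTO_UDP)
```

The only IP fields that change are the protocol, the total length and the header checksum, which is why unfunneling can restore the original packet byte for byte.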
|
||
## MTU | ||
|
||
Funneling suffers from the same problems as any encapsulation (tunneling). The
MTU should be sufficiently big to accommodate the extra 20 bytes for TCP funneling
or 8 bytes for UDP funneling.
|
||
Make sure you adjust this. An [upcoming feature](next_steps.md) will detect
funneled packets that exceed the MTU and raise alarms (`printk()`).
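
The arithmetic is simple enough to check by hand. The sketch below uses the overheads stated above (20 bytes for a TCP header without options, 8 bytes for UDP); the function names are illustrative and not part of `sfunnel`.

```python
# Bytes added by the pushed L4 header, per funneling protocol.
FUNNEL_OVERHEAD = {"tcp": 20, "udp": 8}


def max_unfunneled_size(mtu: int, funnel_proto: str) -> int:
    """Largest original IP packet that still fits in `mtu` once funneled."""
    return mtu - FUNNEL_OVERHEAD[funnel_proto]


def exceeds_mtu(packet_len: int, mtu: int, funnel_proto: str) -> bool:
    """True if an IP packet of `packet_len` bytes no longer fits after funneling."""
    return packet_len + FUNNEL_OVERHEAD[funnel_proto] > mtu


print(max_unfunneled_size(1500, "tcp"))  # 1480
print(max_unfunneled_size(1500, "udp"))  # 1492
print(exceeds_mtu(1490, 1500, "tcp"))    # True
```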
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,153 @@ | ||
# Deploying `sfunnel` in K8s | ||
|
||
Deploying `sfunnel` as an `initContainer` is straightforward (see [1]),
provided that you have the [right privileges](#capabilities).
|
||
For `sfunnel` to work, Services must - obviously - be defined with
`sessionAffinity: ClientIP` in the first place. `sfunnel` will attach the eBPF
program to the Pod's `$IFACES`.
|
||
> :pencil: **Note** | ||
> | ||
> Make sure to adjust the [MTU](funneling.md#mtu).

## Services
|
||
### `LoadBalancer` | ||
|
||
Traffic must hit the LB funneled. Therefore, traffic must have been either | ||
generated or routed through a node running `sfunnel` with funneling rules. | ||
|
||
`LoadBalancer` services honouring `sessionAffinity: ClientIP` will send traffic
from the tuple {`srcIP`, `protocol`, `srcPort`, `dstPort`} to the same Worker
Node.
|
||
In turn, CNIs supporting `sessionAffinity: ClientIP` will send traffic for the
tuple {`srcIP`, `protocol`, `srcPort`, `dstPort`} to the same Pod (until rescheduled).
Traffic entering the Pod Network Namespace will be unfunneled/demultiplexed before
being terminated by the Kernel and delivered to sockets.
|
||
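
Why funneling helps can be illustrated with a toy backend selector keyed on the same tuple. This is only a sketch: the SHA-256 based `pick_backend` is a stand-in assumed for illustration, and real load balancers and CNIs may implement affinity differently (e.g. with affinity tables). The point is that funneled flows share one tuple, so any deterministic selection maps them to the same Pod.

```python
import hashlib


def pick_backend(src_ip, protocol, src_port, dst_port, backends):
    """Deterministically map a {srcIP, protocol, srcPort, dstPort} tuple to a backend."""
    key = f"{src_ip}/{protocol}/{src_port}/{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


pods = ["pod-0", "pod-1", "pod-2", "pod-3"]
client = "192.0.2.10"

# Without funneling: the two flows have different tuples, so nothing
# guarantees that they land on the same Pod.
plain = {port: pick_backend(client, "tcp", 40000, port, pods) for port in (443, 8080)}

# With funneling: both flows carry the same outer TCP header
# (sport 540, dport 80), so they always select the same Pod.
funneled = {port: pick_backend(client, "tcp", 540, 80, pods) for port in (443, 8080)}

print(plain)
print(funneled)
```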
### `NodePort` | ||
|
||
Similarly, traffic needs to hit the Worker Node funneled. You could theoretically | ||
run funneling rules _before_ the CNI does its magic, but this is tricky and it's | ||
NOT recommended. | ||
|
||
It goes without saying that traffic needs to hit the _right_ `NodePort` for the same
{`srcIP`, `protocol`, `srcPort`, `dstPort`}, otherwise `sessionAffinity: ClientIP`
wouldn't work (even for a single port) in the first place.
|
||
The process is then the exact same as with the `LoadBalancer` service. | ||
|
||
### `ClusterIP` | ||
|
||
> :warning: **Warning** | ||
> | ||
> This hasn't been tested, so take it as a plausible conjecture.

This is an interesting one, and not anticipated, as the
[original use-case](use-cases/network-telemetry-nfacctd.md) only used
`LoadBalancer` services.
|
||
You can funnel multiple `ClusterIP` services - with multiple ports - into a
single protocol+port, provided that they are backed by the same Pod. This effectively
makes all flows from a consumer Pod A talk to the same backend Pod B
until there is a rescheduling.
|
||
An example: | ||
|
||
Pod A (consumer) ruleset:

```
ip daddr <ClusterIP_1> tcp dport 443 actions funnel tcp dport 80 sport 540 # HTTPS
ip daddr <ClusterIP_2> tcp dport 8080 actions funnel tcp dport 80 sport 540 # Proxy HTTP
ip daddr <ClusterIP_3> udp dport 443 actions funnel udp dport 80 sport 541 # QUIC
```

Pod B (backend) ruleset:

```
tcp dport 80 sport 540 actions unfunnel tcp
udp dport 80 sport 541 actions unfunnel udp
```
|
||
## Supported CNIs | ||
|
||
In principle, any CNI and LB honouring `sessionAffinity: ClientIP` should work | ||
out of the box. | ||
|
||
`sfunnel` has been tested with Cilium v1.15 and v1.16. | ||
|
||
## Security considerations | ||
|
||
### Capabilities: `CAP_BPF`, `CAP_NET_ADMIN` | ||
|
||
`sfunnel` requires elevated privileges to run and load BPF TC programs. | ||
|
||
### Digest | ||
|
||
> :heavy_exclamation_mark: **Important** | ||
> | ||
> ALWAYS check `sfunnel`'s image `sha256` when running in production.

E.g.:
``` | ||
image: ghcr.io/datahangar/sfunnel:0.0.3@sha256:f4f72e64a93f7543e33000d01807fb66257cc88165b580763726aa4a01302655 | ||
``` | ||
|
||
--- | ||
|
||
##### [1] Example | ||
|
||
`lb-service.yaml`: | ||
|
||
```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-loadbalancer-service
spec:
  type: LoadBalancer
  selector:
    app: my-nginx-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
  sessionAffinity: ClientIP
```
`nginx.yaml`: | ||
|
||
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx-deployment
spec:
  replicas: 4
  selector:
    matchLabels:
      app: my-nginx-app
  template:
    metadata:
      labels:
        app: my-nginx-app
    spec:
      # sfunnel runs once per Pod start, as an initContainer
      initContainers:
        - name: sfunnel-init
          env:
            - name: SFUNNEL_RULESET
              value: ip tcp dport 80 sport 540 actions unfunnel tcp
          image: ghcr.io/datahangar/sfunnel:0.0.3
          securityContext:
            privileged: true
            capabilities:
              add: [BPF, NET_ADMIN]
          volumeMounts:
            - name: bpffs
              mountPath: /sys/fs/bpf
      containers:
        - name: nginx
          image: nginx:latest
          ports:
            - containerPort: 80
```
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
# Next steps | ||
|
||
TODO list: | ||
|
||
* IPv6 support | ||
* Support for fwmark with mask | ||
* Finalise support for dnat | ||
* Detect packets exceeding MTU (when possible) | ||
* [VPP](https://fd.io/docs/vpp/master) plugin? |