Improve e2e troubleshooting #448

Merged
merged 3 commits into from
Dec 16, 2024
7 changes: 4 additions & 3 deletions Makefile
@@ -144,12 +144,12 @@ docker-generate: ## Create the container that generates the eBPF binaries
.PHONY: compile
compile: ## Compile ebpf agent project
@echo "### Compiling project"
-GOARCH=${GOARCH} GOOS=$(GOOS) go build -mod vendor -a -o bin/netobserv-ebpf-agent cmd/netobserv-ebpf-agent.go
+GOARCH=${GOARCH} GOOS=$(GOOS) go build -mod vendor -o bin/netobserv-ebpf-agent cmd/netobserv-ebpf-agent.go

.PHONY: test
test: ## Test code using go test
@echo "### Testing code"
-GOOS=$(GOOS) go test -mod vendor -a ./... -coverpkg=./... -coverprofile cover.all.out
+GOOS=$(GOOS) go test -mod vendor ./pkg/... ./cmd/... -coverpkg=./... -coverprofile cover.all.out

.PHONY: cov-exclude-generated
cov-exclude-generated:
@@ -171,7 +171,8 @@ tests-e2e: prereqs ## Run e2e tests
go clean -testcache
# making the local agent image available to kind in two ways, so it will work in different
# environments: (1) as image tagged in the local repository (2) as image archive.
-$(OCI_BIN) build . --build-arg TARGETARCH=$(GOARCH) -t localhost/ebpf-agent:test
+rm -f ebpf-agent.tar || true
+$(OCI_BIN) build . --build-arg LDFLAGS="" --build-arg TARGETARCH=$(GOARCH) -t localhost/ebpf-agent:test
$(OCI_BIN) save -o ebpf-agent.tar localhost/ebpf-agent:test
GOOS=$(GOOS) go test -p 1 -timeout 30m -v -mod vendor -tags e2e ./e2e/...

4 changes: 4 additions & 0 deletions README.md
@@ -132,6 +132,10 @@ make generate

Regularly tested on Fedora.

### Running end-to-end tests

Refer to the dedicated documentation: [e2e README](./e2e/README.md).

## Known issues

### External Traffic in OpenShift (OVN-Kubernetes CNI)
66 changes: 66 additions & 0 deletions e2e/README.md
@@ -0,0 +1,66 @@
## eBPF Agent e2e tests

e2e tests can be run with:

```bash
make tests-e2e
```

If you use podman, you may need to run it as root instead:

```bash
sudo make tests-e2e
```

### What it does

It builds an image from the current code, including the pre-generated BPF bytecode, starts a KIND cluster, and deploys the agent on it. It also deploys a typical NetObserv stack, which includes flowlogs-pipeline, Loki and/or Kafka.

It then runs a few smoke tests on that cluster, such as sending pings between pods and verifying that the expected flows are created.

The tests use the Kubernetes [e2e-framework](https://github.com/kubernetes-sigs/e2e-framework). They are based on manifest files that you can find in [this directory](./cluster/base/).

### How to troubleshoot

While the tests are running, you can run any `kubectl` command against the KIND cluster.

If you run podman as root and don't want to open a root session, you can copy the root kubeconfig:

```bash
sudo cp /root/.kube/config /tmp/agent-kind-kubeconfig
sudo -E chown $USER:$USER /tmp/agent-kind-kubeconfig
export KUBECONFIG=/tmp/agent-kind-kubeconfig
```

Then:

```bash
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
flp-29bmd 1/1 Running 0 6s
loki-7c98dfd6d4-c8q9m 1/1 Running 0 56s
```
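
To spot failing pods quickly, the listing above can be filtered by status. The following is a minimal sketch and not part of this repository: `not_running_pods` and `debug_pods` are hypothetical helper names, and it assumes the pods are in the current namespace, as in the output above.

```bash
# Hypothetical helpers, not part of this repository.

# Read `kubectl get pods --no-headers` output on stdin and print the names
# of pods whose STATUS column (third field) is not "Running".
not_running_pods() {
  awk '$3 != "Running" { print $1 }'
}

# Describe every pod that is not running and print its logs.
# Assumes the pods are in the current namespace.
debug_pods() {
  kubectl get pods --no-headers | not_running_pods | while read -r pod; do
    kubectl describe pod "$pod"
    kubectl logs "$pod" --all-containers || true
  done
}
```

Calling `debug_pods` while the tests are running prints the description and logs of each pod that is not in the `Running` state. Pods in other legitimate states, such as `Completed`, are reported as well.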

### Cleanup

The KIND cluster is normally deleted after the tests. This can fail, for instance after a forced exit or some kinds of failures.
When that happens, you should see a message telling you how to clean up the cluster manually:

```
^CSIGTERM received, cluster might still be running
To clean up, run: kind delete cluster --name basic-test-cluster20241212-125815
FAIL github.com/netobserv/netobserv-ebpf-agent/e2e/basic 172.852s
```

If you don't see that message, you can retrieve the cluster name manually and delete it:

```bash
$ kind get clusters
basic-test-cluster20241212-125815

$ kind delete cluster --name=basic-test-cluster20241212-125815
Deleting cluster "basic-test-cluster20241212-125815" ...
Deleted nodes: ["basic-test-cluster20241212-125815-control-plane"]
```

If the cluster is not cleaned up, a subsequent run of the e2e tests will fail because addresses (ports) are already in use.
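
When several runs have left clusters behind, the manual steps above can be scripted. This is a minimal sketch under assumptions: `filter_e2e_clusters` and `cleanup_e2e_clusters` are hypothetical helper names, and the default `basic-test-cluster` prefix is inferred from the cluster name shown above. Other test suites may use a different prefix.

```bash
# Hypothetical helpers, not part of this repository.

# Read cluster names on stdin (one per line) and keep those starting with
# the given prefix. The default prefix is an assumption based on the
# cluster name shown in this document.
filter_e2e_clusters() {
  grep -- "^${1:-basic-test-cluster}" || true
}

# Delete every KIND cluster whose name starts with the given prefix.
cleanup_e2e_clusters() {
  kind get clusters 2>/dev/null | filter_e2e_clusters "${1:-}" | while read -r name; do
    kind delete cluster --name "$name"
  done
}
```

Calling `cleanup_e2e_clusters` without arguments deletes the clusters that match the default prefix. Pass another prefix as the first argument to target different clusters, and check the output of `kind get clusters` first to avoid deleting a cluster you still need.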
11 changes: 5 additions & 6 deletions e2e/basic/common.go
@@ -1,5 +1,3 @@
-//go:build e2e
-
package basic

import (
@@ -37,7 +35,7 @@ func (bt *FlowCaptureTester) DoTest(t *testing.T, isIPFIX bool) {
return ctx
},
).Assess("correctness of client -> server (as Service) request flows",
-func(ctx context.Context, t *testing.T, cfg *envconf.Config) context.Context {
+func(ctx context.Context, t *testing.T, _ *envconf.Config) context.Context {
lq := bt.lokiQuery(t,
`{DstK8S_OwnerName="server",SrcK8S_OwnerName="client"}`+
`|="\"DstAddr\":\"`+pci.serverServiceIP+`\""`)
@@ -82,7 +80,7 @@ func (bt *FlowCaptureTester) DoTest(t *testing.T, isIPFIX bool) {
return ctx
},
).Assess("correctness of client -> server (as Pod) request flows",
-func(ctx context.Context, t *testing.T, cfg *envconf.Config) context.Context {
+func(ctx context.Context, t *testing.T, _ *envconf.Config) context.Context {
lq := bt.lokiQuery(t,
`{DstK8S_OwnerName="server",SrcK8S_OwnerName="client"}`+
`|="\"DstAddr\":\"`+pci.serverPodIP+`\""`)
@@ -124,7 +122,7 @@ func (bt *FlowCaptureTester) DoTest(t *testing.T, isIPFIX bool) {
return ctx
},
).Assess("correctness of server (from Service) -> client response flows",
-func(ctx context.Context, t *testing.T, cfg *envconf.Config) context.Context {
+func(ctx context.Context, t *testing.T, _ *envconf.Config) context.Context {
lq := bt.lokiQuery(t,
`{DstK8S_OwnerName="client",SrcK8S_OwnerName="server"}`+
`|="\"SrcAddr\":\"`+pci.serverServiceIP+`\""`)
@@ -167,7 +165,7 @@ func (bt *FlowCaptureTester) DoTest(t *testing.T, isIPFIX bool) {
return ctx
},
).Assess("correctness of server (from Pod) -> client response flows",
-func(ctx context.Context, t *testing.T, cfg *envconf.Config) context.Context {
+func(ctx context.Context, t *testing.T, _ *envconf.Config) context.Context {
lq := bt.lokiQuery(t,
`{DstK8S_OwnerName="client",SrcK8S_OwnerName="server"}`+
`|="\"SrcAddr\":\"`+pci.serverPodIP+`\""`)
@@ -282,6 +280,7 @@ func (bt *FlowCaptureTester) lokiQuery(t *testing.T, logQL string) tester.LokiQu
query, err = bt.Cluster.Loki().Query(1, logQL)
require.NoError(t, err)
require.NotNil(t, query)
+require.NotNil(t, query.Data)
require.NotEmpty(t, query.Data.Result)
}, test.Interval(time.Second))
result := query.Data.Result[0]
3 changes: 1 addition & 2 deletions e2e/basic/flow_test.go
@@ -1,5 +1,3 @@
-//go:build e2e
-
package basic

import (
@@ -152,6 +150,7 @@ func getPingFlows(t *testing.T, newerThan time.Time, expectedBytes int) (sent, r
}, test.Interval(time.Second))

test.Eventually(t, time.Minute, func(t require.TestingT) {
+// testCluster.Loki().DebugPrint(100, `{app="netobserv-flowcollector",DstK8S_OwnerName="pinger"}`)
query, err = testCluster.Loki().
Query(1, fmt.Sprintf(`{SrcK8S_OwnerName="server",DstK8S_OwnerName="pinger"}`+
`|~"\"Proto\":1[,}]"`+ // Proto 1 == ICMP
74 changes: 63 additions & 11 deletions e2e/cluster/base/02-loki.yml
@@ -20,6 +20,11 @@ data:
server:
http_listen_port: 3100
grpc_listen_port: 9096
grpc_server_max_recv_msg_size: 10485760
http_server_read_timeout: 1m
http_server_write_timeout: 1m
log_level: error
target: all
common:
path_prefix: /loki-store
storage:
@@ -31,9 +36,32 @@ data:
instance_addr: 127.0.0.1
kvstore:
store: inmemory
compactor:
compaction_interval: 5m
retention_enabled: true
retention_delete_delay: 2h
retention_delete_worker_count: 150
frontend:
compress_responses: true
ingester:
chunk_encoding: snappy
chunk_retain_period: 1m
query_range:
align_queries_with_step: true
cache_results: true
max_retries: 5
results_cache:
cache:
enable_fifocache: true
fifocache:
max_size_bytes: 500MB
validity: 24h
parallelise_shardable_queries: true
query_scheduler:
max_outstanding_requests_per_tenant: 2048
schema_config:
configs:
-- from: 2020-10-24
+- from: 2022-01-01
store: boltdb-shipper
object_store: filesystem
schema: v11
@@ -47,15 +75,39 @@ data:
active_index_directory: /loki-store/index
shared_store: filesystem
cache_location: /loki-store/boltdb-cache
datasource.yaml: |
apiVersion: 1
datasources:
- name: Loki
type: loki
access: proxy
url: http://localhost:3100
isDefault: true
version: 1
cache_ttl: 24h
limits_config:
ingestion_rate_strategy: global
ingestion_rate_mb: 10
ingestion_burst_size_mb: 10
max_label_name_length: 1024
max_label_value_length: 2048
max_label_names_per_series: 30
reject_old_samples: true
reject_old_samples_max_age: 15m
creation_grace_period: 10m
enforce_metric_name: false
max_line_size: 256000
max_line_size_truncate: false
max_entries_limit_per_query: 10000
max_streams_per_user: 0
max_global_streams_per_user: 0
unordered_writes: true
max_chunks_per_query: 2000000
max_query_length: 721h
max_query_parallelism: 32
max_query_series: 10000
cardinality_limit: 100000
max_streams_matchers_per_query: 1000
max_concurrent_tail_requests: 10
retention_period: 24h
max_cache_freshness_per_query: 5m
max_queriers_per_tenant: 0
per_stream_rate_limit: 3MB
per_stream_rate_limit_burst: 15MB
max_query_lookback: 0
min_sharding_lookback: 0s
split_queries_by_interval: 1m
---
apiVersion: apps/v1
kind: Deployment
@@ -83,7 +135,7 @@ spec:
name: loki-config
containers:
- name: loki
image: grafana/loki:2.4.1
image: grafana/loki:2.9.0
volumeMounts:
- mountPath: "/loki-store"
name: loki-store