Falco exporter gRPC error invalid UTF-8 #619

Closed
nc-pnan opened this issue Feb 8, 2024 · 7 comments
Labels
kind/bug Something isn't working

Comments

@nc-pnan

nc-pnan commented Feb 8, 2024

Describe the bug

We are deploying Falco with Falcosidekick and falco-exporter using these Helm charts as a DaemonSet, creating Falco instances on 3 nodes.
On 2 of the nodes everything runs without issues, but on the 3rd node the falco-exporter pod keeps failing with CrashLoopBackOff. Inspecting the log of the falco-exporter container, this is the output containing the error message:

2024/02/08 09:27:08 connecting to gRPC server at unix:///run/falco/falco.sock (timeout 2m0s)
2024/02/08 09:27:08 listening on http://0.0.0.0:9376/metrics
2024/02/08 09:27:08 connected to gRPC server, subscribing events stream
2024/02/08 09:27:08 ready
2024/02/08 09:27:09 gRPC: rpc error: code = Internal desc = grpc: failed to unmarshal the received message string field contains invalid UTF-8

We get the following error from the Falco pod itself:
[libprotobuf ERROR google/protobuf/wire_format_lite.cc:577] String field 'falco.outputs.response.OutputFieldsEntry.value' contains invalid UTF-8 data when serializing a protocol buffer. Use the 'bytes' type if you intend to send raw bytes.

We have tried redeploying the charts several times, and it is always the instance on one specific node that fails. We have not been able to figure out the issue on our end, since all nodes should be configured identically.
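
For reference, a minimal Go sketch of what the error means (hypothetical, not code from falco-exporter or Falco/libs): protobuf string fields must contain valid UTF-8, so an output field value carrying raw, non-UTF-8 bytes has to be sanitized (or sent using the bytes type, as the protobuf error suggests) before it can be serialized.

// Hypothetical illustration of the failure mode, not code from falco-exporter or libs.
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

func main() {
	// 0xff never appears in valid UTF-8, so this mimics an output field value
	// (e.g. a command line or file name) containing raw, non-UTF-8 bytes.
	raw := []byte{'l', 's', ' ', 0xff, 0xfe}

	// A protobuf string field with this content cannot be (de)serialized,
	// which is what both Falco and the exporter complain about above.
	fmt.Println("valid UTF-8?", utf8.Valid(raw)) // false

	// One possible sanitization: replace invalid sequences with U+FFFD before
	// the value is placed into a string field.
	clean := strings.ToValidUTF8(string(raw), "\uFFFD")
	fmt.Println("sanitized:", clean, "valid now?", utf8.ValidString(clean))
}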

How to reproduce it
Deploy the Falco, Falcosidekick, and falco-exporter charts with the umbrella Chart.yaml and values.yaml configuration below to an AKS cluster running 3 nodes:

Chart.yaml:

annotations:
  category: Analytics
apiVersion: v2
appVersion: 0.37.0
name: falco
description: Falco is a Cloud Native Runtime Security tool designed to detect anomalous activity
dependencies:
  - name: falco
    version: 4.1.0
    repository: "https://falcosecurity.github.io/charts/"
  - name: falcosidekick
    version: 0.7.11
    condition: falcosidekick.enabled
    repository: "https://falcosecurity.github.io/charts/"
  - name: falco-exporter
    version: 0.9.9
    repository: "https://falcosecurity.github.io/charts/"
keywords:
  - monitoring
  - security
  - alerting
sources:
  - https://github.com/falcosecurity/falco
  - https://github.com/falcosecurity/charts
  - https://github.com/falcosecurity/charts/tree/master/falco
  - https://github.com/falcosecurity/charts/tree/master/falcosidekick
  - https://github.com/falcosecurity/charts/tree/master/falco-exporter
version: 0.2.0

Values.yaml:

falcosidekick:
  enabled: true
  config:
    alertmanager:
      hostport: "http://alertmanager-operated.monitoring.svc.cluster.local:9093"
      endpoint: "/api/v1/alerts"
      minimumpriority: "error"
      expireafter: ""
      mutualtls: false
      checkcert: false
      extralabels: "alertname:Falco"

falco:
  driver:
    kind: modern-bpf
    modernEbpf:
      leastPrivileged: true
  podSecurityContext:
    securityContext:
      privileged: true
  podPriorityClassName: priority-class-daemonsets
  resources:
    requests:
      cpu: 100m
      memory: 254Mi
    limits:
      memory: 1024Mi
  falco:
    json_output: true
    http_output:
      enabled: true
      url: "http://falco-falcosidekick:2801/"
    grpc:
      enabled: true
    grpc_output:
      enabled: true

falco-exporter:
  podPriorityClassName: priority-class-daemonsets
  prometheusRules:
    enabled: true
  serviceMonitor:
    enabled: true

Expected behaviour

We expect the falco exporter to be running on all three nodes.

Screenshots
None.

Environment

  • Falco version:
Falco version: 0.37.0 (x86_64)
Falco initialized with configuration file: /etc/falco/falco.yaml
System info: Linux version 5.15.138.1-4.cm2 (root@CBL-Mariner) (gcc (GCC) 11.2.0, GNU ld (GNU Binutils) 2.37) #1 SMP Thu Nov 30 21:48:10 UTC 2023
{"default_driver_version":"7.0.0+driver","driver_api_version":"8.0.0","driver_schema_version":"2.0.0","engine_version":"31","engine_version_semver":"0.31.0","falco_version":"0.37.0","libs_version":"0.14.2","plugin_api_version":"3.2.0"}

  • System info:
Falco version: 0.37.0 (x86_64)
Falco initialized with configuration file: /etc/falco/falco.yaml
System info: Linux version 5.15.138.1-4.cm2 (root@CBL-Mariner) (gcc (GCC) 11.2.0, GNU ld (GNU Binutils) 2.37) #1 SMP Thu Nov 30 21:48:10 UTC 2023
Loading rules from file /etc/falco/falco_rules.yaml
{
  "machine": "x86_64",
  "nodename": "falco-dnncw",
  "release": "5.15.138.1-4.cm2",
  "sysname": "Linux",
  "version": "#1 SMP Thu Nov 30 21:48:10 UTC 2023"
}
  • Cloud provider or hardware configuration:
    Azure, AKS
    Kubernetes version 1.28.3

  • OS:

PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
  • Kernel:
    Linux falco-dnncw 5.15.138.1-4.cm2 #1 SMP Thu Nov 30 21:48:10 UTC 2023 x86_64 GNU/Linux
  • Installation method:
    Kubernetes Helm Install Chart

Additional context
Some additional observations:
If we spin up another (4th) node, the same issue appears on that node as well.
The 2 nodes where the exporter is working happen to be the nodes hosting instances of Prometheus (either the Thanos Prometheus pod or the Prometheus pod), since we are running the Thanos Prometheus operator chart.
All pods are running in the "monitoring" namespace.

nc-pnan added the kind/bug label on Feb 8, 2024
@alacuku
Member

alacuku commented Feb 8, 2024

@nc-pnan, does Falco log which rule is triggered when it fails? Any info on how to reproduce this would be helpful.

@nc-pnan
Author

nc-pnan commented Feb 8, 2024

@alacuku I unfortunately don't have any other information on how to reproduce this, since the cluster it was deployed to is fairly extensive. If there are any specifics you are interested in, please let me know.

The only triggered rule I can currently find is this one:
{"hostname":"falco-wm9ks","output":"13:09:22.412554500: Notice Unexpected connection to K8s API Server from container (connection=10.244.1.126:45008->10.16.0.1:443 lport=443 rport=45008 fd_type=ipv4 fd_proto=fd.l4proto evt_type=connect user= user_uid=4294967295 user_loginuid=-1 process=<NA> proc_exepath= parent=<NA> command=<NA> terminal=0 container_id= container_image=<NA> container_image_tag=<NA> container_name=<NA> k8s_ns=<NA> k8s_pod_name=<NA>)","priority":"Notice","rule":"Contact K8S API Server From Container","source":"syscall","tags":["T1565","container","k8s","maturity_stable","mitre_discovery","network"],"time":"2024-02-08T13:09:22.412554500Z", "output_fields": {"container.id":"","container.image.repository":null,"container.image.tag":null,"container.name":null,"evt.time":1707397762412554500,"evt.type":"connect","fd.lport":443,"fd.name":"10.244.1.126:45008->10.16.0.1:443","fd.rport":45008,"fd.type":"ipv4","k8s.ns.name":null,"k8s.pod.name":null,"proc.cmdline":"<NA>","proc.exepath":"","proc.name":"<NA>","proc.pname":null,"proc.tty":0,"user.loginuid":-1,"user.name":"","user.uid":4294967295}}

However, we did also get triggers on FalcoExporterAbsent, but for some reason it is currently not firing, even though the exporter is in the CrashLoopBackOff state.

name: FalcoExporterAbsent
expr: absent(up{job="falco-falco-exporter"})
for: 10m
labels:
  prometheus: monitoring/prometheus-default-prometheus
  prometheus_replica: prometheus-prometheus-default-prometheus-0
  severity: critical
annotations:
  description: No metrics are being scraped from falco. No events will trigger any alerts.
  summary: Falco Exporter has disappeared from Prometheus service discovery.

@poiana
Contributor

poiana commented May 8, 2024

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

@poiana
Contributor

poiana commented Jun 7, 2024

Stale issues rot after 30d of inactivity.

Mark the issue as fresh with /remove-lifecycle rotten.

Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle rotten

@Andreagit97
Member

/remove-lifecycle rotten
Hey! This should be fixed in the latest Falco release, 0.38.0!
This should be the fix: falcosecurity/libs#1800

@leogr
Member

leogr commented Aug 28, 2024

Hey! This should be fixed in the latest Falco release, 0.38.0!
This should be the fix: falcosecurity/libs#1800

This has been fixed by 0.38 AFAIK. So,
/close

@poiana
Contributor

poiana commented Aug 28, 2024

@leogr: Closing this issue.

In response to this:

Hey! This should be fixed in the latest Falco release, 0.38.0!
This should be the fix: falcosecurity/libs#1800

This has been fixed by 0.38 AFAIK. So,
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

poiana closed this as completed on Aug 28, 2024