Falco exporter gRPC error invalid UTF-8 #619

Closed
nc-pnan opened this issue Feb 8, 2024 · 7 comments
Labels
kind/bug Something isn't working

Comments

@nc-pnan

nc-pnan commented Feb 8, 2024

Describe the bug

We are deploying Falco with Falcosidekick and falco-exporter using these Helm charts as a DaemonSet, creating Falco instances on 3 nodes.
On 2 of the nodes everything runs without issues, but on the 3rd node the falco-exporter pod keeps failing with CrashLoopBackOff. Inspecting the log of the falco-exporter container, this is the output containing the error message:

2024/02/08 09:27:08 connecting to gRPC server at unix:///run/falco/falco.sock (timeout 2m0s)
2024/02/08 09:27:08 listening on http://0.0.0.0:9376/metrics
2024/02/08 09:27:08 connected to gRPC server, subscribing events stream
2024/02/08 09:27:08 ready
2024/02/08 09:27:09 gRPC: rpc error: code = Internal desc = grpc: failed to unmarshal the received message string field contains invalid UTF-8

We get the following error from the Falco pod itself:
[libprotobuf ERROR google/protobuf/wire_format_lite.cc:577] String field 'falco.outputs.response.OutputFieldsEntry.value' contains invalid UTF-8 data when serializing a protocol buffer. Use the 'bytes' type if you intend to send raw bytes.

We have tried redeploying the charts several times, and it is always the instance on one specific node that fails. We have not been able to figure out the issue on our end, since all nodes should be configured identically.
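
For reference, a minimal Go sketch of what the error means (hypothetical, not code from falco-exporter or Falco/libs): protobuf string fields must contain valid UTF-8, so an output field value carrying raw, non-UTF-8 bytes has to be sanitized (or sent using the bytes type, as the protobuf error suggests) before it can be serialized.

// Hypothetical illustration of the failure mode, not code from falco-exporter or libs.
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

func main() {
	// 0xff never appears in valid UTF-8, so this mimics an output field value
	// (e.g. a command line or file name) containing raw, non-UTF-8 bytes.
	raw := []byte{'l', 's', ' ', 0xff, 0xfe}

	// A protobuf string field with this content cannot be (de)serialized,
	// which is what both Falco and the exporter complain about above.
	fmt.Println("valid UTF-8?", utf8.Valid(raw)) // false

	// One possible sanitization: replace invalid sequences with U+FFFD before
	// the value is placed into a string field.
	clean := strings.ToValidUTF8(string(raw), "\uFFFD")
	fmt.Println("sanitized:", clean, "valid now?", utf8.ValidString(clean))
}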

How to reproduce it
Deploy the Falco, Falcosidekick, and falco-exporter charts with the umbrella Chart.yaml and values.yaml configuration below to an AKS cluster running 3 nodes:

Chart.yaml:

annotations:
  category: Analytics
apiVersion: v2
appVersion: 0.37.0
name: falco
description: Falco is a Cloud Native Runtime Security tool designed to detect anomalous activity
dependencies:
  - name: falco
    version: 4.1.0
    repository: "https://falcosecurity.github.io/charts/"
  - name: falcosidekick
    version: 0.7.11
    condition: falcosidekick.enabled
    repository: "https://falcosecurity.github.io/charts/"
  - name: falco-exporter
    version: 0.9.9
    repository: "https://falcosecurity.github.io/charts/"
keywords:
  - monitoring
  - security
  - alerting
sources:
  - https://github.com/falcosecurity/falco
  - https://github.com/falcosecurity/charts
  - https://github.com/falcosecurity/charts/tree/master/falco
  - https://github.com/falcosecurity/charts/tree/master/falcosidekick
  - https://github.com/falcosecurity/charts/tree/master/falco-exporter
version: 0.2.0

Values.yaml:

falcosidekick:
  enabled: true
  config:
    alertmanager:
      hostport: "http://alertmanager-operated.monitoring.svc.cluster.local:9093"
      endpoint: "/api/v1/alerts"
      minimumpriority: "error"
      expireafter: ""
      mutualtls: false
      checkcert: false
      extralabels: "alertname:Falco"

falco:
  driver:
    kind: modern-bpf
    modernEbpf:
      leastPrivileged: true
  podSecurityContext:
    securityContext:
      privileged: true
  podPriorityClassName: priority-class-daemonsets
  resources:
    requests:
      cpu: 100m
      memory: 254Mi
    limits:
      memory: 1024Mi
  falco:
    json_output: true
    http_output:
      enabled: true
      url: "http://falco-falcosidekick:2801/"
    grpc:
      enabled: true
    grpc_output:
      enabled: true

falco-exporter:
  podPriorityClassName: priority-class-daemonsets
  prometheusRules:
    enabled: true
  serviceMonitor:
    enabled: true

Expected behaviour

We expect the falco exporter to be running on all three nodes.

Screenshots
None.

Environment

  • Falco version:
Falco version: 0.37.0 (x86_64)
Falco initialized with configuration file: /etc/falco/falco.yaml
System info: Linux version 5.15.138.1-4.cm2 (root@CBL-Mariner) (gcc (GCC) 11.2.0, GNU ld (GNU Binutils) 2.37) #1 SMP Thu Nov 30 21:48:10 UTC 2023
{"default_driver_version":"7.0.0+driver","driver_api_version":"8.0.0","driver_schema_version":"2.0.0","engine_version":"31","engine_version_semver":"0.31.0","falco_version":"0.37.0","libs_version":"0.14.2","plugin_api_version":"3.2.0"}

  • System info:
Falco version: 0.37.0 (x86_64)
Falco initialized with configuration file: /etc/falco/falco.yaml
System info: Linux version 5.15.138.1-4.cm2 (root@CBL-Mariner) (gcc (GCC) 11.2.0, GNU ld (GNU Binutils) 2.37) #1 SMP Thu Nov 30 21:48:10 UTC 2023
Loading rules from file /etc/falco/falco_rules.yaml
{
  "machine": "x86_64",
  "nodename": "falco-dnncw",
  "release": "5.15.138.1-4.cm2",
  "sysname": "Linux",
  "version": "#1 SMP Thu Nov 30 21:48:10 UTC 2023"
}
  • Cloud provider or hardware configuration:
    Azure, AKS
    Kubernetes version 1.28.3

  • OS:

PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
  • Kernel:
    Linux falco-dnncw 5.15.138.1-4.cm2 #1 SMP Thu Nov 30 21:48:10 UTC 2023 x86_64 GNU/Linux
  • Installation method:
    Kubernetes Helm Install Chart

Additional context
Some additional observations:
If we spin up another (4th) node, the same issue appears on that node as well.
The 2 nodes where the exporter is working happen to be the nodes hosting instances of Prometheus (either the Thanos Prometheus pod or the Prometheus pod), since we are running the Thanos Prometheus operator chart.
All pods are running in the "monitoring" namespace.

nc-pnan added the kind/bug label on Feb 8, 2024
@alacuku
Member

alacuku commented Feb 8, 2024

@nc-pnan, does Falco log which rule is triggered when it fails? Any info on how to reproduce this would be helpful.

@nc-pnan
Author

nc-pnan commented Feb 8, 2024

@alacuku I unfortunately don't have any other information on how to reproduce this, since the cluster it was deployed to is fairly extensive. If there are any specifics you are interested in, please let me know.

The only triggered rule I can currently find is this one:
{"hostname":"falco-wm9ks","output":"13:09:22.412554500: Notice Unexpected connection to K8s API Server from container (connection=10.244.1.126:45008->10.16.0.1:443 lport=443 rport=45008 fd_type=ipv4 fd_proto=fd.l4proto evt_type=connect user= user_uid=4294967295 user_loginuid=-1 process=<NA> proc_exepath= parent=<NA> command=<NA> terminal=0 container_id= container_image=<NA> container_image_tag=<NA> container_name=<NA> k8s_ns=<NA> k8s_pod_name=<NA>)","priority":"Notice","rule":"Contact K8S API Server From Container","source":"syscall","tags":["T1565","container","k8s","maturity_stable","mitre_discovery","network"],"time":"2024-02-08T13:09:22.412554500Z", "output_fields": {"container.id":"","container.image.repository":null,"container.image.tag":null,"container.name":null,"evt.time":1707397762412554500,"evt.type":"connect","fd.lport":443,"fd.name":"10.244.1.126:45008->10.16.0.1:443","fd.rport":45008,"fd.type":"ipv4","k8s.ns.name":null,"k8s.pod.name":null,"proc.cmdline":"<NA>","proc.exepath":"","proc.name":"<NA>","proc.pname":null,"proc.tty":0,"user.loginuid":-1,"user.name":"","user.uid":4294967295}}

However, we did also get triggers on FalcoExporterAbsent, but for some reason it is currently not firing, even though the exporter is in the CrashLoopBackOff state.

name: FalcoExporterAbsent
expr: absent(up{job="falco-falco-exporter"})
for: 10m
labels:
  prometheus: monitoring/prometheus-default-prometheus
  prometheus_replica: prometheus-prometheus-default-prometheus-0
  severity: critical
annotations:
  description: No metrics are being scraped from falco. No events will trigger any alerts.
  summary: Falco Exporter has disappeared from Prometheus service discovery.

@poiana
Contributor

poiana commented May 8, 2024

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

@poiana
Contributor

poiana commented Jun 7, 2024

Stale issues rot after 30d of inactivity.

Mark the issue as fresh with /remove-lifecycle rotten.

Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle rotten

@Andreagit97
Member

/remove-lifecycle rotten
Hey! This should be fixed in the latest Falco release, 0.38.0!
This should be the fix: falcosecurity/libs#1800

@leogr
Member

leogr commented Aug 28, 2024

Hey! This should be fixed in the latest Falco release, 0.38.0!
This should be the fix: falcosecurity/libs#1800

This has been fixed by 0.38 AFAIK. So,
/close

@poiana
Contributor

poiana commented Aug 28, 2024

@leogr: Closing this issue.

In response to this:

Hey! This should be fixed in the latest Falco release, 0.38.0!
This should be the fix: falcosecurity/libs#1800

This has been fixed by 0.38 AFAIK. So,
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

poiana closed this as completed on Aug 28, 2024