Pyroscope Java profiling not working after following documentation (missing Linux capabilities) #1616

Open
caspar-ds opened this issue Sep 4, 2024 · 4 comments · May be fixed by #1788
Labels
bug, type/docs


caspar-ds commented Sep 4, 2024

What's wrong?

After following the documentation here, profiling of Java processes results in the following errors for all processes:

{"ts":"2024-09-04T20:14:23.991554124Z","level":"error","msg":"failed to get process info","component_path":"/","component_id":"discovery.process.all","err":"readlink /proc/1/exe: permission denied","pid":1}
{"ts":"2024-09-04T20:14:23.991644175Z","level":"error","msg":"failed to get process info","component_path":"/","component_id":"discovery.process.all","err":"readlink /proc/2/exe: permission denied","pid":2}
{"ts":"2024-09-04T20:14:23.991677615Z","level":"error","msg":"failed to get process info","component_path":"/","component_id":"discovery.process.all","err":"readlink /proc/3/exe: permission denied","pid":3}
{"ts":"2024-09-04T20:14:23.991708775Z","level":"error","msg":"failed to get process info","component_path":"/","component_id":"discovery.process.all","err":"readlink /proc/4/exe: permission denied","pid":4}
{"ts":"2024-09-04T20:14:23.991749485Z","level":"error","msg":"failed to get process info","component_path":"/","component_id":"discovery.process.all","err":"readlink /proc/6/exe: permission denied","pid":6}
{"ts":"2024-09-04T20:14:23.991779286Z","level":"error","msg":"failed to get process info","component_path":"/","component_id":"discovery.process.all","err":"readlink /proc/8/exe: permission denied","pid":8}
{"ts":"2024-09-04T20:14:23.991807106Z","level":"error","msg":"failed to get process info","component_path":"/","component_id":"discovery.process.all","err":"readlink /proc/9/exe: permission denied","pid":9}
...
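To see at a glance which component is failing and for which PIDs, JSON log output like the excerpt above can be grouped by component and message. This is a minimal Python sketch and not part of Alloy; the names `summarize_errors` and `SAMPLE_LOG` are hypothetical, and the sample lines are copied from this report.

```python
import json
from collections import Counter

# Sample lines copied from the logs in this report (two errors, one info line).
SAMPLE_LOG = """\
{"ts":"2024-09-04T20:14:23.991554124Z","level":"error","msg":"failed to get process info","component_path":"/","component_id":"discovery.process.all","err":"readlink /proc/1/exe: permission denied","pid":1}
{"ts":"2024-09-04T20:14:23.991644175Z","level":"error","msg":"failed to get process info","component_path":"/","component_id":"discovery.process.all","err":"readlink /proc/2/exe: permission denied","pid":2}
{"ts":"2024-09-04T20:14:23.931033431Z","level":"info","msg":"scheduling loaded components and services"}
"""


def summarize_errors(log_text):
    """Count error-level entries per (component_id, msg) and collect their PIDs."""
    counts = Counter()
    pids = {}
    for line in log_text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip blank lines and non-JSON lines such as "..."
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore truncated or malformed lines
        if entry.get("level") != "error":
            continue
        key = (entry.get("component_id", "?"), entry.get("msg", "?"))
        counts[key] += 1
        if "pid" in entry:
            pids.setdefault(key, []).append(entry["pid"])
    return counts, pids
```

Calling `summarize_errors(SAMPLE_LOG)` returns the counts and the PID lists keyed by `(component_id, msg)`, so a flood of identical `permission denied` lines collapses into one entry per failing component.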

Helm values:

alloy:
  configMap:
    create: false
    name: alloy-config
    key: config.alloy
  stabilityLevel: "generally-available"
  enableReporting: false
  securityContext:
    runAsUser: 0

controller:
  type: daemonset
  hostPID: true

Alloy config:

logging {
	level  = "info"
	format = "json"
}

discovery.kubernetes "local_pods" {
  selectors {
    field = "spec.nodeName=" + env("HOSTNAME")
    role = "pod"
  }
  role = "pod"
}


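// Assumption: the logs in this report reference a discovery.process.all component
// that the pasted config omits. This block and the targets reference below follow
// the documented pyroscope.java example, which is where __meta_process_exe comes from.
discovery.process "all" {
  join = discovery.kubernetes.local_pods.targets
}

discovery.relabel "java_pods" {
  targets = discovery.process.all.targets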
  // Filter only java processes
  rule {
    source_labels = ["__meta_process_exe"]
    action = "keep"
    regex = ".*/java$"
  }
  rule {
    action = "drop"
    regex = "Succeeded|Failed|Completed"
    source_labels = ["__meta_kubernetes_pod_phase"]
  }
  rule {
    action = "replace"
    source_labels = ["__meta_kubernetes_namespace"]
    target_label = "namespace"
  }
  rule {
    action = "replace"
    source_labels = ["__meta_kubernetes_pod_name"]
    target_label = "pod"
  }
  rule {
    action = "replace"
    source_labels = ["__meta_kubernetes_pod_node_name"]
    target_label = "node"
  }
  rule {
    action = "replace"
    source_labels = ["__meta_kubernetes_pod_container_name"]
    target_label = "container"
  }
  // Provide arbitrary service_name label, otherwise it will be inferred from discovery labels automatically
  rule {
    action = "replace"
    regex = "(.*)@(.*)"
    replacement = "java/${1}/${2}"
    separator = "@"
    source_labels = ["__meta_kubernetes_namespace", "__meta_kubernetes_pod_container_name"]
    target_label = "service_name"
  }
}

pyroscope.java "java" {
  forward_to = [pyroscope.write.pyroscope_write.receiver]
  targets = discovery.relabel.java_pods.output
}

pyroscope.write "pyroscope_write" {
	endpoint {
		url = "http://pyroscope.pyroscope.svc.cluster.local:4040"
	}
}

Steps to reproduce

Install Alloy in a Kubernetes cluster using the Helm values and Alloy configuration above.

System information

Linux version 5.10.223-212.873.amzn2.x86_64

Software version

Grafana Alloy v1.3.1

Logs

{"ts":"2024-09-04T20:14:23.783384097Z","level":"info","boringcrypto enabled":false}
{"ts":"2024-09-04T20:14:23.783438837Z","level":"info","msg":"starting complete graph evaluation","controller_path":"/","controller_id":"","trace_id":"c1709a86c9849a4bc42d9586b31a673f"}
{"ts":"2024-09-04T20:14:23.783469298Z","level":"info","msg":"finished node evaluation","controller_path":"/","controller_id":"","trace_id":"c1709a86c9849a4bc42d9586b31a673f","node_id":"logging","duration":119411}
{"ts":"2024-09-04T20:14:23.783507508Z","level":"info","msg":"finished node evaluation","controller_path":"/","controller_id":"","trace_id":"c1709a86c9849a4bc42d9586b31a673f","node_id":"labelstore","duration":9270}
{"ts":"2024-09-04T20:14:23.783551438Z","level":"info","msg":"finished node evaluation","controller_path":"/","controller_id":"","trace_id":"c1709a86c9849a4bc42d9586b31a673f","node_id":"tracing","duration":8990}
{"ts":"2024-09-04T20:14:23.783574438Z","level":"info","msg":"finished node evaluation","controller_path":"/","controller_id":"","trace_id":"c1709a86c9849a4bc42d9586b31a673f","node_id":"otel","duration":1810}
{"ts":"2024-09-04T20:14:23.78361252Z","level":"info","msg":"applying non-TLS config to HTTP server","service":"http"}
{"ts":"2024-09-04T20:14:23.78362498Z","level":"info","msg":"finished node evaluation","controller_path":"/","controller_id":"","trace_id":"c1709a86c9849a4bc42d9586b31a673f","node_id":"http","duration":32892}
{"ts":"2024-09-04T20:14:23.783952992Z","level":"info","msg":"finished node evaluation","controller_path":"/","controller_id":"","trace_id":"c1709a86c9849a4bc42d9586b31a673f","node_id":"cluster","duration":2580}
{"ts":"2024-09-04T20:14:23.784420467Z","level":"info","msg":"Using pod service account via in-cluster config","component_path":"/","component_id":"discovery.kubernetes.local_pods"}
{"ts":"2024-09-04T20:14:23.784886451Z","level":"info","msg":"finished node evaluation","controller_path":"/","controller_id":"","trace_id":"c1709a86c9849a4bc42d9586b31a673f","node_id":"discovery.kubernetes.local_pods","duration":877908}
{"ts":"2024-09-04T20:14:23.785026342Z","level":"info","msg":"finished node evaluation","controller_path":"/","controller_id":"","trace_id":"c1709a86c9849a4bc42d9586b31a673f","node_id":"discovery.process.all","duration":90800}
{"ts":"2024-09-04T20:14:23.785502667Z","level":"info","msg":"finished node evaluation","controller_path":"/","controller_id":"","trace_id":"c1709a86c9849a4bc42d9586b31a673f","node_id":"discovery.relabel.java_pods","duration":403204}
{"ts":"2024-09-04T20:14:23.786070292Z","level":"info","msg":"finished node evaluation","controller_path":"/","controller_id":"","trace_id":"c1709a86c9849a4bc42d9586b31a673f","node_id":"pyroscope.write.pyroscope_write","duration":518875}
{"ts":"2024-09-04T20:14:23.929998101Z","level":"info","msg":"finished node evaluation","controller_path":"/","controller_id":"","trace_id":"c1709a86c9849a4bc42d9586b31a673f","node_id":"pyroscope.java.java","duration":143866429}
{"ts":"2024-09-04T20:14:23.930232423Z","level":"info","msg":"finished node evaluation","controller_path":"/","controller_id":"","trace_id":"c1709a86c9849a4bc42d9586b31a673f","node_id":"remotecfg","duration":132201}
{"ts":"2024-09-04T20:14:23.930302133Z","level":"info","msg":"finished node evaluation","controller_path":"/","controller_id":"","trace_id":"c1709a86c9849a4bc42d9586b31a673f","node_id":"livedebugging","duration":25870}
{"ts":"2024-09-04T20:14:23.930370644Z","level":"info","msg":"finished node evaluation","controller_path":"/","controller_id":"","trace_id":"c1709a86c9849a4bc42d9586b31a673f","node_id":"ui","duration":14340}
{"ts":"2024-09-04T20:14:23.930394904Z","level":"info","msg":"finished complete graph evaluation","controller_path":"/","controller_id":"","trace_id":"c1709a86c9849a4bc42d9586b31a673f","duration":147237459}
{"ts":"2024-09-04T20:14:23.931033431Z","level":"info","msg":"scheduling loaded components and services"}
{"ts":"2024-09-04T20:14:23.931588126Z","level":"info","msg":"starting cluster node","service":"cluster","peers_count":0,"peers":"","advertise_addr":"127.0.0.1:12345"}
{"ts":"2024-09-04T20:14:23.932640515Z","level":"info","msg":"peers changed","service":"cluster","peers_count":1,"peers":"grafana-alloy-4pt7h"}
{"ts":"2024-09-04T20:14:23.933006819Z","level":"info","msg":"now listening for http traffic","service":"http","addr":"0.0.0.0:12345"}
{"ts":"2024-09-04T20:14:23.991554124Z","level":"error","msg":"failed to get process info","component_path":"/","component_id":"discovery.process.all","err":"readlink /proc/1/exe: permission denied","pid":1}
{"ts":"2024-09-04T20:14:23.991644175Z","level":"error","msg":"failed to get process info","component_path":"/","component_id":"discovery.process.all","err":"readlink /proc/2/exe: permission denied","pid":2}
{"ts":"2024-09-04T20:14:23.991677615Z","level":"error","msg":"failed to get process info","component_path":"/","component_id":"discovery.process.all","err":"readlink /proc/3/exe: permission denied","pid":3}
{"ts":"2024-09-04T20:14:23.991708775Z","level":"error","msg":"failed to get process info","component_path":"/","component_id":"discovery.process.all","err":"readlink /proc/4/exe: permission denied","pid":4}
{"ts":"2024-09-04T20:14:23.991749485Z","level":"error","msg":"failed to get process info","component_path":"/","component_id":"discovery.process.all","err":"readlink /proc/6/exe: permission denied","pid":6}
{"ts":"2024-09-04T20:14:23.991779286Z","level":"error","msg":"failed to get process info","component_path":"/","component_id":"discovery.process.all","err":"readlink /proc/8/exe: permission denied","pid":8}
{"ts":"2024-09-04T20:14:23.991807106Z","level":"error","msg":"failed to get process info","component_path":"/","component_id":"discovery.process.all","err":"readlink /proc/9/exe: permission denied","pid":9}
caspar-ds added the bug label Sep 4, 2024
@caspar-ds
Author

Adding the following allows the container to read what it needs:

alloy:
  # ...
  securityContext:
    runAsUser: 0
    runAsNonRoot: false
    capabilities:
      add:
        - all

Is it documented anywhere which capabilities are required for Alloy to function?

Thanks

@korniltsev
Contributor

@caspar-ds
Author

caspar-ds commented Sep 9, 2024

> Yes, it is documented https://grafana.com/docs/alloy/latest/reference/components/pyroscope/pyroscope.java/#pyroscopejava

Hi @korniltsev! The only thing I can see in that documentation is a note about requiring root and running inside the host PID namespace. That is not necessarily sufficient when the container's Linux capabilities are restricted, which they usually are in any well-configured production environment.

It took a little trial and error, but we found that the following was sufficient for our use case (using Grafana Alloy only to scrape data for Pyroscope):

alloy:
  # ...
  securityContext:
    runAsUser: 0
    runAsNonRoot: false
    capabilities:
      add:
        - PERFMON
        - SYS_PTRACE
        - SYS_RESOURCE
        - SYS_ADMIN

Hopefully this issue will help anyone else running into the same problem.
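One way to confirm that a process actually holds the four capabilities listed above is to decode the `CapEff` bitmask from its `/proc/<pid>/status`. This is a rough Python sketch under stated assumptions: the helper names (`parse_cap_eff`, `missing_caps`, `check_current_process`) are hypothetical, the bit numbers are taken from the kernel's `linux/capability.h`, and the required set is only what worked in our setup, not an official list from the Alloy docs.

```python
# Capability bit numbers as defined in the Linux kernel's linux/capability.h.
# The set below mirrors the four capabilities from this comment; it is not
# an official requirement list.
REQUIRED_CAPS = {
    "CAP_SYS_PTRACE": 19,
    "CAP_SYS_ADMIN": 21,
    "CAP_SYS_RESOURCE": 24,
    "CAP_PERFMON": 38,
}


def parse_cap_eff(status_text):
    """Return the effective capability bitmask from /proc/<pid>/status content."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split(":", 1)[1].strip(), 16)
    raise ValueError("no CapEff line found")


def missing_caps(mask, required=REQUIRED_CAPS):
    """Return the sorted names of required capabilities not set in mask."""
    return sorted(name for name, bit in required.items() if not mask & (1 << bit))


def check_current_process(status_path="/proc/self/status"):
    """Read the status file (Linux only) and report missing capabilities."""
    with open(status_path) as f:
        return missing_caps(parse_cap_eff(f.read()))
```

`check_current_process()` inspects the calling process; to inspect Alloy itself, pass the content of its own `/proc/<pid>/status` to `parse_cap_eff`. An empty list means all four capabilities are effective.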

caspar-ds changed the title from "Pyroscope Java profiling not working after following documentation" to "Pyroscope Java profiling not working after following documentation (missing Linux capabilities)" on Sep 9, 2024
@korniltsev
Contributor

korniltsev commented Sep 13, 2024

We usually run it as "privileged" root.
I agree we need to update the docs.
