
prometheus is missing container metrics from certain nodes #970

Open
noahpb opened this issue Oct 30, 2024 · 3 comments
noahpb commented Oct 30, 2024

Environment

Device and OS: darwin arm64
App version: v0.29.1-unicorn
Kubernetes distro being used: k3d with two nodes

Steps to reproduce

  1. Create a k3d cluster with additional nodes
$ kubectl get node
NAME               STATUS   ROLES                  AGE   VERSION
k3d-agent1-0       Ready    <none>                 23m   v1.30.4+k3s1
k3d-uds-server-0   Ready    control-plane,master   25m   v1.30.4+k3s1
  2. Deploy uds-core with monitoring

Expected result

Container metrics such as CPU and memory utilization should be queryable for pods on all nodes.

Actual Result

Prometheus only returns metrics for pods scheduled on the control-plane node; pods on the agent node are missing.

Visual Proof (screenshots, videos, text, etc)

Metrics returned for container_cpu_usage_seconds:

[screenshot]

No metrics returned when filtering out the control-plane node:

[screenshot]

Severity/Priority

Moderate

Additional Context

Removing all NetworkPolicies in the monitoring namespace allows Prometheus to pick up metrics from the missing nodes.

@noahpb noahpb added the possible-bug Something may not be working label Oct 30, 2024
noahpb commented Nov 1, 2024

Thanks to @rjferguson21's suggestion, we've been able to confirm that the allow-prometheus-stack-egress-metrics-scraping NetworkPolicy generated by the operator needs to be adjusted. The remoteNamespace: "" specification is not permissive enough to allow egress traffic to the prometheus-node-exporter daemonset pods. Manually adjusting the egress specification of the NetworkPolicy to the CIDR range of the nodes worked in my local testing.
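For illustration, the manual adjustment was an egress rule along these lines. This is only a sketch: the CIDR shown is a typical k3d Docker-network range, and the node-exporter port is assumed to be the default 9100 — substitute the actual node CIDR and port for your cluster.

```yaml
# Sketch of a widened egress rule for the generated
# allow-prometheus-stack-egress-metrics-scraping policy.
# The cidr value below is a placeholder for the cluster's node range.
egress:
  - to:
      - ipBlock:
          cidr: 172.18.0.0/16   # placeholder: k3d node network
    ports:
      - port: 9100              # assumed prometheus-node-exporter port
        protocol: TCP
```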

mjnagel commented Nov 5, 2024

To resolve this, I would suggest building an AllNodes generated target. We should be able to build that list of IPs using a Pepr watch on the nodes, similar to our KubeAPI target. This would also be helpful for metrics-server, which currently has an Anywhere rule with a TODO comment to switch it to an all-nodes target.

Code links for current kubeapi logic:

Once this is added as a generated target, we can add it to Prometheus and verify that the traffic works as expected.
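The core of an AllNodes target would be collecting each node's InternalIP into a CIDR list that the operator can render into NetworkPolicy `ipBlock` entries. A minimal sketch of that piece is below; in uds-core this would be fed by a Pepr Watch on Nodes, but the function name and the trimmed-down `Node` shape here are illustrative, not the project's actual code:

```typescript
// Trimmed-down shape of the Kubernetes Node fields we need
// (a small subset of the real Node type; illustrative only).
interface NodeAddress {
  type: string;    // e.g. "InternalIP", "Hostname"
  address: string;
}

interface Node {
  metadata: { name: string };
  status?: { addresses?: NodeAddress[] };
}

// Collect the InternalIP of every node, deduplicated and sorted,
// as /32 CIDRs suitable for a NetworkPolicy ipBlock list.
function nodeInternalCIDRs(nodes: Node[]): string[] {
  const ips = new Set<string>();
  for (const node of nodes) {
    for (const addr of node.status?.addresses ?? []) {
      if (addr.type === "InternalIP") {
        ips.add(`${addr.address}/32`);
      }
    }
  }
  return [...ips].sort();
}
```

A watch handler would call this on every node add/update/delete and regenerate the affected policies, the same way the KubeAPI target is kept current.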

@mjnagel mjnagel added bug Something isn't working and removed possible-bug Something may not be working labels Nov 6, 2024