Self-hosted Prometheus cannot fetch aggregated metrics #7

Open
Quintonwong opened this issue Jun 26, 2022 · 9 comments

Comments

@Quintonwong

1. The crane-scheduler-controller logs show that none of the aggregated metrics can be fetched:
W0626 20:55:02.198329 1 node.go:61] failed to sync this node ["k8s-node4/mem_usage_avg_5m"]: can not annotate node[k8s-node4]: failed to get data mem_usage_avg_5m{k8s-node4=}:
2.
fe3d166c668c1cc8739fbaf5d2ce873

@autumn0207
Collaborator

autumn0207 commented Jun 26, 2022

@Quintonwong

First, check if aggregated metrics data can be pulled inside the container:

curl -g 'http://{REPLACE_ME_WITH_PROMETHEUS_ADDRESS}/api/v1/query?query=cpu_usage_avg_5m'
curl -g 'http://{REPLACE_ME_WITH_PROMETHEUS_ADDRESS}/api/v1/query?query=mem_usage_avg_5m'

Then, check non-aggregated metrics data:

curl -g 'http://{REPLACE_ME_WITH_PROMETHEUS_ADDRESS}/api/v1/query?query=up'

If the non-aggregated metrics data is OK but the aggregated metrics data cannot be pulled, the Prometheus rules are not taking effect. Please refer to https://prometheus.io/docs/prometheus/latest/configuration/configuration
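
The same check can be scripted. The following is a minimal Python sketch, assuming only the standard response shape of the Prometheus `/api/v1/query` endpoint (`status` plus `data.result`). The helper names `build_query_url`, `has_samples`, `diagnose`, and `fetch` are hypothetical and not part of crane or Prometheus.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(prometheus_address, expr):
    """Build an instant-query URL. The `query` parameter is mandatory."""
    return f"{prometheus_address}/api/v1/query?{urlencode({'query': expr})}"


def has_samples(response):
    """True if a decoded query response succeeded and returned at least one sample."""
    return response.get("status") == "success" and bool(
        response.get("data", {}).get("result")
    )


def diagnose(aggregated, plain):
    """Classify the outcome of the two checks described above.

    `aggregated` is the decoded response for e.g. cpu_usage_avg_5m,
    `plain` is the decoded response for a non-aggregated metric such as `up`.
    """
    if not has_samples(plain):
        return "non-aggregated metrics missing: check the Prometheus address and scraping"
    if not has_samples(aggregated):
        return "aggregated metrics missing: the Prometheus rules are probably not in effect"
    return "ok"


def fetch(prometheus_address, expr):
    """Run one instant query and return the decoded JSON body."""
    with urlopen(build_query_url(prometheus_address, expr), timeout=10) as resp:
        return json.load(resp)
```

To use it, call `diagnose(fetch(addr, "cpu_usage_avg_5m"), fetch(addr, "up"))`, where `addr` is your own Prometheus address including the `http://` prefix.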

@ArvinChen1991

@Quintonwong

First, check if aggregated metrics data can be pulled inside the container:

curl -g 'http://{REPLACE_ME_WITH_PROMETHEUS_ADDRESS}/api/v1/query?query=cpu_usage_avg_5m'
curl -g 'http://{REPLACE_ME_WITH_PROMETHEUS_ADDRESS}/api/v1/query?query=mem_usage_avg_5m'

Then, check non-aggregated metrics data:

curl -g 'http://{REPLACE_ME_WITH_PROMETHEUS_ADDRESS}/api/v1/query'

If the non-aggregated metrics data is OK but the aggregated metrics data cannot be pulled, the Prometheus rules are not taking effect. Please refer to https://prometheus.io/docs/prometheus/latest/configuration/configuration

The last command returns an error:
curl -g 'http://x.x.x.x:9090/api/v1/query'
{"status":"error","errorType":"bad_data","error":"invalid parameter 'query': parse error at char 1: no expression found in input"}

@autumn0207
Collaborator

curl -g 'http://x.x.x.x:9090/api/v1/query'

I made a mistake; the command should be:

curl -g 'http://{REPLACE_ME_WITH_PROMETHEUS_ADDRESS}/api/v1/query?query=up'

@ArvinChen1991

curl -g 'http://x.x.x.x:9090/api/v1/query'

I made a mistake; the command should be:

curl -g 'http://{REPLACE_ME_WITH_PROMETHEUS_ADDRESS}/api/v1/query?query=up'

This command returns success:
image

@xieydd
Member

xieydd commented Dec 9, 2022

I think you can increase the interval (in seconds) of cpu_usage_active.

@sdnmw

sdnmw commented Mar 19, 2023

I have the same problem. Kubernetes version: 1.23.10, crane version: v0.5.1, crane-scheduler-controller: v0.1.23.

I have checked both the aggregated and the non-aggregated metrics data, and both can be obtained. I also changed the interval of cpu_usage_active to 5s, but the controller still cannot obtain the data and annotate the Node.

W0319 15:26:24.293385 1 node.go:61] failed to sync this node ["kse2/cpu_usage_avg_5m"]: can not annotate node[kse2]: failed to get data cpu_usage_avg_5m{kse2=}:
I0319 15:26:24.295764 1 node.go:75] Finished syncing node event "kse3/cpu_usage_avg_5m" (2.357063ms)
W0319 15:26:24.295781 1 node.go:61] failed to sync this node ["kse3/cpu_usage_avg_5m"]: can not annotate node[kse3]: failed to get data cpu_usage_avg_5m{kse3=}:
I0319 15:26:24.298258 1 node.go:75] Finished syncing node event "kse4/cpu_usage_avg_5m" (2.454873ms)
W0319 15:26:24.298279 1 node.go:61] failed to sync this node ["kse4/cpu_usage_avg_5m"]: can not annotate node[kse4]: failed to get data cpu_usage_avg_5m{kse4=}:

image

image

Could you help me, @xieydd? Thanks very much.

@nailianglu

@Quintonwong

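First, check if aggregated metrics data can be pulled inside the container:

curl -g 'http://{REPLACE_ME_WITH_PROMETHEUS_ADDRESS}/api/v1/query?query=cpu_usage_avg_5m'
curl -g 'http://{REPLACE_ME_WITH_PROMETHEUS_ADDRESS}/api/v1/query?query=mem_usage_avg_5m'

Then, check non-aggregated metrics data:

curl -g 'http://{REPLACE_ME_WITH_PROMETHEUS_ADDRESS}/api/v1/query?query=up'

If the non-aggregated metrics data is OK but the aggregated metrics data cannot be pulled, the Prometheus rules are not taking effect. Please refer to https://prometheus.io/docs/prometheus/latest/configuration/configuration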

Hi, I have run into the same problem. From inside the crane-scheduler-controller container I can fetch the aggregated data, but the crane-scheduler-controller container log keeps reporting errors:
I0330 13:18:01.658598 1 node.go:75] Finished syncing node event "cn-hangzhou.i-bp19r762s7xryoo6fjmx/mem_usage_avg_5m" (35.978µs)
W0330 13:18:01.658604 1 node.go:61] failed to sync this node ["cn-hangzhou.i-bp19r762s7xryoo6fjmx/mem_usage_avg_5m"]: can not annotate node[cn-hangzhou.i-bp19r762s7xryoo6fjmx]: failed to get data mem_usage_avg_5m{cn-hangzhou.i-bp19r762s7xryoo6fjmx=}: Post "10.7.1.60/api/v1/query": unsupported protocol scheme ""
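
Note that the log above shows the request going to `10.7.1.60/api/v1/query` with an empty scheme (`unsupported protocol scheme ""`), which is what Go's HTTP client reports when a URL has no `http://` or `https://` prefix. That suggests the Prometheus address given to the controller is missing the scheme, although how the address is configured in this deployment is not shown in the thread. A minimal Python sketch of the check, with the hypothetical helper `with_scheme`:

```python
def with_scheme(address, default_scheme="http"):
    """Return the address unchanged if it already starts with http:// or https://,
    otherwise prepend the default scheme.

    Hypothetical helper for illustration; it is not part of crane.
    """
    if address.startswith(("http://", "https://")):
        return address
    return f"{default_scheme}://{address}"


# The address as it appears in the log line above has no scheme.
logged = "10.7.1.60"
print(with_scheme(logged))
```

If the scheme turns out to be missing, adding `http://` to the configured Prometheus address would be the first thing to try.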

@wyaopeng

Try upgrading Prometheus and node-exporter to the latest version.

@niyang110

niyang110 commented Apr 25, 2024

@sdnmw The reason no value is returned is that crane converts the node name to the node IP and queries Prometheus using the node IP as the value of the instance label.
image
This usually happens when node_exporter is deployed inside K8S. In the Prometheus scrape config for node-exporter, you can add relabeling rules that reset the labels:
- source_labels: [__meta_kubernetes_node_address_InternalIP]
  target_label: instance
  action: replace
- source_labels: [__meta_kubernetes_node_address_Hostname]
  target_label: instance_name
  action: replace
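
To illustrate what these two rules do, here is a minimal Python sketch of the `replace` action. It assumes the Prometheus defaults (regex `(.*)`, replacement `$1`, separator `;`), under which `replace` simply copies the joined source label values into the target label. The function `relabel_replace`, the IP `10.0.0.12`, and the final selector string are hypothetical; the selector only shows the kind of lookup by node IP described above and is not crane's actual query.

```python
def relabel_replace(labels, source_labels, target_label, separator=";"):
    """Model a `replace` rule with default regex and replacement.

    The source label values are joined with the separator and written to the
    target label. An empty result removes the target label.
    """
    value = separator.join(labels.get(name, "") for name in source_labels)
    out = dict(labels)
    if value:
        out[target_label] = value
    else:
        out.pop(target_label, None)
    return out


# Labels as service discovery might present them for one node (values are made up).
discovered = {
    "__meta_kubernetes_node_address_InternalIP": "10.0.0.12",
    "__meta_kubernetes_node_address_Hostname": "kse2",
    "instance": "kse2",
}

step1 = relabel_replace(discovered, ["__meta_kubernetes_node_address_InternalIP"], "instance")
step2 = relabel_replace(step1, ["__meta_kubernetes_node_address_Hostname"], "instance_name")

# After relabeling, a lookup keyed on the node IP can match the series.
selector = f'cpu_usage_avg_5m{{instance="{step2["instance"]}"}}'
print(step2["instance"], step2["instance_name"], selector)
```

With the rules applied, `instance` carries the node IP and the host name is preserved in `instance_name`, so a query by IP no longer comes back empty.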
