Add metrics endpoint #1423
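I feel the GPU and CPU metrics may not need to be included here. You can start NVIDIA's exporter on its own port to get those metrics, e.g. docker run -d --gpus all --rm -p 9400:9400 nvcr.io/nvidia/k8s/dcgm-exporter:3.1.7-3.1.4-ubuntu20.04
Could token-related performance metrics be added?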
Can we add first token time as well, so the difference between scheduling time and first token time can be used to estimate prefill time?
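A minimal sketch of the arithmetic behind this request, assuming each request records three timestamps in seconds. The `RequestTimestamps` class and its field names are hypothetical and are not taken from the PR:

```python
from dataclasses import dataclass


@dataclass
class RequestTimestamps:
    """Hypothetical per-request timestamps, in seconds."""
    arrival: float      # request received by the server
    scheduled: float    # request picked up by the engine
    first_token: float  # first output token produced


def queue_seconds(ts: RequestTimestamps) -> float:
    # Time spent waiting before the engine scheduled the request.
    return ts.scheduled - ts.arrival


def estimated_prefill_seconds(ts: RequestTimestamps) -> float:
    # Scheduling-to-first-token gap, used as an estimate of prefill time.
    return ts.first_token - ts.scheduled


example = RequestTimestamps(arrival=10.0, scheduled=10.25, first_token=10.75)
print(queue_seconds(example), estimated_prefill_seconds(example))
```

The estimate also includes any overhead between scheduling and the start of prefill, so it is an upper bound rather than an exact measurement.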
Conflicts: lmdeploy/serve/async_engine.py
    documentation='Number of total requests.',
    labelnames=labelnames)

# latency metrics
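For latency metrics, consider using a Histogram or Summary metric type. That makes it easy to compute distributions and averages over different time windows, and it can also simplify the calculation logic.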
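A rough sketch of what the Histogram suggestion could look like with `prometheus_client`. The metric name, the `model_name` label, and the bucket boundaries are illustrative assumptions, not the PR's final choices:

```python
from prometheus_client import CollectorRegistry, Histogram

# A private registry keeps this sketch isolated from the global default one.
registry = CollectorRegistry()

# Hypothetical name, label, and buckets, chosen only for illustration.
queue_duration = Histogram(
    name='lmdeploy:duration_queue_seconds',
    documentation='Time a request spends waiting in the queue, in seconds.',
    labelnames=['model_name'],
    buckets=(0.01, 0.05, 0.1, 0.5, 1.0, 5.0),
    registry=registry,
)

# Each observation updates the bucket counts, the running sum, and the count,
# so no manual averaging is needed on the server side.
for seconds in (0.02, 0.2, 0.3):
    queue_duration.labels(model_name='demo-model').observe(seconds)

labels = {'model_name': 'demo-model'}
count = registry.get_sample_value('lmdeploy:duration_queue_seconds_count', labels)
total = registry.get_sample_value('lmdeploy:duration_queue_seconds_sum', labels)
```

With the `_sum` and `_count` series exported, the average over any time window can be derived in PromQL from their rates, and quantiles can be estimated from the `_bucket` series.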
lmdeploy/serve/metrics.py
Outdated
# latency metrics
self.gauge_duration_queue = Gauge(
    name='lmdeploy:duration_queue',
For metrics that have a unit, declare the unit in the metric name, e.g. duration_queue_seconds. This makes PromQL queries more intuitive to write.
    documentation='CPU memory used bytes.',
    labelnames=labelnames)

# requests
For request-count metrics, the Counter type is recommended, since it increases monotonically.
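A small sketch of the Counter suggestion using `prometheus_client`. The metric name and the `model_name` label are assumptions made for illustration:

```python
from prometheus_client import CollectorRegistry, Counter

registry = CollectorRegistry()

# Hypothetical metric name and label. The client exposes the sample with a
# `_total` suffix appended to the name given here.
requests_counter = Counter(
    name='lmdeploy:requests',
    documentation='Number of total requests.',
    labelnames=['model_name'],
    registry=registry,
)

# A Counter can only go up, which is what rate() and increase() expect.
requests_counter.labels(model_name='demo-model').inc()
requests_counter.labels(model_name='demo-model').inc()

request_count = registry.get_sample_value(
    'lmdeploy:requests_total', {'model_name': 'demo-model'})
```

Because the value never decreases except on process restart, request throughput can be computed in PromQL with a rate over the exported series.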
handle = pynvml.nvmlDeviceGetHandleByIndex(int(i))
mem_info = pynvml.nvmlDeviceGetMemoryInfo(handle)
utilization = pynvml.nvmlDeviceGetUtilizationRates(handle)
self.gpu_memory_used_bytes[str(i)] = str(mem_info.used)
The GPU index can be specified with a label. With Info, the metric value becomes a str, which is hard to compute with in PromQL.
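One way the label-based approach might look, sketched with `prometheus_client`. The metric name, the `gpu` label, and the `record_gpu_memory` helper are hypothetical, and the readings are passed in as a plain dict rather than queried from `pynvml` so the sketch runs without a GPU:

```python
from prometheus_client import CollectorRegistry, Gauge

registry = CollectorRegistry()

# Hypothetical metric: one numeric Gauge, with the GPU index carried as a label.
gpu_memory_used = Gauge(
    name='lmdeploy:gpu_memory_used_bytes',
    documentation='GPU memory used, in bytes.',
    labelnames=['gpu'],
    registry=registry,
)


def record_gpu_memory(used_bytes_by_index):
    """Set one labelled series per GPU index; values stay numeric."""
    for index, used_bytes in used_bytes_by_index.items():
        gpu_memory_used.labels(gpu=str(index)).set(used_bytes)


# Stand-in readings; in the PR these would come from mem_info.used.
record_gpu_memory({0: 8 * 1024**3, 1: 2 * 1024**3})

gpu0_used = registry.get_sample_value(
    'lmdeploy:gpu_memory_used_bytes', {'gpu': '0'})
```

Keeping the value numeric means per-GPU series can be summed, averaged, or compared directly in PromQL.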
@uzuku
Thanks for the great work. Metrics are important for production observability. Can we expect this feature to be merged in the next release? @AllentDan
If our metrics are compatible with vllm's, it will greatly facilitate comparing deployment performance between lmdeploy and vllm.
OK. We will align with them as much as we can.
Open http://xxxx:23333/metrics/ to view the metrics.