[Feature] metrics support #3534

CUHKSZzxy · 2025-05-09T13:07:32Z

Objective

Align with the vLLM v1 metrics system and go beyond it. The key alignments are:

Monotonic Timestamps:
-- Uses time.perf_counter() for interval calculations (monotonic, so unaffected by system clock adjustments).
Metric Types:
-- Gauges: active requests, cache usage, etc.
-- Counters: token totals, request success / failure counts, etc.
-- Histograms: TTFT (Time-To-First-Token), TPOT (Time-Per-Output-Token, i.e., inter-token latency), end-to-end latency, etc.
Metrics Publishing:
-- CLI logging
-- Prometheus
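The timestamp and metric-type items above can be illustrated with a minimal, stdlib-only sketch. It assumes a hypothetical per-request record (`RequestTimestamps`) whose fields are all `time.perf_counter()` readings, derives TTFT, TPOT and end-to-end latency from them, and feeds values into a toy Prometheus-style `Histogram` with cumulative `le` buckets. None of these names come from lmdeploy; gauges and counters are left out because they reduce to a single number that is either set (gauge) or only incremented (counter).

```python
import bisect
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RequestTimestamps:
    """Hypothetical per-request record; every field is a time.perf_counter() reading."""
    arrival: float
    token_times: List[float] = field(default_factory=list)
    finish: Optional[float] = None

    def ttft(self) -> float:
        # Time-To-First-Token: request arrival -> first generated token.
        return self.token_times[0] - self.arrival

    def tpot(self) -> float:
        # Mean gap between consecutive output tokens; 0.0 if fewer than two tokens.
        if len(self.token_times) < 2:
            return 0.0
        span = self.token_times[-1] - self.token_times[0]
        return span / (len(self.token_times) - 1)

    def e2e(self) -> float:
        # End-to-end latency: request arrival -> request finished.
        return self.finish - self.arrival


class Histogram:
    """Toy Prometheus-style histogram (illustrative, not prometheus_client)."""

    def __init__(self, buckets: List[float]):
        self.buckets = sorted(buckets)
        self.counts = [0] * (len(self.buckets) + 1)  # final slot is the +Inf bucket
        self.total = 0.0
        self.n = 0

    def observe(self, value: float) -> None:
        # bisect_left picks the first upper bound >= value, i.e. Prometheus "le" semantics.
        self.counts[bisect.bisect_left(self.buckets, value)] += 1
        self.total += value
        self.n += 1

    def cumulative(self) -> List[int]:
        # Each exported bucket counts all observations <= its upper bound.
        out, running = [], 0
        for c in self.counts:
            running += c
            out.append(running)
        return out


if __name__ == "__main__":
    req = RequestTimestamps(arrival=time.perf_counter())
    for _ in range(3):
        time.sleep(0.01)  # pretend a decode step happened
        req.token_times.append(time.perf_counter())
    req.finish = time.perf_counter()
    ttft_hist = Histogram([0.005, 0.05, 0.5])
    ttft_hist.observe(req.ttft())
    print(f"TTFT={req.ttft():.3f}s TPOT={req.tpot():.3f}s E2E={req.e2e():.3f}s")
    print(ttft_hist.cumulative())
```

Storing only raw timestamps per request keeps recording cheap; the latency arithmetic and bucket updates can then happen wherever the metrics are processed.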

We only record critical timestamps and events during the main loop and scheduling, without further processing. Heavy-weight metric calculations and metric publishing run in separate coroutines to reduce overhead in the main engine loop.
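The split between cheap recording on the hot path and heavier processing elsewhere can be sketched with `asyncio.Queue`. This is a minimal illustration under assumed names (`record`, `metrics_consumer` and `engine_loop` are hypothetical, not lmdeploy functions): the loop only timestamps and enqueues events, and a separate coroutine drains and processes them.

```python
import asyncio
import time
from typing import List, Tuple

# Hypothetical event tuple: (event_name, request_id, perf_counter timestamp).
Event = Tuple[str, int, float]


def record(queue: asyncio.Queue, name: str, req_id: int) -> None:
    # Hot path: capture a timestamp and enqueue it; no aggregation here.
    queue.put_nowait((name, req_id, time.perf_counter()))


async def metrics_consumer(queue: asyncio.Queue, processed: List[Event]) -> None:
    # Off the hot path: drain events and do the heavier bookkeeping.
    while True:
        event = await queue.get()
        processed.append(event)  # stand-in for histogram updates / publishing
        queue.task_done()


async def engine_loop(queue: asyncio.Queue, steps: int) -> None:
    # Stand-in for the engine main loop: each step only records an event.
    for step in range(steps):
        record(queue, "token", step)
        await asyncio.sleep(0)  # yield control, as a real loop would between steps


async def main(steps: int = 5) -> List[Event]:
    queue: asyncio.Queue = asyncio.Queue()
    processed: List[Event] = []
    consumer = asyncio.create_task(metrics_consumer(queue, processed))
    await engine_loop(queue, steps)
    await queue.join()  # wait until every recorded event has been handled
    consumer.cancel()
    try:
        await consumer
    except asyncio.CancelledError:
        pass
    return processed


if __name__ == "__main__":
    print(len(asyncio.run(main())))
```

The only work left in the loop is one `perf_counter()` call and a non-blocking `put_nowait`, which is the kind of minimal footprint the paragraph above aims for.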

TODO

Use time.perf_counter()
Refactor to minimize the overhead in async engine generate() and engine _async_loop_main()
~~Expert information collection~~ (possibly deferred to another PR)
Grafana visualization (WIP, possibly deferred to another PR)

Usage

Start the server with --enable-metrics:

lmdeploy serve api_server models--Qwen--Qwen2.5-7B-Instruct --enable-metrics

Metrics Publishing - Logging
With --enable-metrics, key metrics (e.g., running / waiting requests, cache usage, token throughput) are printed to the terminal every 5 seconds.
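Because counters only grow, the per-interval throughput printed by a periodic logger can be computed from two consecutive snapshots as the token delta divided by the elapsed `perf_counter()` time. A minimal sketch, assuming a hypothetical `Snapshot` record and a made-up log-line format (lmdeploy's actual output may differ):

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Snapshot:
    """Hypothetical counter snapshot taken by the logger at each interval."""
    timestamp: float        # time.perf_counter() when the snapshot was taken
    prompt_tokens: int      # cumulative prompt tokens processed
    generation_tokens: int  # cumulative tokens generated


def interval_throughput(prev: Snapshot, curr: Snapshot) -> Tuple[float, float]:
    # Counters are monotonic, so throughput is (delta tokens) / (delta time).
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        return 0.0, 0.0
    prompt_tps = (curr.prompt_tokens - prev.prompt_tokens) / elapsed
    gen_tps = (curr.generation_tokens - prev.generation_tokens) / elapsed
    return prompt_tps, gen_tps


def format_log_line(running: int, waiting: int, cache_usage: float,
                    prompt_tps: float, gen_tps: float) -> str:
    # Hypothetical one-line summary per interval; cache_usage is a fraction in [0, 1].
    return (f"Running: {running} reqs, Waiting: {waiting} reqs, "
            f"GPU KV cache usage: {cache_usage:.1%}, "
            f"Prompt throughput: {prompt_tps:.1f} tokens/s, "
            f"Generation throughput: {gen_tps:.1f} tokens/s")
```

For example, with a 5-second interval, two snapshots 5.0 s apart that differ by 1000 generated tokens give a generation throughput of 200 tokens/s.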
Metrics Publishing - Prometheus & Grafana (WIP)
Access the metrics at http://localhost:23333/metrics/, or query the endpoint with curl:

curl http://localhost:23333/metrics/
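For scripted checks, the endpoint's plain-text output can also be read with the Python standard library. A minimal sketch, assuming the server is reachable at the URL above; `parse_prometheus_text` and `fetch_metrics` are hypothetical helpers that handle only the simple subset of the Prometheus text exposition format (no escaped quotes or commas inside label values).

```python
import urllib.request
from typing import Dict, List, Tuple

# (metric name, labels, value)
Sample = Tuple[str, Dict[str, str], float]


def parse_prometheus_text(text: str) -> List[Sample]:
    """Parse lines of the form `name{k="v",...} value [timestamp]`.

    Skips blank and comment lines (# HELP / # TYPE) and ignores optional timestamps.
    Escaped quotes or commas inside label values are NOT supported.
    """
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        labels: Dict[str, str] = {}
        if "{" in line:
            name, rest = line.split("{", 1)
            label_str, value_part = rest.rsplit("}", 1)
            for pair in filter(None, label_str.split(",")):
                key, val = pair.split("=", 1)
                labels[key.strip()] = val.strip().strip('"')
        else:
            name, value_part = line.split(None, 1)
        value = float(value_part.split()[0])  # float() also accepts "+Inf" and "NaN"
        samples.append((name.strip(), labels, value))
    return samples


def fetch_metrics(url: str = "http://localhost:23333/metrics/") -> List[Sample]:
    # Requires a running api_server started with --enable-metrics.
    with urllib.request.urlopen(url) as resp:
        return parse_prometheus_text(resp.read().decode("utf-8"))
```

For anything beyond quick checks, pointing Prometheus itself at the endpoint is the more robust option, since its parser handles the full format.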

Performance Impacts

Conclusion:

No obvious throughput degradation for Qwen-2.5-32B, minor degradation (1-2%) for Qwen-2.5-7B, and notable degradation (15-20%) for small models such as Qwen-2.5-0.5B.

For details, see the following figures. Benchmark settings: 1000 prompts, input length 1000, output length 1000.

Qwen-2.5-7B (TP1), without the metrics.
Qwen-2.5-7B (TP1), with the metrics.
Qwen-2.5-0.5B (TP1), without the metrics.
Qwen-2.5-0.5B (TP1), with the metrics.

Related Issues & PR

Issue 2638, Issue 2673, PR 1423


CUHKSZzxy added 2 commits May 9, 2025 20:38

metrics support prototype

f8b4000

Merge branch 'main' into metrics-support

3e4fca9

Conflicts: lmdeploy/messages.py lmdeploy/pytorch/engine/engine.py lmdeploy/pytorch/engine/engine_instance.py lmdeploy/pytorch/messages.py lmdeploy/pytorch/paging/scheduler.py

CUHKSZzxy added the WIP label May 9, 2025

CUHKSZzxy added 22 commits May 12, 2025 18:01

Merge branch 'main' into metrics-support

02c46ec

Conflicts: lmdeploy/serve/openai/api_server.py

fix wrong conflict resolve

9ae6a1b

add GPU KV cache usage

7904d3a

independent logger for each DP

4a339c8

fix gpu cache usage

8c3ede1

Merge branch 'main' into metrics-support

ddeec2e

rename log stats

9229aa1

fix

862a708

update perf_counter and comments, some bug fix

74dc69a

Merge branch 'main' into metrics-support

19d81d4

overwrite with main branch

b87f099

Merge branch 'main' into metrics-support

d9f8e5a

refactor

0168eed

cleanup

d774cc3

fix

08200e1

add runtime cuda prometheus_client

a4d0ac9

fix

150d562

cleanup

1f80a8e

async log

aed3eea

fix gen throughput calculation

0931746

update max_model_len

57f3f91

Merge branch 'main' into metrics-support

4bdf89f

CUHKSZzxy removed the WIP label May 26, 2025

CUHKSZzxy added 2 commits May 26, 2025 20:29

fix running/waiting reqs calculations

83b7c60

Merge branch 'main' into metrics-support

67366b1

CUHKSZzxy marked this pull request as ready for review May 26, 2025 13:24

fix pr test

9729f0d

CUHKSZzxy added 4 commits May 27, 2025 11:20

fix

9c194ac

fix pr test

97ccdf3

update log level

72d4274

fix

382c500
