
Add Prometheus metric support #236 (Draft)

wants to merge 1 commit into main
Conversation

@dyastremsky (Contributor) commented Jan 4, 2025

Support a Prometheus metrics endpoint in GenAI-Perf. This is a proof of concept. With this design, Prometheus metrics can be enabled via the CLI; when enabled, the metrics are exported at the configured endpoint. To begin with, this would only be supported for the profile subcommand, though it may be useful to expand it to analyze. To avoid breaking anything, that is out of scope for this proof of concept.

When Prometheus metrics are enabled, the user must terminate the program when they want GenAI-Perf to exit. This design is necessary because the endpoint is closed when GenAI-Perf exits; without blocking, the metrics would be available for only a very short time.
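The blocking behavior described above can be sketched with a stdlib-only HTTP server that serves the Prometheus text exposition format. This is an illustrative sketch, not this PR's implementation (which presumably uses the prometheus_client package); the metric name `genai_perf_request_latency_seconds` is hypothetical, and port 8002 is taken from the example command.

```python
# Minimal sketch of a Prometheus-style /metrics endpoint using only the
# standard library. Hypothetical metric names; not the PR's actual code.
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(metrics: dict) -> str:
    """Render a {name: (help_text, value)} dict in Prometheus text format."""
    lines = []
    for name, (help_text, value) in metrics.items():
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    # Placeholder metric; the real PR would export GenAI-Perf's results here.
    metrics = {
        "genai_perf_request_latency_seconds": ("Mean request latency.", 0.42),
    }

    def do_GET(self):
        if self.path == "/metrics":
            body = render_metrics(self.metrics).encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()


if __name__ == "__main__":
    # Block until the user terminates the process, mirroring the PR's
    # design: the endpoint lives only as long as GenAI-Perf does.
    HTTPServer(("", 8002), MetricsHandler).serve_forever()
```

This makes the trade-off concrete: because the scrape loop ends with the process, the server must block in the foreground for the metrics to remain scrapeable.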

TODO: Only system metrics are displayed right now; add the other metrics. It may also be good to omit request_goodput when goodput is not used.

Example command:
```shell
genai-perf profile -v -m gpt2 --service-kind openai --endpoint-type completions --num-requests 5 --num-prompts 5 --enable-prometheus --prometheus-port 8002
```

Example metrics:
[screenshot of the exported Prometheus metrics]


```python
# Separate function that can raise exceptions used for testing
# to assert correct errors and messages.
def run():
    # TMA-1900: refactor CLI handler
    logging.init_logging()
```
This should not have been deleted; it is needed for the logger to function properly.
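For context, an init-logging helper of this kind typically configures the root logger once at startup. The exact GenAI-Perf implementation is not shown in this diff, so the following is a hypothetical sketch of what the deleted call provides; without it, module-level loggers have no handlers and records fall back to the last-resort stderr handler at WARNING level, so INFO messages are dropped.

```python
import logging


def init_logging(level: int = logging.INFO) -> None:
    # Hypothetical sketch of an init_logging helper: configure the root
    # logger once so that module loggers created via logging.getLogger()
    # actually emit their records.
    logging.basicConfig(
        level=level,
        format="%(asctime)s [%(levelname)s] %(name)s - %(message)s",
    )
```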

```diff
@@ -46,6 +50,31 @@ def run():
     else:  # profile
         args.func(args, extra_args)

+    if getattr(args, "enable_prometheus", False):
+        # Fix: This doesn't actually get logged.
```
You can fix this by adding main to logger.py.

```diff
@@ -70,6 +70,7 @@ def to_lowercase(self):
 DEFAULT_SYNTHETIC_FILENAME = "synthetic_data.json"
 DEFAULT_WARMUP_REQUEST_COUNT = 0
 DEFAULT_BACKEND = "tensorrtllm"
+DEFAULT_PROMETHEUS_PORT = 8002
```
This is the default port for Triton Inference Server's metrics. I would change this to 9090, which is what Prometheus uses in its getting-started configuration file.
