Add single node benchmarks for vLLM Rust engine. #253

piotrm-nvidia · 2025-02-24T15:22:25Z

No description provided.

piotrm-nvidia · 2025-02-24T15:24:05Z

Build container with code.

Execute in container with 2 GPUs:

bash /workspace/examples/python_rs/llm/vllm/benchmark/bench_2GPUs_8B.sh

ptarasiewiczNV · 2025-02-25T14:18:17Z

examples/python_rs/llm/vllm/benchmark/bench_2GPUs_1B.sh

+
+echo "Running benchmark..."
+
+CONFIG_PREFIX="prefill_tp1dp1_generate_t1d1"


looks like we are actually using dp8? also typo t1d1

glos-nv

I would rather push the work on benchmarks instead of dragging the review for days because of cosmetics, therefore I approve this merge requests. Consider all of my comments as suggestions, completely optional, but please give them a thought.

glos-nv · 2025-02-25T17:25:55Z

examples/python_rs/llm/vllm/benchmark/README.md

+
+# Benchmark Scripts
+
+This folder contains scripts and utilities for benchmarking disaggregated (and baseline) inference configurations with various GPU topologies and model sizes. The primary scripts are:


"Baseline" should be defined

glos-nv · 2025-02-25T18:26:05Z

examples/python_rs/llm/vllm/benchmark/bench_2GPUs_8B.sh

+
+echo "Activating Triton environment..."
+
+source /opt/triton/venv/bin/activate


is this line really necessary?
(applies to other files as well)

virtualenv should be activated by default in the vllm container now, shouldn't need to manually activate

glos-nv · 2025-02-25T19:05:17Z

examples/python_rs/llm/vllm/benchmark/process_gap_results.py

+
+def get_label_from_name(name):
+    """
+    Parses out a human-friendly label from the directory name.


Maybe clearer would be:
Drops "_tpXdpY" parts from the given string.

glos-nv · 2025-02-25T19:07:29Z

examples/python_rs/llm/vllm/benchmark/process_gap_results.py

+    Parses out a human-friendly label from the directory name.
+    For example, 'purevllm_tp1dp1' -> 'purevllm'
+    'rustvllm_tp2dp4' -> 'rustvllm'
+    'context_tp2dp2' -> 'context' (you could replace 'context' with 'disagg' if desired)


I don't understand this talk of possible replacement

glos-nv · 2025-02-25T19:17:17Z

examples/python_rs/llm/vllm/benchmark/process_gap_results.py

+LOGGER = logging.getLogger(__name__)
+
+
+def parse_tp_dp(name):


Maybe count_gpus_from_tpdp_occurances

glos-nv · 2025-02-25T19:22:51Z

examples/python_rs/llm/vllm/benchmark/process_gap_results.py

@@ -0,0 +1,375 @@
+#!/usr/bin/env python3


Some of the code looks like something that we might want to use again in the future. Maybe we could extract it as a library? If so, we should think of a clean API and make it clearer which functions are "public", which "private" and make type annotations at least in the public part.

glos-nv · 2025-02-25T19:26:32Z

examples/python_rs/llm/vllm/benchmark/README.md

+
+This folder contains scripts and utilities for benchmarking disaggregated (and baseline) inference configurations with various GPU topologies and model sizes. The primary scripts are:
+
+1. **`bench_2GPUs_8B.sh`**


Abbreviated name (bench instead of benchmark) is fine in a private context, but this is a script that we give to the users. Let's stay with a full name benchmark_2GPUs_8B.sh

glos-nv · 2025-02-25T19:30:21Z

examples/python_rs/llm/vllm/benchmark/bench_2GPUs_1B.sh

@@ -0,0 +1,162 @@
+#!/bin/bash


Large part of the scripts is repeated 3 times, but it's fine for now. I work on run_deployment.py script in integration tests and I hoped that it can be used here instead soon enough.

glos-nv · 2025-02-25T19:31:22Z

examples/python_rs/llm/vllm/benchmark/run_benchmark.py

+LOGGER = logging.getLogger(__name__)
+
+
+def wait_for_server(url, model, timeout=300):


Once integration tests are merged to Github a similar function could be imported.

piotrm-nvidia added 11 commits February 24, 2025 03:09

Add benchmark tools

6aac865

Add benchmarking scripts

7ce36cf

Add more echo commands

3f8c587

Add baseline to script

7a6c061

Adjust start commands

6fdb17d

Add and for endpoint

c74e81d

Add bg for workers

9a84786

Add logging

1ae6e33

Fix string typo

3ad8278

Fix logging config

c1ee845

Add model to alive request

a4f77f4

piotrm-nvidia temporarily deployed to GITLAB February 24, 2025 15:22 — with GitHub Actions Inactive

piotrm-nvidia temporarily deployed to GITLAB February 24, 2025 15:23 — with GitHub Actions Inactive

Remove not used file.

03355bf

piotrm-nvidia temporarily deployed to GITLAB February 24, 2025 15:30 — with GitHub Actions Inactive

piotrm-nvidia temporarily deployed to GITLAB February 24, 2025 15:31 — with GitHub Actions Inactive

Disable vllm logging and eager

d6ba221

piotrm-nvidia temporarily deployed to GITLAB February 24, 2025 15:33 — with GitHub Actions Inactive

piotrm-nvidia temporarily deployed to GITLAB February 24, 2025 15:36 — with GitHub Actions Inactive

Remove concurrency, fix matplotlib

25c9669

piotrm-nvidia temporarily deployed to GITLAB February 24, 2025 16:01 — with GitHub Actions Inactive

piotrm-nvidia temporarily deployed to GITLAB February 24, 2025 16:10 — with GitHub Actions Inactive

Add readme for benchmark folder

cd59562

piotrm-nvidia temporarily deployed to GITLAB February 24, 2025 16:32 — with GitHub Actions Inactive

piotrm-nvidia temporarily deployed to GITLAB February 24, 2025 16:33 — with GitHub Actions Inactive

piotrm-nvidia marked this pull request as ready for review February 24, 2025 16:33

piotrm-nvidia requested a review from a team as a code owner February 24, 2025 16:33

piotrm-nvidia requested review from glos-nv and ptarasiewiczNV February 24, 2025 16:34

piotrm-nvidia requested review from tanmayv25, ishandhanani, alec-flowers, nnshah1 and nv-anants as code owners February 25, 2025 13:10

piotrm-nvidia temporarily deployed to GITLAB February 25, 2025 13:10 — with GitHub Actions Inactive

piotrm-nvidia temporarily deployed to GITLAB February 25, 2025 13:11 — with GitHub Actions Inactive

ptarasiewiczNV reviewed Feb 25, 2025

View reviewed changes

glos-nv previously approved these changes Feb 25, 2025

View reviewed changes

Modiffy logging to pass mypy

3b1ed64

piotrm-nvidia dismissed glos-nv’s stale review via 3b1ed64 February 25, 2025 19:38

piotrm-nvidia temporarily deployed to GITLAB February 25, 2025 19:39 — with GitHub Actions Inactive

piotrm-nvidia temporarily deployed to GITLAB February 25, 2025 19:43 — with GitHub Actions Inactive

Merge branch 'main' into piotrm/benchmark

1dc325d

piotrm-nvidia temporarily deployed to GITLAB February 25, 2025 19:55 — with GitHub Actions Inactive

piotrm-nvidia temporarily deployed to GITLAB February 25, 2025 20:04 — with GitHub Actions Inactive

glos-nv self-requested a review February 26, 2025 10:52

glos-nv approved these changes Feb 26, 2025

View reviewed changes

Merge branch 'main' into piotrm/benchmark

1ceee10

piotrm-nvidia requested review from nv-blazejkubiak, ryanolson, grahamking, paulhendricks, biswapanda, tmonty12 and GuanLuo as code owners February 26, 2025 13:21

piotrm-nvidia temporarily deployed to GITLAB February 26, 2025 13:21 — with GitHub Actions Inactive

piotrm-nvidia temporarily deployed to GITLAB February 26, 2025 13:22 — with GitHub Actions Inactive

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add single node benchmarks for vLLM Rust engine. #253

Add single node benchmarks for vLLM Rust engine. #253

piotrm-nvidia commented Feb 24, 2025

piotrm-nvidia commented Feb 24, 2025

ptarasiewiczNV Feb 25, 2025

glos-nv left a comment

glos-nv Feb 25, 2025

glos-nv Feb 25, 2025

rmccorm4 Feb 25, 2025

glos-nv Feb 25, 2025

glos-nv Feb 25, 2025

glos-nv Feb 25, 2025

glos-nv Feb 25, 2025

glos-nv Feb 25, 2025

glos-nv Feb 25, 2025

glos-nv Feb 25, 2025


		echo "Running benchmark..."

		CONFIG_PREFIX="prefill_tp1dp1_generate_t1d1"


		# Benchmark Scripts

		This folder contains scripts and utilities for benchmarking disaggregated (and baseline) inference configurations with various GPU topologies and model sizes. The primary scripts are:


		echo "Activating Triton environment..."

		source /opt/triton/venv/bin/activate


		This folder contains scripts and utilities for benchmarking disaggregated (and baseline) inference configurations with various GPU topologies and model sizes. The primary scripts are:

		1. `bench_2GPUs_8B.sh`

		LOGGER = logging.getLogger(__name__)


		def wait_for_server(url, model, timeout=300):

Add single node benchmarks for vLLM Rust engine. #253

Are you sure you want to change the base?

Add single node benchmarks for vLLM Rust engine. #253

Conversation

piotrm-nvidia commented Feb 24, 2025

piotrm-nvidia commented Feb 24, 2025

Choose a reason for hiding this comment

glos-nv left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment