
Add single node benchmarks for vLLM Rust engine. #253

Open · wants to merge 19 commits into `main`

Conversation

piotrm-nvidia (Contributor)
No description provided.

piotrm-nvidia (Contributor, Author):

Build the container with the code included.

Run inside the container on a machine with 2 GPUs:

bash /workspace/examples/python_rs/llm/vllm/benchmark/bench_2GPUs_8B.sh


echo "Running benchmark..."

CONFIG_PREFIX="prefill_tp1dp1_generate_t1d1"
Contributor:
Looks like we are actually using dp8? Also, typo: `t1d1`.
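If the generate stage is indeed running dp8, the fixed prefix would presumably read as below (a sketch; `tp1dp8` is an assumption based on the comment above and should be checked against the actual launch flags):

```bash
# Hypothetical correction: fixes the 't1d1' typo and assumes the
# generate stage really runs with tp=1, dp=8 as the reviewer suggests.
CONFIG_PREFIX="prefill_tp1dp1_generate_tp1dp8"
```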

glos-nv (Contributor) previously approved these changes on Feb 25, 2025, and left a comment:

I would rather push the work on the benchmarks forward than drag the review out for days over cosmetics, so I am approving this merge request. Consider all of my comments as suggestions, completely optional, but please give them a thought.


# Benchmark Scripts

This folder contains scripts and utilities for benchmarking disaggregated (and baseline) inference configurations with various GPU topologies and model sizes. The primary scripts are:
Contributor:

"Baseline" should be defined


echo "Activating Triton environment..."

source /opt/triton/venv/bin/activate
Contributor:

Is this line really necessary? (Applies to other files as well.)

Contributor:

The virtualenv should be activated by default in the vLLM container now, so there shouldn't be a need to activate it manually.
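If the activation line is kept as a safety net, a guarded form like this sketch (assuming an active venv exports `VIRTUAL_ENV`, as CPython's `venv` does) makes it a no-op in containers where the environment is already set up:

```bash
# Activate the Triton venv only when no virtualenv is already active.
if [[ -z "${VIRTUAL_ENV:-}" ]]; then
    source /opt/triton/venv/bin/activate
fi
```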


def get_label_from_name(name):
"""
Parses out a human-friendly label from the directory name.
Contributor:

Maybe clearer would be:
Drops "_tpXdpY" parts from the given string.

Parses out a human-friendly label from the directory name.
For example, 'purevllm_tp1dp1' -> 'purevllm'
'rustvllm_tp2dp4' -> 'rustvllm'
'context_tp2dp2' -> 'context' (you could replace 'context' with 'disagg' if desired)
Contributor:

I don't understand this talk of a possible replacement.
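For reference, a minimal sketch of what the suggested wording describes; the regex and body are assumptions, not the PR's actual implementation:

```python
import re


def get_label_from_name(name):
    """Drops the '_tpXdpY' part from the given string.

    For example, 'purevllm_tp1dp1' -> 'purevllm'
    and 'rustvllm_tp2dp4' -> 'rustvllm'.
    """
    return re.sub(r"_tp\d+dp\d+", "", name)
```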

LOGGER = logging.getLogger(__name__)


def parse_tp_dp(name):
Contributor:

Maybe `count_gpus_from_tpdp_occurrences`?
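A sketch of what a function under that name might do; the body is inferred from the suggested name, not taken from the PR:

```python
import re


def count_gpus_from_tpdp_occurrences(name):
    # Sum tp * dp over every 'tpXdpY' occurrence in the name, e.g.
    # 'prefill_tp1dp1_generate_tp1dp8' -> 1*1 + 1*8 = 9 GPUs.
    return sum(int(tp) * int(dp) for tp, dp in re.findall(r"tp(\d+)dp(\d+)", name))
```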

@@ -0,0 +1,375 @@
#!/usr/bin/env python3
Contributor:

Some of the code looks like something we might want to reuse in the future. Maybe we could extract it as a library? If so, we should think of a clean API, make it clearer which functions are "public" and which are "private", and add type annotations, at least in the public part.
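A minimal sketch of that split, with a hypothetical module name and using a leading underscore as the public/private convention:

```python
# benchmark_lib.py -- hypothetical extracted module
from __future__ import annotations

import re

__all__ = ["get_label_from_name"]  # the public surface


def get_label_from_name(name: str) -> str:
    """Public: drops the '_tpXdpY' part from a run directory name."""
    return _strip_tpdp_suffix(name)


def _strip_tpdp_suffix(name: str) -> str:
    # Private helper; free to change without affecting callers.
    return re.sub(r"_tp\d+dp\d+", "", name)
```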


This folder contains scripts and utilities for benchmarking disaggregated (and baseline) inference configurations with various GPU topologies and model sizes. The primary scripts are:

1. **`bench_2GPUs_8B.sh`**
Contributor:

An abbreviated name (`bench` instead of `benchmark`) is fine in a private context, but this is a script we give to users. Let's stay with the full name `benchmark_2GPUs_8B.sh`.

@@ -0,0 +1,162 @@
#!/bin/bash
Contributor:

A large part of the scripts is repeated three times, but that's fine for now. I'm working on a run_deployment.py script in the integration tests and hope it can be used here instead soon.
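Until then, if the duplication ever needs trimming, one conventional option is a shared helper file sourced by each script; a sketch, where `common.sh` and `run_benchmark` are hypothetical:

```bash
#!/bin/bash
# Each benchmark script sources the shared launch/teardown logic
# instead of repeating it.
source "$(dirname "$0")/common.sh"

run_benchmark "prefill_tp1dp1_generate_tp1dp1"
```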

LOGGER = logging.getLogger(__name__)


def wait_for_server(url, model, timeout=300):
Contributor:

Once the integration tests are merged to GitHub, a similar function could be imported from there.
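For context, a typical shape for such a helper, polling an OpenAI-compatible `/v1/models` endpoint; this body is a sketch based only on the signature shown above, not the PR's implementation:

```python
import time

import requests


def wait_for_server(url, model, timeout=300):
    """Poll `url` until `model` shows up in /v1/models or `timeout` seconds pass."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            resp = requests.get(f"{url}/v1/models", timeout=5)
            if resp.ok and any(
                entry.get("id") == model for entry in resp.json().get("data", [])
            ):
                return
        except requests.RequestException:
            pass  # server not up yet; keep polling
        time.sleep(2)
    raise TimeoutError(f"{model} not served at {url} within {timeout}s")
```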
