-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add single node benchmarks for vLLM Rust engine. #253
base: main
Are you sure you want to change the base?
Conversation
Build container with code. Execute in container with 2 GPUs:
|
|
||
echo "Running benchmark..." | ||
|
||
CONFIG_PREFIX="prefill_tp1dp1_generate_t1d1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like we are actually using dp8? also typo t1d1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather push the work on benchmarks instead of dragging the review for days because of cosmetics, therefore I approve this merge requests. Consider all of my comments as suggestions, completely optional, but please give them a thought.
|
||
# Benchmark Scripts | ||
|
||
This folder contains scripts and utilities for benchmarking disaggregated (and baseline) inference configurations with various GPU topologies and model sizes. The primary scripts are: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Baseline" should be defined
|
||
echo "Activating Triton environment..." | ||
|
||
source /opt/triton/venv/bin/activate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this line really necessary?
(applies to other files as well)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
virtualenv should be activated by default in the vllm container now, shouldn't need to manually activate
|
||
def get_label_from_name(name): | ||
""" | ||
Parses out a human-friendly label from the directory name. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe clearer would be:
Drops "_tpXdpY" parts from the given string.
Parses out a human-friendly label from the directory name. | ||
For example, 'purevllm_tp1dp1' -> 'purevllm' | ||
'rustvllm_tp2dp4' -> 'rustvllm' | ||
'context_tp2dp2' -> 'context' (you could replace 'context' with 'disagg' if desired) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this talk of possible replacement
LOGGER = logging.getLogger(__name__) | ||
|
||
|
||
def parse_tp_dp(name): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe count_gpus_from_tpdp_occurances
@@ -0,0 +1,375 @@ | |||
#!/usr/bin/env python3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of the code looks like something that we might want to use again in the future. Maybe we could extract it as a library? If so, we should think of a clean API and make it clearer which functions are "public", which "private" and make type annotations at least in the public part.
|
||
This folder contains scripts and utilities for benchmarking disaggregated (and baseline) inference configurations with various GPU topologies and model sizes. The primary scripts are: | ||
|
||
1. **`bench_2GPUs_8B.sh`** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Abbreviated name (bench instead of benchmark) is fine in a private context, but this is a script that we give to the users. Let's stay with a full name benchmark_2GPUs_8B.sh
@@ -0,0 +1,162 @@ | |||
#!/bin/bash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Large part of the scripts is repeated 3 times, but it's fine for now. I work on run_deployment.py
script in integration tests and I hoped that it can be used here instead soon enough.
LOGGER = logging.getLogger(__name__) | ||
|
||
|
||
def wait_for_server(url, model, timeout=300): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once integration tests are merged to Github a similar function could be imported.
No description provided.