Describe the Bug
The benchmarks/llm/perf.sh script fails to create the directory structure expected by plot_pareto.py, preventing users from generating Pareto plots from benchmark results.
When running perf.sh with multiple concurrency levels (e.g., --concurrency 1,2,4,8), the script uses the same artifact_dir for all concurrency levels, causing AIPerf to overwrite results instead of creating separate subdirectories for each concurrency level.
Impact:
- Users cannot generate Pareto plots using plot_pareto.py
- The documented workflow in benchmarks/llm/README.md doesn't work end-to-end
- Benchmark results for multiple concurrency levels are lost (only the last one remains)
Root Cause:
The perf.sh script (lines 212-243) uses the same artifact_dir variable for all concurrency levels in the loop, while plot_pareto.py (line 48) expects file paths containing -concurrency<number> to parse the concurrency level.
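To see the mismatch in isolation, here is a minimal reproduction using the same regex that plot_pareto.py's parse_concurrency applies (quoted under Code References below); the flat path that perf.sh currently produces yields no matches:

import re

# Regex from plot_pareto.py's parse_concurrency (see Code References below)
pattern = r"-concurrency(\d+)"

flat_path = "artifacts_0/profile_export_aiperf.json"                   # what perf.sh produces today
nested_path = "artifacts_0/-concurrency4/profile_export_aiperf.json"   # what plot_pareto.py expects

print(re.findall(pattern, flat_path))    # [] -> triggers "non-unique matches: []"
print(re.findall(pattern, nested_path))  # ['4']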
Steps to Reproduce
1. Start Dynamo services (frontend and worker):

   python -m dynamo.frontend --http-port 8000 > frontend.log 2>&1 &
   CUDA_VISIBLE_DEVICES=0 python -m dynamo.vllm --model Qwen/Qwen3-0.6B > worker.log 2>&1 &

2. Wait for the services to be ready (check the logs or test the endpoint):

   curl http://localhost:8000/v1/models

3. Run the benchmark with multiple concurrency levels:

   bash benchmarks/llm/perf.sh \
     --mode aggregated \
     --deployment-kind dynamo_vllm \
     --tensor-parallelism 1 \
     --data-parallelism 1 \
     --url http://localhost:8000 \
     --model Qwen/Qwen3-0.6B \
     --concurrency 1,2,4,8 \
     --input-sequence-length 100 \
     --output-sequence-length 50 \
     --artifacts-root-dir test_benchmark

4. Verify the directory structure (it contains a single file rather than per-concurrency subdirectories):

   ls -la test_benchmark/artifacts_0/
   # Shows: profile_export_aiperf.json (single file, not in subdirectories)

5. Attempt to generate a Pareto plot:

   python3 benchmarks/llm/plot_pareto.py --artifacts-root-dir test_benchmark

6. Observe the error:

   Exception: non-unique matches: []
Expected Behavior
The perf.sh script should create separate subdirectories for each concurrency level, resulting in the following directory structure:
artifacts_0/
├── deployment_config.json
├── -concurrency1/
│ └── profile_export_aiperf.json
├── -concurrency2/
│ └── profile_export_aiperf.json
├── -concurrency4/
│ └── profile_export_aiperf.json
└── -concurrency8/
└── profile_export_aiperf.json
The plot_pareto.py script should then be able to:
- Find all profile_export_aiperf.json files in subdirectories matching the pattern -concurrency<number>/
- Parse the concurrency level from the directory name
- Generate a Pareto plot showing throughput vs. latency trade-offs across the different concurrency levels
This matches the documented workflow in benchmarks/llm/README.md (lines 370-391) and how other scripts in the codebase handle this (e.g., benchmarks/utils/aiperf.py line 113 creates c{c} subdirectories).
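For illustration only (this is a sketch, not the actual plot_pareto.py implementation), the discovery-and-parse step over the expected layout could look like:

import re
from pathlib import Path

def parse_concurrency(name):
    # Same regex as plot_pareto.py's parse_concurrency (quoted under Code
    # References below); requires exactly one -concurrency<number> in the path.
    matches = re.findall(r"-concurrency(\d+)", name)
    if len(matches) != 1:
        raise Exception(f"non-unique matches: {matches}")
    return int(matches[0])

# With the expected layout, each result file maps cleanly to a concurrency level.
for path in sorted(Path("artifacts_0").glob("*/profile_export_aiperf.json")):
    print(parse_concurrency(str(path)), "->", path)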
Actual Behavior
The script creates a single profile_export_aiperf.json file that gets overwritten for each concurrency level:
artifacts_0/
├── deployment_config.json
├── inputs.json
├── logs/
│ └── aiperf.log
├── profile_export.jsonl
├── profile_export_aiperf.csv
└── profile_export_aiperf.json (single file, contains only the last concurrency level)
When running plot_pareto.py, it fails with:
Traceback (most recent call last):
File "/home/ubuntu/dynamo/benchmarks/llm/plot_pareto.py", line 270, in <module>
extracted_values = extract_val_and_concurrency(...)
File "/home/ubuntu/dynamo/benchmarks/llm/plot_pareto.py", line 100, in extract_val_and_concurrency
concurrency = parse_concurrency(aiperf_profile_export_json_path)
File "/home/ubuntu/dynamo/benchmarks/llm/plot_pareto.py", line 50, in parse_concurrency
raise Exception(f"non-unique matches: {matches}")
Exception: non-unique matches: []
The error occurs because plot_pareto.py expects file paths containing -concurrency<number> (e.g., artifacts_0/-concurrency1/profile_export_aiperf.json), but the file path is artifacts_0/profile_export_aiperf.json which doesn't match the pattern.
Environment
Operating System:
- Ubuntu 22.04.5 LTS (Jammy Jellyfish)
- Linux 5.15.0-126-generic x86_64
Hardware:
- GPU: NVIDIA L40S (driver 565.57.01, CUDA 12.7)
- CPU: 8 cores
- Memory: 144.4 GiB
Software Versions:
- Dynamo: 0.7.0 (built from source)
- Python: 3.12.12
- AIPerf: installed via pip install aiperf
- vLLM: 0.10.2
Test Environment:
- brev.dev cloud workspace
- Dynamo built from source (not from PyPI wheels)
- Model tested: Qwen/Qwen3-0.6B
Dynamo Configuration:
- Mode: Aggregated
- Tensor Parallelism: 1
- Data Parallelism: 1
- Frontend: python -m dynamo.frontend --http-port 8000
- Worker: python -m dynamo.vllm --model Qwen/Qwen3-0.6B
Additional Context
Related Issues
- Discovered while testing the restored benchmarking guide in issue [DOCS]: Bring back benchmarking guide #2031
- Affects the documented workflow in benchmarks/llm/README.md (lines 370-391)
Code References
Problematic code in benchmarks/llm/perf.sh (lines 212-243):
for concurrency in "${concurrency_array[@]}"; do
    echo "Run concurrency: $concurrency"
    aiperf profile \
        ...
        --artifact-dir ${artifact_dir}  # Same directory for all concurrency levels
done

Expected pattern in benchmarks/llm/plot_pareto.py (line 48):

def parse_concurrency(name):
    matches = re.findall(r"-concurrency(\d+)", name)
    if len(matches) != 1:
        raise Exception(f"non-unique matches: {matches}")

Proposed Fix
The script should create a subdirectory for each concurrency level:
for concurrency in "${concurrency_array[@]}"; do
    echo "Run concurrency: $concurrency"
    # Create a subdirectory for this concurrency level.
    # The plot script expects subdirectories named -concurrency<number>.
    concurrency_dir="${artifact_dir}/-concurrency${concurrency}"
    mkdir -p "${concurrency_dir}"
    aiperf profile \
        ...
        --artifact-dir ${concurrency_dir}  # Use the subdirectory instead
done

This approach is consistent with:
- benchmarks/utils/aiperf.py line 113: creates c{c} subdirectories
- recipes/qwen3-32b-fp8/trtllm/agg/perf.yaml line 54: creates concurrency_${concurrency} directories
Testing the Fix
After applying the fix:
1. Run the benchmark:
   bash benchmarks/llm/perf.sh --concurrency 1,2,4,8 ...
2. Verify the structure:
   find artifacts_root/artifacts_0 -type d -name "*-concurrency*"
3. Run the plot script:
   python3 benchmarks/llm/plot_pareto.py --artifacts-root-dir artifacts_root
4. Verify the plot:
   ls artifacts_root/pareto_plot.png
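As an additional end-to-end sanity check, a throwaway snippet along these lines (assuming the concurrency levels 1,2,4,8 from the reproduction above) confirms that one result file exists per level:

import re
from pathlib import Path

# Hypothetical quick check; adjust root to match your --artifacts-root-dir.
root = Path("artifacts_root/artifacts_0")
found = sorted(
    int(re.search(r"-concurrency(\d+)", str(p)).group(1))
    for p in root.glob("*/profile_export_aiperf.json")
)
expected = [1, 2, 4, 8]
assert found == expected, f"expected {expected}, found {found}"
print("All concurrency levels present:", found)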
Screenshots
No response