[BUG]: perf.sh doesn't create per-concurrency subdirectories required by plot_pareto.py #4233

@AsadShahid04

Description

Describe the Bug

The benchmarks/llm/perf.sh script fails to create the directory structure expected by plot_pareto.py, preventing users from generating Pareto plots from benchmark results.

When running perf.sh with multiple concurrency levels (e.g., --concurrency 1,2,4,8), the script uses the same artifact_dir for all concurrency levels, causing AIPerf to overwrite results instead of creating separate subdirectories for each concurrency level.

Impact:

  • Users cannot generate Pareto plots using plot_pareto.py
  • The documented workflow in benchmarks/llm/README.md doesn't work end-to-end
  • Benchmark results for multiple concurrency levels are lost (only the last one remains)

Root Cause:
The perf.sh script (lines 212-243) uses the same artifact_dir variable for all concurrency levels in the loop, while plot_pareto.py (line 48) expects file paths containing -concurrency<number> to parse the concurrency level.

Steps to Reproduce

  1. Start Dynamo services (frontend and worker):

    python -m dynamo.frontend --http-port 8000 > frontend.log 2>&1 &
    CUDA_VISIBLE_DEVICES=0 python -m dynamo.vllm --model Qwen/Qwen3-0.6B > worker.log 2>&1 &
  2. Wait for services to be ready (check logs or test endpoint):

    curl http://localhost:8000/v1/models
  3. Run benchmark with multiple concurrency levels:

    bash benchmarks/llm/perf.sh \
      --mode aggregated \
      --deployment-kind dynamo_vllm \
      --tensor-parallelism 1 \
      --data-parallelism 1 \
      --url http://localhost:8000 \
      --model Qwen/Qwen3-0.6B \
      --concurrency 1,2,4,8 \
      --input-sequence-length 100 \
      --output-sequence-length 50 \
      --artifacts-root-dir test_benchmark
  4. Verify the directory structure (should show only one file, not subdirectories):

    ls -la test_benchmark/artifacts_0/
    # Shows: profile_export_aiperf.json (single file, not in subdirectories)
  5. Attempt to generate Pareto plot:

    python3 benchmarks/llm/plot_pareto.py --artifacts-root-dir test_benchmark
  6. Observe error:

    Exception: non-unique matches: []
    

Expected Behavior

The perf.sh script should create separate subdirectories for each concurrency level, resulting in the following directory structure:

artifacts_0/
  ├── deployment_config.json
  ├── -concurrency1/
  │   └── profile_export_aiperf.json
  ├── -concurrency2/
  │   └── profile_export_aiperf.json
  ├── -concurrency4/
  │   └── profile_export_aiperf.json
  └── -concurrency8/
      └── profile_export_aiperf.json

The plot_pareto.py script should then be able to:

  1. Find all profile_export_aiperf.json files in subdirectories matching the pattern -concurrency<number>/
  2. Parse the concurrency level from the directory name
  3. Generate a Pareto plot showing throughput vs latency trade-offs across different concurrency levels

This matches the documented workflow in benchmarks/llm/README.md (lines 370-391) and how other scripts in the codebase handle this (e.g., benchmarks/utils/aiperf.py line 113 creates c{c} subdirectories).
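The three steps above can be sketched in Python (the helper name and structure are illustrative, not the actual plot_pareto.py implementation; only the `-concurrency(\d+)` regex is taken from the source):

```python
import re
from pathlib import Path

def collect_profiles(artifacts_dir):
    """Discover profile_export_aiperf.json files one level down and map each
    to its concurrency level, following the steps described above.
    Illustrative sketch; only the regex comes from plot_pareto.py."""
    profiles = {}
    for path in Path(artifacts_dir).glob("*/profile_export_aiperf.json"):
        # Parse the concurrency level out of the subdirectory name
        matches = re.findall(r"-concurrency(\d+)", path.parent.name)
        if len(matches) == 1:
            profiles[int(matches[0])] = path
    return profiles
```

With the expected layout, `collect_profiles("test_benchmark/artifacts_0")` would return `{1: ..., 2: ..., 4: ..., 8: ...}`; with the current flat layout it returns an empty dict, which is exactly why the plot step fails.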

Actual Behavior

The script creates a single profile_export_aiperf.json file that gets overwritten for each concurrency level:

artifacts_0/
  ├── deployment_config.json
  ├── inputs.json
  ├── logs/
  │   └── aiperf.log
  ├── profile_export.jsonl
  ├── profile_export_aiperf.csv
  └── profile_export_aiperf.json  (single file, contains only last concurrency level)

When running plot_pareto.py, it fails with:

Traceback (most recent call last):
  File "/home/ubuntu/dynamo/benchmarks/llm/plot_pareto.py", line 270, in <module>
    extracted_values = extract_val_and_concurrency(...)
  File "/home/ubuntu/dynamo/benchmarks/llm/plot_pareto.py", line 100, in extract_val_and_concurrency
    concurrency = parse_concurrency(aiperf_profile_export_json_path)
  File "/home/ubuntu/dynamo/benchmarks/llm/plot_pareto.py", line 50, in parse_concurrency
    raise Exception(f"non-unique matches: {matches}")
Exception: non-unique matches: []

The error occurs because plot_pareto.py expects file paths containing -concurrency<number> (e.g., artifacts_0/-concurrency1/profile_export_aiperf.json), but the file path is artifacts_0/profile_export_aiperf.json which doesn't match the pattern.
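The mismatch can be reproduced directly with the regex from plot_pareto.py:

```python
import re

# Regex used by plot_pareto.py (line 48) to recover the concurrency level
PATTERN = r"-concurrency(\d+)"

# Path produced by the current perf.sh: no match, so parse_concurrency raises
print(re.findall(PATTERN, "artifacts_0/profile_export_aiperf.json"))
# → []

# Path with the expected subdirectory: exactly one match
print(re.findall(PATTERN, "artifacts_0/-concurrency1/profile_export_aiperf.json"))
# → ['1']
```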

Environment

Operating System:

  • Ubuntu 22.04.5 LTS (Jammy Jellyfish)
  • Linux 5.15.0-126-generic x86_64

Hardware:

  • GPU: NVIDIA L40S (driver 565.57.01, CUDA 12.7)
  • CPU: 8 cores
  • Memory: 144.4 GiB

Software Versions:

  • Dynamo: 0.7.0 (built from source)
  • Python: 3.12.12
  • AIPerf: (installed via pip install aiperf)
  • vLLM: 0.10.2

Test Environment:

  • brev.dev cloud workspace
  • Dynamo built from source (not from PyPI wheels)
  • Model tested: Qwen/Qwen3-0.6B

Dynamo Configuration:

  • Mode: Aggregated
  • Tensor Parallelism: 1
  • Data Parallelism: 1
  • Frontend: python -m dynamo.frontend --http-port 8000
  • Worker: python -m dynamo.vllm --model Qwen/Qwen3-0.6B

Additional Context

Code References

Problematic code in benchmarks/llm/perf.sh (lines 212-243):

for concurrency in "${concurrency_array[@]}"; do
  echo "Run concurrency: $concurrency"
  
  aiperf profile \
    ...
    --artifact-dir ${artifact_dir}  # Same directory for all concurrency levels
done

Expected pattern in benchmarks/llm/plot_pareto.py (line 48):

def parse_concurrency(name):
    matches = re.findall(r"-concurrency(\d+)", name)
    if len(matches) != 1:
        raise Exception(f"non-unique matches: {matches}")

Proposed Fix

The script should create a subdirectory for each concurrency level:

for concurrency in "${concurrency_array[@]}"; do
  echo "Run concurrency: $concurrency"

  # Create a subdirectory for this concurrency level
  # The plot script expects subdirectories named -concurrency<number>
  concurrency_dir="${artifact_dir}/-concurrency${concurrency}"
  mkdir -p "${concurrency_dir}"

  aiperf profile \
    ...
    --artifact-dir ${concurrency_dir}  # Use subdirectory instead
done

This approach is consistent with:

  • benchmarks/utils/aiperf.py line 113: creates c{c} subdirectories
  • recipes/qwen3-32b-fp8/trtllm/agg/perf.yaml line 54: creates concurrency_${concurrency} directories
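One caveat: those scripts are consistent in spirit (one subdirectory per concurrency level), but the exact `-concurrency<N>` spelling matters for plot_pareto.py — its regex would not match the other naming conventions (quick illustrative check):

```python
import re

PATTERN = r"-concurrency(\d+)"  # regex from plot_pareto.py line 48

# Naming from the proposed fix: matches
print(re.findall(PATTERN, "-concurrency4"))   # → ['4']

# Naming used elsewhere in the codebase (c{c} in benchmarks/utils/aiperf.py,
# concurrency_${concurrency} in the trtllm recipe): neither matches this regex
print(re.findall(PATTERN, "c4"))              # → []
print(re.findall(PATTERN, "concurrency_4"))   # → []
```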

Testing the Fix

After applying the fix:

  1. Run benchmark: bash benchmarks/llm/perf.sh --concurrency 1,2,4,8 ...
  2. Verify structure: find artifacts_root/artifacts_0 -type d -name "*-concurrency*"
  3. Run plot script: python3 benchmarks/llm/plot_pareto.py --artifacts-root-dir artifacts_root
  4. Verify plot: ls artifacts_root/pareto_plot.png

Screenshots

No response

Labels: bug