Describe the Bug
The benchmarks/llm/perf.sh script fails to create the directory structure expected by plot_pareto.py, preventing users from generating Pareto plots from benchmark results.
When running perf.sh with multiple concurrency levels (e.g., --concurrency 1,2,4,8), the script uses the same artifact_dir for all concurrency levels, causing AIPerf to overwrite results instead of creating separate subdirectories for each concurrency level.
Impact:
- Users cannot generate Pareto plots using plot_pareto.py
- The documented workflow in benchmarks/llm/README.md doesn't work end-to-end
- Benchmark results for multiple concurrency levels are lost (only the last one remains)
Root Cause:
The perf.sh script (lines 212-243) uses the same artifact_dir variable for all concurrency levels in the loop, while plot_pareto.py (line 48) expects file paths containing -concurrency<number> to parse the concurrency level.
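To see the mismatch in isolation, here is a minimal reproduction using the same regex that plot_pareto.py's parse_concurrency applies (quoted under Code References below); the flat path that perf.sh currently produces yields no matches:

import re

# Regex from plot_pareto.py's parse_concurrency (see Code References below)
pattern = r"-concurrency(\d+)"

flat_path = "artifacts_0/profile_export_aiperf.json"                   # what perf.sh produces today
nested_path = "artifacts_0/-concurrency4/profile_export_aiperf.json"   # what plot_pareto.py expects

print(re.findall(pattern, flat_path))    # [] -> triggers "non-unique matches: []"
print(re.findall(pattern, nested_path))  # ['4']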
Steps to Reproduce
1. Start Dynamo services (frontend and worker):

   python -m dynamo.frontend --http-port 8000 > frontend.log 2>&1 &
   CUDA_VISIBLE_DEVICES=0 python -m dynamo.vllm --model Qwen/Qwen3-0.6B > worker.log 2>&1 &

2. Wait for the services to be ready (check the logs or test the endpoint):

   curl http://localhost:8000/v1/models

3. Run the benchmark with multiple concurrency levels:

   bash benchmarks/llm/perf.sh \
     --mode aggregated \
     --deployment-kind dynamo_vllm \
     --tensor-parallelism 1 \
     --data-parallelism 1 \
     --url http://localhost:8000 \
     --model Qwen/Qwen3-0.6B \
     --concurrency 1,2,4,8 \
     --input-sequence-length 100 \
     --output-sequence-length 50 \
     --artifacts-root-dir test_benchmark

4. Verify the directory structure (it contains a single file rather than per-concurrency subdirectories):

   ls -la test_benchmark/artifacts_0/
   # Shows: profile_export_aiperf.json (single file, not in subdirectories)

5. Attempt to generate a Pareto plot:

   python3 benchmarks/llm/plot_pareto.py --artifacts-root-dir test_benchmark

6. Observe the error:

   Exception: non-unique matches: []
Expected Behavior
The perf.sh script should create separate subdirectories for each concurrency level, resulting in the following directory structure:
artifacts_0/
├── deployment_config.json
├── -concurrency1/
│ └── profile_export_aiperf.json
├── -concurrency2/
│ └── profile_export_aiperf.json
├── -concurrency4/
│ └── profile_export_aiperf.json
└── -concurrency8/
└── profile_export_aiperf.json
The plot_pareto.py script should then be able to:
- Find all profile_export_aiperf.json files in subdirectories matching the pattern -concurrency<number>/
- Parse the concurrency level from the directory name
- Generate a Pareto plot showing throughput vs. latency trade-offs across the different concurrency levels
This matches the documented workflow in benchmarks/llm/README.md (lines 370-391) and how other scripts in the codebase handle this (e.g., benchmarks/utils/aiperf.py line 113 creates c{c} subdirectories).
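For illustration only (this is a sketch, not the actual plot_pareto.py implementation), the discovery-and-parse step over the expected layout could look like:

import re
from pathlib import Path

def parse_concurrency(name):
    # Same regex as plot_pareto.py's parse_concurrency (quoted under Code
    # References below); requires exactly one -concurrency<number> in the path.
    matches = re.findall(r"-concurrency(\d+)", name)
    if len(matches) != 1:
        raise Exception(f"non-unique matches: {matches}")
    return int(matches[0])

# With the expected layout, each result file maps cleanly to a concurrency level.
for path in sorted(Path("artifacts_0").glob("*/profile_export_aiperf.json")):
    print(parse_concurrency(str(path)), "->", path)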
Actual Behavior
The script creates a single profile_export_aiperf.json file that gets overwritten for each concurrency level:
artifacts_0/
├── deployment_config.json
├── inputs.json
├── logs/
│ └── aiperf.log
├── profile_export.jsonl
├── profile_export_aiperf.csv
└── profile_export_aiperf.json (single file, contains only the last concurrency level)
When running plot_pareto.py, it fails with:
Traceback (most recent call last):
File "/home/ubuntu/dynamo/benchmarks/llm/plot_pareto.py", line 270, in <module>
extracted_values = extract_val_and_concurrency(...)
File "/home/ubuntu/dynamo/benchmarks/llm/plot_pareto.py", line 100, in extract_val_and_concurrency
concurrency = parse_concurrency(aiperf_profile_export_json_path)
File "/home/ubuntu/dynamo/benchmarks/llm/plot_pareto.py", line 50, in parse_concurrency
raise Exception(f"non-unique matches: {matches}")
Exception: non-unique matches: []
The error occurs because plot_pareto.py expects file paths containing -concurrency<number> (e.g., artifacts_0/-concurrency1/profile_export_aiperf.json), but the file path is artifacts_0/profile_export_aiperf.json which doesn't match the pattern.
Environment
Operating System:
- Ubuntu 22.04.5 LTS (Jammy Jellyfish)
- Linux 5.15.0-126-generic x86_64
Hardware:
- GPU: NVIDIA L40S (driver 565.57.01, CUDA 12.7)
- CPU: 8 cores
- Memory: 144.4 GiB
Software Versions:
- Dynamo: 0.7.0 (built from source)
- Python: 3.12.12
- AIPerf: installed via pip install aiperf
- vLLM: 0.10.2
Test Environment:
- brev.dev cloud workspace
- Dynamo built from source (not from PyPI wheels)
- Model tested: Qwen/Qwen3-0.6B
Dynamo Configuration:
- Mode: Aggregated
- Tensor Parallelism: 1
- Data Parallelism: 1
- Frontend: python -m dynamo.frontend --http-port 8000
- Worker: python -m dynamo.vllm --model Qwen/Qwen3-0.6B
Additional Context
Related Issues
- Discovered while testing the restored benchmarking guide in issue [DOCS]: Bring back benchmarking guide #2031
- Affects the documented workflow in benchmarks/llm/README.md (lines 370-391)
Code References
Problematic code in benchmarks/llm/perf.sh (lines 212-243):
for concurrency in "${concurrency_array[@]}"; do
    echo "Run concurrency: $concurrency"
    aiperf profile \
        ...
        --artifact-dir ${artifact_dir}  # Same directory for all concurrency levels
done

Expected pattern in benchmarks/llm/plot_pareto.py (line 48):

def parse_concurrency(name):
    matches = re.findall(r"-concurrency(\d+)", name)
    if len(matches) != 1:
        raise Exception(f"non-unique matches: {matches}")

Proposed Fix
The script should create a subdirectory for each concurrency level:
for concurrency in "${concurrency_array[@]}"; do
    echo "Run concurrency: $concurrency"
    # Create a subdirectory for this concurrency level.
    # The plot script expects subdirectories named -concurrency<number>.
    concurrency_dir="${artifact_dir}/-concurrency${concurrency}"
    mkdir -p "${concurrency_dir}"
    aiperf profile \
        ...
        --artifact-dir ${concurrency_dir}  # Use the subdirectory instead
done

This approach is consistent with:
- benchmarks/utils/aiperf.py line 113: creates c{c} subdirectories
- recipes/qwen3-32b-fp8/trtllm/agg/perf.yaml line 54: creates concurrency_${concurrency} directories
Testing the Fix
After applying the fix:
1. Run the benchmark:
   bash benchmarks/llm/perf.sh --concurrency 1,2,4,8 ...
2. Verify the structure:
   find artifacts_root/artifacts_0 -type d -name "*-concurrency*"
3. Run the plot script:
   python3 benchmarks/llm/plot_pareto.py --artifacts-root-dir artifacts_root
4. Verify the plot:
   ls artifacts_root/pareto_plot.png
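As an additional end-to-end sanity check, a throwaway snippet along these lines (assuming the concurrency levels 1,2,4,8 from the reproduction above) confirms that one result file exists per level:

import re
from pathlib import Path

# Hypothetical quick check; adjust root to match your --artifacts-root-dir.
root = Path("artifacts_root/artifacts_0")
found = sorted(
    int(re.search(r"-concurrency(\d+)", str(p)).group(1))
    for p in root.glob("*/profile_export_aiperf.json")
)
expected = [1, 2, 4, 8]
assert found == expected, f"expected {expected}, found {found}"
print("All concurrency levels present:", found)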
Screenshots
No response