Commit ad3844d

Add linux.dgx.b200 (#58)
* Test linux.dgx.b200
  Signed-off-by: Huy Do <[email protected]>
* Debug
  Signed-off-by: Huy Do <[email protected]>
* Auth with AWS on B200 DGX runners
  Signed-off-by: Huy Do <[email protected]>
* Add linux.dgx.b200.8
  Signed-off-by: Huy Do <[email protected]>
* Another tweak
  Signed-off-by: Huy Do <[email protected]>
* [no ci] 2.7.1
  Signed-off-by: Huy Do <[email protected]>
* [no ci] Use cu128
  Signed-off-by: Huy Do <[email protected]>
* A small tweak
* Keep the name unique
  Signed-off-by: Huy Do <[email protected]>
* Sanitize the model name
  Signed-off-by: Huy Do <[email protected]>
* Add sanitized device
  Signed-off-by: Huy Do <[email protected]>

---------

Signed-off-by: Huy Do <[email protected]>
1 parent 120745b commit ad3844d

3 files changed: +20 -10 lines changed

.github/scripts/generate_vllm_benchmark_matrix.py

Lines changed: 4 additions & 0 deletions
@@ -18,6 +18,7 @@
         "linux.aws.h100",
         "linux.rocm.gpu.gfx942.2",  # No single ROCm GPU?
         "linux.24xl.spr-metal",
+        "linux.dgx.b200",
     ],
     # NB: There is no 2xH100 runner at the momement, so let's use the next one
     # in the list here which is 4xH100
@@ -34,6 +35,7 @@
     8: [
         "linux.aws.h100.8",
         "linux.rocm.gpu.gfx942.8",
+        "linux.dgx.b200.8",
     ],
 }
 
@@ -43,6 +45,8 @@
     "linux.aws.h100": "cuda",
     "linux.aws.h100.4": "cuda",
     "linux.aws.h100.8": "cuda",
+    "linux.dgx.b200": "cuda",
+    "linux.dgx.b200.8": "cuda",
     "linux.rocm.gpu.gfx942.2": "rocm",
     "linux.rocm.gpu.gfx942.4": "rocm",
     "linux.rocm.gpu.gfx942.8": "rocm",

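Review note: the hunks above touch two lookup tables, one keyed by GPU count and one mapping each runner label to a platform. The sketch below shows one way such tables could be combined to pick runners for a job. The names `RUNNERS_BY_GPU_COUNT`, `PLATFORM_BY_RUNNER`, and `runners_for` are hypothetical and not taken from the script, and only the entries visible in this diff are included, so `linux.24xl.spr-metal` has no platform entry here.

```python
# Hypothetical names; only the entries visible in the diff are reproduced.
RUNNERS_BY_GPU_COUNT = {
    1: [
        "linux.aws.h100",
        "linux.rocm.gpu.gfx942.2",
        "linux.24xl.spr-metal",
        "linux.dgx.b200",  # added by this commit
    ],
    8: [
        "linux.aws.h100.8",
        "linux.rocm.gpu.gfx942.8",
        "linux.dgx.b200.8",  # added by this commit
    ],
}

PLATFORM_BY_RUNNER = {
    "linux.aws.h100": "cuda",
    "linux.aws.h100.4": "cuda",
    "linux.aws.h100.8": "cuda",
    "linux.dgx.b200": "cuda",  # added by this commit
    "linux.dgx.b200.8": "cuda",  # added by this commit
    "linux.rocm.gpu.gfx942.2": "rocm",
    "linux.rocm.gpu.gfx942.4": "rocm",
    "linux.rocm.gpu.gfx942.8": "rocm",
}


def runners_for(gpu_count, platform):
    """Return the runner labels for a GPU count whose platform matches."""
    return [
        runner
        for runner in RUNNERS_BY_GPU_COUNT.get(gpu_count, [])
        if PLATFORM_BY_RUNNER.get(runner) == platform
    ]


print(runners_for(1, "cuda"))  # ['linux.aws.h100', 'linux.dgx.b200']
print(runners_for(8, "cuda"))  # ['linux.aws.h100.8', 'linux.dgx.b200.8']
```

Because both new labels map to `"cuda"`, a lookup like this returns the B200 runners alongside the H100 ones without any other change to the selection logic.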
.github/scripts/requirements.txt

Lines changed: 1 addition & 1 deletion
@@ -4,4 +4,4 @@ psutil==7.0.0
 pynvml==12.0.0
 boto3==1.36.21
 awscli==1.37.21
-torch==2.7.0
+torch==2.7.1

.github/workflows/vllm-benchmark.yml

Lines changed: 15 additions & 9 deletions
@@ -134,7 +134,8 @@ jobs:
             pip install -r .github/scripts/requirements.txt \
               --extra-index-url https://download.pytorch.org/whl/rocm6.3
           else
-            pip install -r .github/scripts/requirements.txt
+            pip install -r .github/scripts/requirements.txt \
+              --extra-index-url https://download.pytorch.org/whl/cu128
           fi
 
       - name: Set Docker registry
@@ -277,15 +278,9 @@ jobs:
           )
           docker exec -t "${container_name}" bash -c "cd vllm-benchmarks/vllm && bash .buildkite/nightly-benchmarks/scripts/run-performance-benchmarks.sh"
 
-      # Keep a copy of the benchmark results on GitHub for reference
-      - uses: actions/upload-artifact@v4
-        with:
-          name: benchmark-results
-          path: vllm-benchmarks/vllm/benchmarks/results
-
       - name: Authenticate with AWS
         # AWS CUDA runners already have access to the bucket via its runner IAM role
-        if: env.DEVICE_NAME != 'cuda'
+        if: env.DEVICE_NAME == 'rocm' || contains(env.DEVICE_TYPE, 'B200')
         uses: aws-actions/configure-aws-credentials@ececac1a45f3b08a01d2dd070d28d111c5fe6722 # v4.1.0
         with:
           role-to-assume: arn:aws:iam::308535385114:role/gha_workflow_upload-benchmark-results
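Review note: the `if:` change above narrows and widens the authentication step at the same time. A rough Python model of the old and new conditions may make the effect easier to see. The function names and the sample device strings are hypothetical, chosen only for illustration; the case-folding mirrors GitHub Actions expressions, where string `==` and `contains()` ignore case.

```python
def needs_aws_auth_old(device_name: str) -> bool:
    """Model of the removed condition: env.DEVICE_NAME != 'cuda'."""
    return device_name.lower() != "cuda"


def needs_aws_auth_new(device_name: str, device_type: str) -> bool:
    """Model of the added condition:
    env.DEVICE_NAME == 'rocm' || contains(env.DEVICE_TYPE, 'B200')."""
    return device_name.lower() == "rocm" or "b200" in device_type.lower()


# Hypothetical (device_name, device_type) pairs, not taken from the workflow.
samples = [
    ("cuda", "NVIDIA H100 80GB HBM3"),
    ("cuda", "NVIDIA B200"),
    ("rocm", "AMD Instinct MI300X"),
    ("cpu", "Intel Xeon"),
]

for name, dtype in samples:
    print(name, dtype, needs_aws_auth_old(name), needs_aws_auth_new(name, dtype))
```

Under these assumed values, a B200 runner reports `cuda` as its device name, so the old condition skipped authentication for it while the new one does not. The new condition also stops authenticating any device that is neither `rocm` nor a B200, such as the `cpu` sample.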
@@ -304,10 +299,21 @@ jobs:
           ls -lah "${BENCHMARK_RESULTS}"
 
           SANITIZED_DEVICE_TYPE=$(echo "${DEVICE_TYPE// /_}" | sed "s/[^[:alnum:].-]/_/g")
+          SANITIZED_MODELS="${MODELS//\//_}"
+
           python3 .github/scripts/upload_benchmark_results.py \
             --repo vllm-benchmarks/vllm \
             --benchmark-name "vLLM benchmark" \
             --benchmark-results "${BENCHMARK_RESULTS}" \
             --device-name "${DEVICE_NAME}" \
             --device-type "${SANITIZED_DEVICE_TYPE}" \
-            --model "${MODELS//\//_}"
+            --model "${SANITIZED_MODELS}"
+
+          echo "SANITIZED_DEVICE_TYPE=$SANITIZED_DEVICE_TYPE" >> $GITHUB_ENV
+          echo "SANITIZED_MODELS=$SANITIZED_MODELS" >> $GITHUB_ENV
+
+      # Keep a copy of the benchmark results on GitHub for reference
+      - uses: actions/upload-artifact@v4
+        with:
+          name: benchmark-results--${{ env.SANITIZED_DEVICE_TYPE }}-${{ env.SANITIZED_MODELS }}
+          path: vllm-benchmarks/vllm/benchmarks/results
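
Review note: the shell sanitization in the last hunk can be approximated in Python to check what artifact name a given device and model would produce. This is a sketch, not part of the commit. The function names and sample inputs are hypothetical, and the regex assumes `[:alnum:]` matches ASCII letters and digits only, which holds in the C locale but may differ elsewhere.

```python
import re


def sanitize_device_type(device_type: str) -> str:
    """Approximate: echo "${DEVICE_TYPE// /_}" | sed "s/[^[:alnum:].-]/_/g"."""
    underscored = device_type.replace(" ", "_")
    # Assumption: [:alnum:] is treated as ASCII-only here.
    return re.sub(r"[^A-Za-z0-9.-]", "_", underscored)


def sanitize_models(models: str) -> str:
    """Approximate: "${MODELS//\\//_}" (every slash becomes an underscore)."""
    return models.replace("/", "_")


def artifact_name(device_type: str, models: str) -> str:
    """Build the name using the same pattern as the upload-artifact step."""
    return (
        f"benchmark-results--{sanitize_device_type(device_type)}"
        f"-{sanitize_models(models)}"
    )


# Hypothetical inputs for illustration only.
print(artifact_name("NVIDIA B200", "meta-llama/Llama-3.1-8B-Instruct"))
# benchmark-results--NVIDIA_B200-meta-llama_Llama-3.1-8B-Instruct
```

Including both sanitized values in the name gives each device and model combination its own artifact, which matches the "Keep the name unique" entry in the commit message. Replacing `/` also keeps the name free of a character that `actions/upload-artifact` rejects.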
