@@ -1,5 +1,40 @@
 version: 0.0.1
 l0_dgx_h100:
+- condition:
+    ranges:
+      system_gpu_count:
+        gte: 2
+        lte: 2
+    wildcards:
+      gpu:
+      - '*h100*'
+      linux_distribution_name: ubuntu*
+  terms:
+    stage: pre_merge
+    backend: pytorch
+    auto_trigger: others
+  tests:
+  - unittest/llmapi/test_llm_multi_gpu_pytorch.py -m "gpu2"
+  - unittest/_torch/multi_gpu -m "not post_merge" TIMEOUT (90)
+  - unittest/_torch/auto_deploy/unit/multigpu
+  - unittest/_torch/modeling/test_modeling_pixtral.py::test_tensor_parallelism
+  - accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_eagle3[eagle3_one_model=False-overlap_scheduler=False]
+  - accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_eagle3[eagle3_one_model=True-overlap_scheduler=True]
+  - accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_guided_decoding[xgrammar]
+  - accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_guided_decoding_with_eagle3[xgrammar-eagle3_one_model=True]
+  - accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_guided_decoding_with_eagle3[xgrammar-eagle3_one_model=False]
+  - accuracy/test_disaggregated_serving.py::TestQwen3_8B::test_auto_dtype[False]
+  - accuracy/test_disaggregated_serving.py::TestQwen3_8B::test_auto_dtype[True]
+  - accuracy/test_disaggregated_serving.py::TestQwen3_8B::test_chunked_prefill
+  - accuracy/test_disaggregated_serving.py::TestQwen3_8B::test_nixl_backend
+  - accuracy/test_disaggregated_serving.py::TestDeepSeekV3Lite::test_nixl_backend
+  - accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_ngram
+  - accuracy/test_disaggregated_serving.py::TestGemma3_1BInstruct::test_auto_dtype[False]
+  - accuracy/test_disaggregated_serving.py::TestGemma3_1BInstruct::test_auto_dtype[True]
+  - accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_auto_dtype[False]
+  - accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_auto_dtype[True]
+  # ------------- AutoDeploy tests ---------------
+  - accuracy/test_llm_api_autodeploy.py::TestLlama3_1_8B::test_auto_dtype
 - condition:
     ranges:
       system_gpu_count:
@@ -15,9 +50,7 @@ l0_dgx_h100:
     auto_trigger: others
   tests:
   # ------------- PyTorch tests ---------------
-  - unittest/_torch/multi_gpu -m "not post_merge" TIMEOUT (90)
-  - unittest/_torch/auto_deploy/unit/multigpu
-  - unittest/llmapi/test_llm_multi_gpu_pytorch.py -m "gpu4 or gpu2"
+  - unittest/llmapi/test_llm_multi_gpu_pytorch.py -m "gpu4"
   - accuracy/test_llm_api_pytorch.py::TestLlama3_1_8BInstruct::test_bfloat16_4gpus[tp4-attn_backend=TRTLLM-torch_compile=False]
   - accuracy/test_llm_api_pytorch.py::TestLlama3_1_8BInstruct::test_bfloat16_4gpus[tp2pp2-attn_backend=TRTLLM-torch_compile=False]
   - accuracy/test_llm_api_pytorch.py::TestLlama3_1_8BInstruct::test_bfloat16_4gpus[tp2pp2-attn_backend=TRTLLM-torch_compile=True]
@@ -35,19 +68,6 @@ l0_dgx_h100:
   - disaggregated/test_disaggregated.py::test_disaggregated_ctxpp2_gentp2[TinyLlama-1.1B-Chat-v1.0]
   - disaggregated/test_disaggregated.py::test_disaggregated_ctxpp4_gentp4[TinyLlama-1.1B-Chat-v1.0]
   - disaggregated/test_disaggregated.py::test_disaggregated_genbs1[TinyLlama-1.1B-Chat-v1.0]
-  - accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_auto_dtype[False]
-  - accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_auto_dtype[True]
-  - accuracy/test_disaggregated_serving.py::TestGemma3_1BInstruct::test_auto_dtype[False]
-  - accuracy/test_disaggregated_serving.py::TestGemma3_1BInstruct::test_auto_dtype[True]
-  - accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_ngram
-  - accuracy/test_disaggregated_serving.py::TestQwen3_8B::test_auto_dtype[False]
-  - accuracy/test_disaggregated_serving.py::TestQwen3_8B::test_auto_dtype[True]
-  - accuracy/test_disaggregated_serving.py::TestQwen3_8B::test_chunked_prefill
-  - accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_eagle3[eagle3_one_model=False-overlap_scheduler=False]
-  - accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_eagle3[eagle3_one_model=True-overlap_scheduler=True]
-  - accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_guided_decoding[xgrammar]
-  - accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_guided_decoding_with_eagle3[xgrammar-eagle3_one_model=True]
-  - accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_guided_decoding_with_eagle3[xgrammar-eagle3_one_model=False]
   - accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_tp_pp_symmetric[GSM8K-tp1pp2]
   - accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_tp_pp_symmetric[MMLU-tp1pp2]
   - accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_tp_pp_symmetric[GSM8K-tp2pp1]
@@ -58,13 +78,8 @@ l0_dgx_h100:
   - accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_ctx_pp_gen_tp_asymmetric[MMLU-gen_tp=2-ctx_pp=2]
   - accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_multi_instance[GSM8K]
   - accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_multi_instance[MMLU]
-  - accuracy/test_disaggregated_serving.py::TestQwen3_8B::test_nixl_backend
-  - accuracy/test_disaggregated_serving.py::TestDeepSeekV3Lite::test_nixl_backend
   - test_e2e.py::test_ptp_quickstart_advanced_bs1
   - test_e2e.py::test_ptp_quickstart_advanced_deepseek_v3_lite_4gpus_adp_balance[DeepSeek-V3-Lite-FP8-DeepSeek-V3-Lite/fp8]
-  - unittest/_torch/modeling/test_modeling_pixtral.py::test_tensor_parallelism
-  # ------------- AutoDeploy tests ---------------
-  - accuracy/test_llm_api_autodeploy.py::TestLlama3_1_8B::test_auto_dtype
 - condition:
     ranges:
       system_gpu_count:
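Each `tests:` entry in the hunks above is a pytest node id, optionally followed by a `-m "<marker expression>"` filter and a `TIMEOUT (N)` budget. A minimal sketch of that entry grammar (a hypothetical parser written for illustration, not the repository's actual test-list loader; the timeout unit follows whatever convention the CI applies):

```python
import re

# Matches entries such as:
#   unittest/_torch/multi_gpu -m "not post_merge" TIMEOUT (90)
# Groups: test path / pytest node id, optional marker expression, optional timeout.
ENTRY_RE = re.compile(
    r'^(?P<test>\S+)'                           # pytest node id or test path
    r'(?:\s+-m\s+"(?P<marker>[^"]+)")?'         # optional -m marker expression
    r'(?:\s+TIMEOUT\s*\((?P<timeout>\d+)\))?$'  # optional timeout budget
)

def parse_entry(entry: str):
    """Split one test-list entry into (node_id, marker_expr, timeout)."""
    m = ENTRY_RE.match(entry.strip())
    if m is None:
        raise ValueError(f"unrecognized test entry: {entry!r}")
    timeout = int(m.group("timeout")) if m.group("timeout") else None
    return m.group("test"), m.group("marker"), timeout

print(parse_entry('unittest/_torch/multi_gpu -m "not post_merge" TIMEOUT (90)'))
# → ('unittest/_torch/multi_gpu', 'not post_merge', 90)
```

Entries without a marker or timeout, such as `unittest/_torch/auto_deploy/unit/multigpu`, parse with `None` in the optional slots.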