Commit 8ebfa6e

Enable accuracy test for eagle3 and chunked prefill
Signed-off-by: leslie-fang25 <[email protected]>
1 parent 3f7abf8 commit 8ebfa6e

File tree: 3 files changed (+12, -5 lines)

docs/source/torch/features/feature_combination_matrix.md

Lines changed: 2 additions & 2 deletions
@@ -8,8 +8,8 @@
 | Disaggregated Serving | Yes | Yes | Yes | --- | | | | | | | | | | |
 | Chunked Prefill | Yes | Yes | Yes | Untested | --- | | | | | | | | | |
 | MTP | Yes | Yes | Yes | Yes | Untested | --- | | | | | | | | |
-| EAGLE-3(One Model Engine) | Yes | Yes | Yes | No | Untested | No | --- | | | | | | | |
-| EAGLE-3(Two Model Engine) | NO | Yes | Yes | No | Untested | No | No | --- | | | | | | |
+| EAGLE-3(One Model Engine) | Yes | Yes | Yes | No | Yes | No | --- | | | | | | | |
+| EAGLE-3(Two Model Engine) | NO | Yes | Yes | No | Yes | No | No | --- | | | | | | |
 | Torch Sampler | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | --- | | | | | |
 | TLLM C++ Sampler | Yes | Yes | Yes | Yes | Yes | No | No | No | No | --- | | | | |
 | KV Cache Reuse | Yes | Yes | Yes | Untested | Yes | Untested | Yes | No | Yes | Yes | --- | | | |
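
The flip from "Untested" to "Yes" means the EAGLE-3 rows now have validated accuracy coverage together with chunked prefill. As a rough illustration (not code from this commit), here is a minimal sketch of enabling that combination through the tensorrt_llm LLM API; the model paths are placeholders, and the import locations are assumptions about the llmapi layout at this commit:

```python
# Minimal sketch of the combination this row now marks "Yes":
# EAGLE-3 speculative decoding running together with chunked prefill.
from tensorrt_llm import LLM
from tensorrt_llm.llmapi import EagleDecodingConfig

llm = LLM(
    model="<target_model_dir>",        # placeholder: target model path
    enable_chunked_prefill=True,       # the "Chunked Prefill" column
    speculative_config=EagleDecodingConfig(
        max_draft_len=4,
        speculative_model_dir="<eagle_model_dir>",  # placeholder: draft model
        eagle3_one_model=True,         # the "One Model Engine" row
    ),
)
```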

tests/integration/defs/accuracy/test_llm_api_pytorch.py

Lines changed: 6 additions & 2 deletions
@@ -1817,7 +1817,9 @@ def test_bf16(self, tp_size, pp_size, ep_size, attention_dp, cuda_graph,
         task = MMLU(self.MODEL_NAME)
         task.evaluate(llm)
 
-    def test_eagle3(self):
+    @parametrize_with_ids("eagle3_one_model", [True, False])
+    @parametrize_with_ids("enable_chunked_prefill", [False, True])
+    def test_eagle3(self, enable_chunked_prefill, eagle3_one_model):
         pytorch_config = dict(
             disable_overlap_scheduler=True,
             cuda_graph_config=CudaGraphConfig(batch_sizes=[1]),
@@ -1829,11 +1831,13 @@ def test_eagle3(self):
 
         draft_len = 4
         spec_config = EagleDecodingConfig(max_draft_len=draft_len,
-                                          speculative_model_dir=eagle_model_dir)
+                                          speculative_model_dir=eagle_model_dir,
+                                          eagle3_one_model=eagle3_one_model)
 
         llm = LLM(model=target_model_dir,
                   **pytorch_config,
                   kv_cache_config=kv_cache_config,
+                  enable_chunked_prefill=enable_chunked_prefill,
                   speculative_config=spec_config,
                   build_config=None)
 
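
For context on where the new test variants come from: parametrize_with_ids is one of the repo's test utilities, and the sketch below is a hypothetical re-implementation of it, shown only to explain how stacking two boolean parametrizations yields four test variants with readable ids:

```python
# Hypothetical re-implementation of parametrize_with_ids (the real helper
# lives in the repo's test utilities), shown to explain the generated ids.
import pytest

def parametrize_with_ids(name, values):
    # Give each parametrized case an explicit "name=value" id.
    return pytest.mark.parametrize(name, values,
                                   ids=[f"{name}={v}" for v in values])

@parametrize_with_ids("eagle3_one_model", [True, False])
@parametrize_with_ids("enable_chunked_prefill", [False, True])
def test_eagle3(enable_chunked_prefill, eagle3_one_model):
    # Stacked parametrizations expand to 2 x 2 = 4 variants; pytest joins
    # the ids with "-", the bottom decorator's id first, e.g.
    # test_eagle3[enable_chunked_prefill=False-eagle3_one_model=True]
    pass
```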

tests/integration/test_lists/test-db/l0_h100.yml

Lines changed: 4 additions & 1 deletion
@@ -43,7 +43,10 @@ l0_h100:
 - accuracy/test_llm_api_pytorch.py::TestQwen3_8B::test_fp8_block_scales[latency]
 - accuracy/test_llm_api_pytorch.py::TestQwen3_30B_A3B::test_fp8[latency-torch_compile=False]
 - accuracy/test_llm_api_pytorch.py::TestQwen3_30B_A3B::test_fp8[latency-torch_compile=True]
-- accuracy/test_llm_api_pytorch.py::TestQwen3_8B::test_eagle3
+- accuracy/test_llm_api_pytorch.py::TestQwen3_8B::test_eagle3[enable_chunked_prefill=False-eagle3_one_model=False]
+- accuracy/test_llm_api_pytorch.py::TestQwen3_8B::test_eagle3[enable_chunked_prefill=True-eagle3_one_model=True]
+- accuracy/test_llm_api_pytorch.py::TestQwen3_8B::test_eagle3[enable_chunked_prefill=False-eagle3_one_model=True]
+- accuracy/test_llm_api_pytorch.py::TestQwen3_8B::test_eagle3[enable_chunked_prefill=True-eagle3_one_model=False]
 - accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_fp8_block_scales_cuda_graph_padding[mtp_nextn=0]
 - accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_fp8_block_scales_cuda_graph_padding[mtp_nextn=2]
 - test_e2e.py::test_trtllm_bench_pytorch_backend_sanity[meta-llama/Llama-3.1-8B-llama-3.1-8b-False-False]
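
The four new entries must match exactly the node ids pytest generates for the parametrized test. A quick sanity-check sketch (the 2 x 2 expansion is an illustration, not repo tooling; compare as a set, since the yml lists the variants in a different order):

```python
# Sanity-check sketch: enumerate the ids the stacked parametrizations
# should produce and compare against the entries added to l0_h100.yml.
from itertools import product

expected = {
    f"test_eagle3[enable_chunked_prefill={cp}-eagle3_one_model={om}]"
    for cp, om in product([False, True], [True, False])
}
for node_id in sorted(expected):
    print(node_id)
```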
