
Commit 1fbd151

[None][doc] update feature_combination_matrix of disaggregated and chunked prefill
Signed-off-by: leslie-fang25 <[email protected]>
1 parent e76e5c6 commit 1fbd151

3 files changed: 42 additions, 4 deletions

docs/source/torch/features/feature_combination_matrix.md

Lines changed: 4 additions & 4 deletions
@@ -6,13 +6,13 @@
 | CUDA Graph | Yes | --- | | | | | | | | | | | | |
 | Attention Data Parallelism | Yes | Yes | --- | | | | | | | | | | | |
 | Disaggregated Serving | Yes | Yes | Yes | --- | | | | | | | | | | |
-| Chunked Prefill | Yes | Yes | Yes | Untested | --- | | | | | | | | | |
+| Chunked Prefill | Yes | Yes | Yes | Yes | --- | | | | | | | | | |
 | MTP | Yes | Yes | Yes | Yes | Yes | --- | | | | | | | | |
 | EAGLE-3(One Model Engine) | Yes | Yes | Yes | Yes | Yes | No | --- | | | | | | | |
-| EAGLE-3(Two Model Engine) | NO | Yes | Yes | Yes | Yes | No | No | --- | | | | | | |
+| EAGLE-3(Two Model Engine) | No | Yes | Yes | Yes | Yes | No | No | --- | | | | | | |
 | Torch Sampler | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | --- | | | | | |
 | TLLM C++ Sampler | Yes | Yes | Yes | Yes | Yes | No | No | No | No | --- | | | | |
-| KV Cache Reuse | Yes | Yes | Yes | Untested | Yes | Untested | Yes | No | Yes | Yes | --- | | | |
-| Slide Window Attention | Yes | Yes | Yes | Untested | No | Untested | Untested | Untested | Yes | Yes | WIP | --- | | |
+| KV Cache Reuse | Yes | Yes | Yes | Yes | Yes | Untested | Yes | No | Yes | Yes | --- | | | |
+| Slide Window Attention | Yes | Yes | Yes | Yes | No | Untested | Untested | Untested | Yes | Yes | WIP | --- | | |
 | Logits Post Processor | No | Yes | Yes | No | Yes | No | No | No | Yes | Yes | Yes | Yes | --- | |
 | Guided Decoding | Yes | Yes | Yes | Yes | Yes | No | No | Yes | Yes | Yes | Yes | Yes | Yes | --- |
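
The matrix appears to be lower-triangular: the `---` cell in each row marks the column that belongs to that same feature, so the status of a pair is read from the row of whichever feature comes later. The following is a minimal Python sketch of that lookup, using a few post-change rows copied from this hunk. The helper names `parse_rows` and `lookup` are hypothetical and are not part of the repository.

```python
# Post-change rows copied from the hunk; rows may be omitted because each
# row's own column is recovered from the position of its "---" cell.
ROWS = """\
| CUDA Graph | Yes | --- | | | | | | | | | | | | |
| Attention Data Parallelism | Yes | Yes | --- | | | | | | | | | | | |
| Disaggregated Serving | Yes | Yes | Yes | --- | | | | | | | | | | |
| Chunked Prefill | Yes | Yes | Yes | Yes | --- | | | | | | | | | |
| KV Cache Reuse | Yes | Yes | Yes | Yes | Yes | Untested | Yes | No | Yes | Yes | --- | | | |
"""


def parse_rows(text):
    """Map feature name -> (diagonal column index, list of status cells)."""
    table = {}
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        name, values = cells[0], cells[1:]
        table[name] = (values.index("---"), values)
    return table


def lookup(table, a, b):
    """Return the status of the pair (a, b), independent of argument order."""
    diag_a, values_a = table[a]
    diag_b, values_b = table[b]
    if diag_a == diag_b:
        raise ValueError("a feature is not paired with itself")
    # Read from the row that sits lower in the matrix (larger diagonal index).
    if diag_a > diag_b:
        return values_a[diag_b]
    return values_b[diag_a]


table = parse_rows(ROWS)
print(lookup(table, "Disaggregated Serving", "Chunked Prefill"))
print(lookup(table, "KV Cache Reuse", "Disaggregated Serving"))
```

Both pairs were `Untested` before this commit and read `Yes` afterwards.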

tests/integration/defs/accuracy/test_disaggregated_serving.py

Lines changed: 37 additions & 0 deletions
@@ -776,3 +776,40 @@ def test_auto_dtype(self, overlap_scheduler):
             task.evaluate(llm)
             task = MMLU(self.MODEL_NAME)
             task.evaluate(llm)
+
+    def test_chunked_prefill(self):
+        ctx_server_config = {
+            "disable_overlap_scheduler": True,
+            "cuda_graph_config": None,
+            "cache_transceiver_config": {
+                "backend": "DEFAULT"
+            },
+            "enable_chunked_prefill": True,
+            "max_num_tokens": 256,
+        }
+        gen_server_config = {
+            "cuda_graph_config": None,
+            "cache_transceiver_config": {
+                "backend": "DEFAULT"
+            }
+        }
+        disaggregated_server_config = {
+            "hostname": "localhost",
+            "port": 8000,
+            "backend": "pytorch",
+            "context_servers": {
+                "num_instances": 1,
+                "urls": ["localhost:8001"]
+            },
+            "generation_servers": {
+                "num_instances": 1,
+                "urls": ["localhost:8002"]
+            }
+        }
+        with launch_disaggregated_llm(disaggregated_server_config,
+                                      ctx_server_config, gen_server_config,
+                                      self.MODEL_PATH) as llm:
+            task = GSM8K(self.MODEL_NAME)
+            task.evaluate(llm)
+            task = MMLU(self.MODEL_NAME)
+            task.evaluate(llm)
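
The context server in this test sets `enable_chunked_prefill` with `max_num_tokens` at 256, presumably so that longer evaluation prompts have to be prefilled in several pieces. As a rough illustration only, the sketch below assumes every chunk is capped at exactly `max_num_tokens`. The real scheduler may size chunks differently, for example when several requests share a batch. The function name `chunk_sizes` is hypothetical.

```python
def chunk_sizes(prompt_len, max_num_tokens=256):
    """Split a prompt of prompt_len tokens into prefill chunk lengths.

    Simplifying assumption: one request at a time, each chunk holding at
    most max_num_tokens tokens.
    """
    if prompt_len < 0 or max_num_tokens <= 0:
        raise ValueError("prompt_len must be >= 0 and max_num_tokens > 0")
    full, rest = divmod(prompt_len, max_num_tokens)
    return [max_num_tokens] * full + ([rest] if rest else [])


print(chunk_sizes(1000))  # [256, 256, 256, 232]
print(chunk_sizes(200))   # [200]
```

Under this assumption a 1000-token prompt takes four context steps, while a prompt that fits in 256 tokens is prefilled in one.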

tests/integration/test_lists/test-db/l0_dgx_h100.yml

Lines changed: 1 addition & 0 deletions
@@ -41,6 +41,7 @@ l0_dgx_h100:
   - accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_ngram
   - accuracy/test_disaggregated_serving.py::TestQwen3_8B::test_auto_dtype[False]
   - accuracy/test_disaggregated_serving.py::TestQwen3_8B::test_auto_dtype[True]
+  - accuracy/test_disaggregated_serving.py::TestQwen3_8B::test_chunked_prefill
   - accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_eagle3[eagle3_one_model=False-overlap_scheduler=False]
   - accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_eagle3[eagle3_one_model=True-overlap_scheduler=True]
   - accuracy/test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_guided_decoding[xgrammar]
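
Each entry in this list follows the pytest node ID layout `file::Class::test[params]`, and the new entry has no bracketed part because `test_chunked_prefill` is not parametrized. The sketch below splits such an ID into its components. `parse_node_id` is a hypothetical helper written for illustration, not something the test harness provides.

```python
def parse_node_id(node_id):
    """Split 'path::Class::test[params]' into its four components."""
    path, cls, func = node_id.split("::")
    name, bracket, rest = func.partition("[")
    # Drop the closing "]" when a parameter suffix is present.
    params = rest[:-1] if bracket else None
    return {"path": path, "class": cls, "test": name, "params": params}


new_entry = parse_node_id(
    "accuracy/test_disaggregated_serving.py::TestQwen3_8B::test_chunked_prefill"
)
old_entry = parse_node_id(
    "accuracy/test_disaggregated_serving.py::TestQwen3_8B::test_auto_dtype[True]"
)
print(new_entry["test"], new_entry["params"])
print(old_entry["test"], old_entry["params"])
```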
