
Commit

removing --enable-chunked-prefill
arakowsk-amd authored Dec 17, 2024
1 parent 0881810 commit c75a4c4
Showing 1 changed file with 2 additions and 4 deletions.
docs/dev-docker/README.md: 2 additions & 4 deletions
@@ -261,8 +261,7 @@ Benchmark Meta-Llama-3.1-405B FP8 with input 128 tokens, output 128 tokens and t
     --num-scheduler-steps 10 \
     --tensor-parallel-size 8 \
     --input-len 128 \
-    --output-len 128 \
-    --enable-chunked-prefill false
+    --output-len 128

If you want to run Meta-Llama-3.1-405B FP16, please run

@@ -278,8 +277,7 @@ If you want to run Meta-Llama-3.1-405B FP16, please run
     --output-len 128 \
     --swap-space 16 \
     --max-model-len 8192 \
-    --max-num-batched-tokens 65536 \
-    --enable-chunked-prefill false
+    --max-num-batched-tokens 65536

For fp8 quantized Llama 3.1 8B/70B models:

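The commit message does not state why the flag was dropped, but a plausible motivation (an assumption, not confirmed by the diff) is that `--enable-chunked-prefill` behaves like a store-true boolean switch: passing a literal `false` after it is not consumed as the flag's value and is instead rejected as an unrecognized positional argument. A minimal Python argparse sketch of that failure mode, using a hypothetical parser that mirrors the flag:

```python
import argparse

# Hypothetical parser modeling a store-true boolean flag, as an
# illustration only; the real CLI's argument definitions may differ.
parser = argparse.ArgumentParser()
parser.add_argument("--enable-chunked-prefill", action="store_true")
parser.add_argument("--output-len", type=int)

# The bare flag parses fine and sets the option to True.
args = parser.parse_args(["--output-len", "128", "--enable-chunked-prefill"])
print(args.enable_chunked_prefill)  # True

# A trailing "false" is not consumed by the flag: argparse reports
# an unrecognized argument and exits with an error.
try:
    parser.parse_args(["--enable-chunked-prefill", "false"])
except SystemExit:
    print("rejected: 'false' is treated as an unrecognized argument")
```

Removing the trailing `false` (as this commit does) leaves a command that parses cleanly whether the flag is boolean-valued or a plain switch.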
