From 9c0b4e14fc799de7c2f5a0cdd846ad363a0239bf Mon Sep 17 00:00:00 2001
From: Gurpreet Singh Dhami <143527450+gurpreet-dhami@users.noreply.github.com>
Date: Fri, 26 Jul 2024 16:04:39 -0400
Subject: [PATCH] Update README.md

Update the arg names

---
 examples/fp8/quantizer/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/examples/fp8/quantizer/README.md b/examples/fp8/quantizer/README.md
index 8f89a74a6a367..0b6944f688b49 100644
--- a/examples/fp8/quantizer/README.md
+++ b/examples/fp8/quantizer/README.md
@@ -16,7 +16,7 @@
 #### Run on H100 system for speed if FP8; number of GPUs depends on the model size

 #### Example: quantize Llama2-7b model from HF to FP8 with FP8 KV Cache:
-`python quantize.py --model_dir ./ll2-7b --dtype float16 --qformat fp8 --kv_cache_dtype fp8 --output_dir ./ll2_7b_fp8 --calib_size 512 --tp_size 1`
+`python quantize.py --model-dir ./ll2-7b --dtype float16 --qformat fp8 --kv-cache-dtype fp8 --output-dir ./ll2_7b_fp8 --calib-size 512 --tp-size 1`

 Outputs: model structure, quantized model & parameters (with scaling factors) are in JSON and Safetensors (npz is generated only for the reference)
 ```
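
The rename only touches the CLI surface: if quantize.py defines its options with Python's argparse (an assumption; the script's actual parser is not shown in this patch), dash-style flags such as `--kv-cache-dtype` are still exposed to the script as underscore attributes like `args.kv_cache_dtype`, so no code behind the parser needs to change. A minimal sketch under that assumption:

```python
# Hypothetical sketch of the quantize.py CLI, not the upstream script itself.
# argparse maps "--kv-cache-dtype" to the attribute "kv_cache_dtype" internally,
# but the flag must be spelled with dashes on the command line.
import argparse

parser = argparse.ArgumentParser(description="Sketch of the quantize.py flags (assumed)")
parser.add_argument("--model-dir", required=True)       # read as args.model_dir
parser.add_argument("--dtype", default="float16")
parser.add_argument("--qformat", default="fp8")
parser.add_argument("--kv-cache-dtype", default="fp8")  # read as args.kv_cache_dtype
parser.add_argument("--output-dir", required=True)      # read as args.output_dir
parser.add_argument("--calib-size", type=int, default=512)
parser.add_argument("--tp-size", type=int, default=1)

args = parser.parse_args()
print(args.model_dir, args.kv_cache_dtype, args.calib_size)
```

Running the sketch as `python sketch.py --model-dir ./ll2-7b --output-dir ./out` parses the dash-style flags exactly as the updated README command does.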