diff --git a/benchmarks/inference-server/triton/README.md b/benchmarks/inference-server/triton/README.md
index 814091dc8..81911e177 100644
--- a/benchmarks/inference-server/triton/README.md
+++ b/benchmarks/inference-server/triton/README.md
@@ -170,7 +170,7 @@ terraform apply
 | `model_id` | Model used for inference. | String | `"meta-llama/Llama-2-7b-chat-hf"` | No |
 | `gpu_count` | Parallelism based on number of gpus. | Number | `1` | No |
 | `ksa` | Kubernetes Service Account used for workload. | String | `"default"` | No |
-| `huggingface-secret` | Name of the kubectl huggingface secret token | String | `"huggingface-secret"` | Yes |
+| `huggingface_secret` | Name of the kubectl huggingface secret token | String | `"huggingface-secret"` | Yes |
 | `gcs_model_path` | Path where model engine in gcs will be read from. | String | null | Yes |
 | `server_launch_command_string` | Command to launc the Triton Inference Server | String | "pip install sentencepiece protobuf && huggingface-cli login --token $HUGGINGFACE_TOKEN && /opt/tritonserver/bin/tritonserver --model-repository=/all_models/inflight_batcher_llm --disable-auto-complete-config --backend-config=python,shm-region-prefix-name=prefix0_" | No |
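
After this rename, the variable must be referenced with an underscore in any `terraform.tfvars` file as well. A minimal sketch of the two required variables from the table above (the `gs://` path is a hypothetical placeholder, since `gcs_model_path` has no default):

```hcl
# terraform.tfvars — sketch only; replace values with your own
huggingface_secret = "huggingface-secret"                 # renamed variable; value matches the table's default
gcs_model_path     = "gs://example-bucket/models/engine"  # hypothetical path; required, no default
```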