fix readme
kaushikmitr committed Mar 8, 2024
1 parent af4bec4 commit 961b0c7
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion benchmarks/inference-server/triton/README.md
@@ -170,7 +170,7 @@ terraform apply
| `model_id` | Model used for inference. | String | `"meta-llama/Llama-2-7b-chat-hf"` | No |
| `gpu_count` | Parallelism based on number of gpus. | Number | `1` | No |
| `ksa` | Kubernetes Service Account used for workload. | String | `"default"` | No |
-| `huggingface-secret` | Name of the kubectl huggingface secret token | String | `"huggingface-secret"` | Yes |
+| `huggingface_secret` | Name of the kubectl huggingface secret token | String | `"huggingface-secret"` | Yes |
| `gcs_model_path` | Path where model engine in gcs will be read from. | String | null | Yes |
| `server_launch_command_string` | Command to launc the Triton Inference Server | String | "pip install sentencepiece protobuf && huggingface-cli login --token $HUGGINGFACE_TOKEN && /opt/tritonserver/bin/tritonserver --model-repository=/all_models/inflight_batcher_llm --disable-auto-complete-config --backend-config=python,shm-region-prefix-name=prefix0_" | No |
