Commit
Update docker ENTRYPOINT to ensure proper argument handling (#962)
## Summary

This PR updates the `ENTRYPOINT` instruction in the Dockerfile to ensure that additional arguments passed to the container via `docker run` are correctly appended to the entrypoint command.

### Before the change:

Parameter `model` is not passed to the entrypoint command and the default model `facebook/opt-125m` is loaded instead.

```bash
> sudo docker run --runtime=nvidia --gpus all -p 8000:8000 my-outlines-image --model="microsoft/phi-2"
/usr/local/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
INFO 06-12 14:45:46 llm_engine.py:161] Initializing an LLM engine (v0.5.0) with config: model='facebook/opt-125m', speculative_config=None, tokenizer='facebook/opt-125m', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=2048, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0, served_model_name=facebook/opt-125m)
```

### After the change:

Parameter `model` is correctly passed to the entrypoint command

```bash
> sudo docker run --runtime=nvidia --gpus all -p 8000:8000 my-outlines-image --model="microsoft/phi-2"
/usr/local/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
INFO 06-12 14:59:17 llm_engine.py:161] Initializing an LLM engine (v0.5.0) with config: model='microsoft/phi-2', speculative_config=None, tokenizer='microsoft/phi-2', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=2048, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0, served_model_name=microsoft/phi-2)
```
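
### Why arguments can get dropped (illustrative sketch)

The description above does not show the Dockerfile diff, so the following is only a sketch of one common cause of this symptom, not a statement of what this PR changed. A shell-form `ENTRYPOINT` is run by Docker as `/bin/sh -c "<command>"`, and anything appended by `docker run` becomes a positional parameter of that shell rather than an argument of the command. An exec-form `ENTRYPOINT` is an argv array, so extra arguments are appended to it directly. The `shell_form` and `exec_form` helpers and the `echo serve` stand-in command are hypothetical names used to reproduce the two behaviors without Docker.

```bash
# Hypothetical stand-in for the real server command: `echo serve` prints
# whatever arguments actually reach it.

# Shell form: mimics /bin/sh -c "<command>" followed by the docker run args.
# The extra args become $0, $1, ... of the shell and never reach the command.
shell_form() {
    sh -c 'echo serve' "$@"
}

# Exec form: the extra args are appended to the command's own argv.
exec_form() {
    echo serve "$@"
}

shell_form --model=microsoft/phi-2   # the flag is silently dropped
exec_form --model=microsoft/phi-2    # the flag reaches the command
```

If this is the cause, it would match the logs above: the server starts either way, but with the shell form it only ever sees its defaults.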