[Bug]: Requesting the reward model reports 500 Internal Server Error #10444
Comments
Which version of vLLM are you using? Also, please post the full stack trace.

vllm==0.6.4.post1
Can you post the full output of `python collect_env.py`?
Please also show the full command you used to launch the server.
CUDA_VISIBLE_DEVICES="4,5,6,7" python -u -m vllm.entrypoints.openai.api_server --task embedding --host 0.0.0.0 --port 8099 --model xxx/Llama-3.1-Nemotron-70B-Reward-HF --served-model-name Llama-3.1-Nemotron-70B-Reward-HF --tensor-parallel-size 4 --disable-log-requests --enable-prefix-caching
I have not tested other models yet.
Can you try running other models? Unfortunately I can't directly debug this as I don't have enough GPU memory to load 70B models.
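One way to act on this suggestion, sketched under assumptions: launch a smaller model with the same `--task embedding` flag and see whether the 500 error reproduces. The model choice below (intfloat/e5-mistral-7b-instruct) is an assumption for illustration, not something named in this thread:

```shell
# Hedged sketch: serve a smaller embedding model with the same task flag
# to check whether the failure is specific to the 70B reward model.
# The model choice is an assumption; the other flags mirror the launch
# command above, with tensor parallelism dropped for a 7B model.
CUDA_VISIBLE_DEVICES="4" python -u -m vllm.entrypoints.openai.api_server \
  --task embedding --host 0.0.0.0 --port 8099 \
  --model intfloat/e5-mistral-7b-instruct \
  --served-model-name e5-mistral-7b-instruct
```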
Your current environment
The output of `python collect_env.py`
Model Input Dumps
No response
🐛 Describe the bug
Following the instructions in #8700 (comment), the reward model https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Reward-HF is a LlamaForCausalLM model, so I serve it with vLLM and add the parameter --task embedding.
When I send a request, it encounters an error:
INFO: "POST /v1/embeddings HTTP/1.1" 500 Internal Server Error
ERROR 11-19 17:55:16 engine.py:135] TypeError("object of type 'NoneType' has no len()")
and then the server terminates.
Calling the server with a shell script (a sketch is shown below) or with Python code like https://github.com/vllm-project/vllm/blob/main/examples/openai_embedding_client.py produces the same error.
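The reporter's original script is not shown in the thread; the following is a minimal sketch of an equivalent request, assuming the port (8099) and served model name from the launch command above, with a placeholder input string:

```shell
# Minimal sketch (not the reporter's original script): POST to the
# OpenAI-compatible embeddings endpoint served by the command above.
# Port and model name come from that command; the input text is a
# hypothetical placeholder.
curl http://localhost:8099/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Llama-3.1-Nemotron-70B-Reward-HF",
        "input": ["Hello, how are you?"]
      }'
```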