[Bug]: Llama3.2 tool calling OpenAI API not working #9991
Comments
ERROR 11-04 12:09:18 llama_tool_parser.py:116] Error in extracting tool call from response.
ERROR 11-04 12:09:18 llama_tool_parser.py:116] Traceback (most recent call last):
ERROR 11-04 12:09:18 llama_tool_parser.py:116] File "/home/ai/.mconda3/envs/vllm/lib/python3.10/site-packages/vllm/entrypoints/openai/tool_parsers/llama_tool_parser.py", line 97, in extract_tool_calls
ERROR 11-04 12:09:18 llama_tool_parser.py:116] tool_calls: List[ToolCall] = [
ERROR 11-04 12:09:18 llama_tool_parser.py:116] File "/home/ai/.mconda3/envs/vllm/lib/python3.10/site-packages/vllm/entrypoints/openai/tool_parsers/llama_tool_parser.py", line 101, in <listcomp>
ERROR 11-04 12:09:18 llama_tool_parser.py:116] name=raw_function_call["name"],
ERROR 11-04 12:09:18 llama_tool_parser.py:116] KeyError: 'name' |
Can you show your code? cc @K-Mistele |
I am writing a NodeJS TS client, but this is what the default OpenAI JS library gets from my end:
tools.push({
  type: 'function',
  function: {
    name: message.function.name,
    parameters: adaptToFunctionParameters(message.function.parameters)
  }
} as ChatCompletionTool)
// ...
const chatCompletion = await this.client.chat.completions.create({
  messages,
  tools: (tools && tools.length > 0) ? tools : undefined,
  model: this.model,
})
which works with the official OpenAI API |
It looks like the Llama 3 tool parser applies to Llama 3.1, but not Llama 3.2. Based on some of Meta's code and docs, they state this:
It seems like we may need a separate tool parser for the 3.2 1B and 3B models? I was not aware that the format had been changed. We should probably clarify this in the docs as well: the parser is for Llama 3.1 and the larger 3.2 models, not Llama 3.2 1B and 3B. cc @maxdebayser, happy to investigate & work with you on this if interested |
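For illustration, the difference comes down to two output styles. The sketch below is not vLLM's actual parser, and the get_weather examples are assumptions, but it contrasts the JSON-style call that the llama3_json parser expects with the pythonic-style call list that Meta's docs appear to describe for the 1B/3B models:
# A minimal sketch, NOT vLLM's parser. It contrasts the two output styles:
#   JSON style (Llama 3.1 / larger 3.2):  {"name": "get_weather", "parameters": {"city": "Paris"}}
#   pythonic style (3.2 1B/3B per Meta):  [get_weather(city="Paris")]
import ast
import json

def parse_json_style(output: str) -> dict:
    call = json.loads(output)
    return {"name": call["name"], "arguments": call["parameters"]}

def parse_pythonic_style(output: str) -> list[dict]:
    # The model output is a Python list literal of function calls.
    expr = ast.parse(output, mode="eval").body
    return [
        {
            "name": call.func.id,
            "arguments": {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords},
        }
        for call in expr.elts
    ]

print(parse_json_style('{"name": "get_weather", "parameters": {"city": "Paris"}}'))
print(parse_pythonic_style('[get_weather(city="Paris")]'))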
I'm definitely interested in looking into this. Maybe we need to add different tests then, because our existing ones are passing for 3.2 |
Which 3.2 model are your tests running against? The docs I linked to make it seem like 1B and 3B have a different format compared to the larger 3.2 models |
#9859 has a parser for this |
Another question: when testing 3.1 8B, it ALWAYS wants to call a tool, even when I just tell it 'Hi'. Can someone reproduce that behaviour? To be specific: when I give it a tool to execute bash commands, it always responds with an … What other model could I try for comparison? |
Not even Mistral works for me: |
Yeah, this is a known issue with Llama 3.1 8B. Basically, Meta's chat template's system prompt implicitly instructs the model to always call a tool. The model is designed for one-off tool calls where it receives a prompt and generates a tool call. If you try passing a tool result back for it to interpret or use, it'll usually just try to call another tool. This isn't a tool parser or vLLM issue so much as a poorly designed chat template / system prompt. My recommendation every time this comes up is to use a better chat template that alters their default system prompt. This is the chat template I always use for Llama 3.1 8B, and while it doesn't get perfect results (because the above behavior is how the model was trained, as best as I can tell), it does improve behavior significantly at the cost of extra tokens. Compare Meta's prompt & chat template with the one I use: |
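For reference, vLLM's OpenAI-compatible server accepts a local Jinja chat template via the --chat-template flag; the template path below is only a placeholder for whichever alternative template you use, not the specific template referenced above:
python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-3.1-8B-Instruct --enable-auto-tool-choice --tool-call-parser llama3_json --chat-template ./custom_tool_chat_template_llama3.1.jinja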
I think I recall that a recent change with the Mistral tokenizer may have affected Mistral tool calling, but I can't for the life of me remember what the issue number is. Fwiw, this model (…) Definitely should be fixed if there's a problem, but if you're just trying to test it out, there are much better small tool-calling models out there. I checked the config that is being used for running vLLM with Mistral for tools in CI testing, and this is the config:
Can you try
It seems like the difference is that it's not using … I will try to find the issue about tool calling and the Mistral tokenizer, and we can either move this part of the conversation there or open a new issue |
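Purely as an illustration (this is not the CI config referenced above, which is not reproduced here), a Mistral tool-calling launch would typically enable the Mistral tokenizer mode and tool parser, e.g.:
python -m vllm.entrypoints.openai.api_server --model mistralai/Mistral-7B-Instruct-v0.3 --tokenizer-mode mistral --enable-auto-tool-choice --tool-call-parser mistral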
Re: the Llama 3.2 tool issue, as I mentioned there is an open PR in #9859 that should resolve this, but there is also a test that uses Llama 3.2 in CI that does pass; here is the config:
|
@K-Mistele Thank you so much for all the info and advice!! Could you please tell me the best-working small tool-calling models? |
@K-Mistele The Llama config didn't work: it was able to chat and see the tools, but not able to call any. Same thing with the Mistral config, strangely. |
The thing is that with Ollama and Llama 3.2 the function calling at least works (despite it always calling functions, it is able to call multi-turn functions, etc.) |
I'm not incredibly familiar with < 7B models for tool use since in my experience they still aren't really sufficiently reliable unless you're using guided generation |
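If you do want to experiment with guided generation, vLLM's OpenAI-compatible server accepts guided decoding options through the client's extra_body; a minimal Python sketch, where the base URL, model name, and schema are only illustrative:
# Constrain the model's output to a JSON schema via vLLM's guided decoding,
# passed through the OpenAI client's extra_body. Schema and model are illustrative.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "arguments": {"type": "object"},
    },
    "required": ["name", "arguments"],
}

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.2-1B-Instruct",
    messages=[{"role": "user", "content": "Call a tool to get the weather in Paris."}],
    extra_body={"guided_json": schema},
)
print(resp.choices[0].message.content)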
I'm not familiar with ollama's implementation but it's possible they're using guided decoding or a grammar or something. |
You might check out xLAM though - https://huggingface.co/Salesforce/xLAM-1b-fc-r |
Thanks!! |
Your current environment
The output of `python collect_env.py`
Model Input Dumps
No response
🐛 Describe the bug
When trying to run Llama3.2 tool calling via
python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-3.2-1B-Instruct --enable-auto-tool-choice --tool-call-parser llama3_json
I do not get the OpenAI API function calling functionality, but rather just get the tool call string:
Using the official OpenAI API with 4o and Ollama works with my code.
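For reference, a minimal Python check against the server started by the command above (the base URL and the get_weather tool definition are illustrative) shows whether the parser produced structured tool calls or left the raw string in the message content:
# Minimal check against the local vLLM server (assumed at http://localhost:8000/v1).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.2-1B-Instruct",
    messages=[{"role": "user", "content": "What is the weather in Paris?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
)

msg = resp.choices[0].message
# With a working tool parser, msg.tool_calls is populated;
# when parsing fails, the raw tool-call text ends up in msg.content instead.
print("tool_calls:", msg.tool_calls)
print("content:", msg.content)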