
[Frontend] Pythonic tool parser #9859

Merged
11 commits merged from mike/pythonic-tool-calls into vllm-project:main on Nov 14, 2024

Conversation

mdepinet
Contributor

@mdepinet mdepinet commented Oct 30, 2024

This PR adds a tool parser for models that output tools formatted as Python function calls, such as Llama 3.2 and ToolACE-8B.
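To make the format concrete (illustrative only, not code from this PR): "pythonic" output looks like [get_weather(city="Paris", unit="celsius")] rather than a JSON tool-call object. A minimal sketch of how such a string could be parsed with the standard library, assuming a single list of calls with keyword-only arguments and hypothetical tool names:

```python
# Minimal sketch, not the parser added in this PR; assumes the model emits one
# Python-style list of calls with keyword arguments, e.g.
# [get_weather(city="Paris", unit="celsius")].
import ast


def parse_pythonic_tool_calls(text: str) -> list[dict]:
    tree = ast.parse(text.strip(), mode="eval")
    call_list = tree.body  # the ast.List node holding the calls
    tool_calls = []
    for call in call_list.elts:  # each element is an ast.Call like get_weather(...)
        tool_calls.append({
            "name": call.func.id,
            "arguments": {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords},
        })
    return tool_calls


print(parse_pythonic_tool_calls('[get_weather(city="Paris", unit="celsius")]'))
# [{'name': 'get_weather', 'arguments': {'city': 'Paris', 'unit': 'celsius'}}]
```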

FIX #9991

Signed-off-by: Mike Depinet <[email protected]>

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small but essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

@mergify mergify bot added the frontend label Oct 30, 2024
@DarkLight1337
Member

cc @K-Mistele

@K-Mistele
Contributor

cc @K-Mistele

thanks, taking a look!

@K-Mistele
Contributor

Hi @mdepinet, the parser looks good to me! Just a couple notes:

  1. Can you include documentation about the newly supported model(s) in the docs/source/serving/openai_compatible_server.md? You will see examples for several other model families, so you will just need to adapt it for this parser and maybe indicate which models it is used for.
  2. Can you add configurations for the models that you intend for this parser to support to tests/tool_use/utils.py, instead of the custom tests that you have built out? I already built out a bunch of parameterized tests for tool-using LLMs and tool parsers, so all you need to do is add configs for the model(s) that this supports. This way, all tool parsers & tool calling models are checked against the same tests. You can find examples of this in the tests/tool_use/utils.py file itself, or you can see an example of it here in a PR for another model's tool parser.
    Once you've done that, you can run the tests locally using pytest tests/tool_use to make sure they pass, and you should be able to use the -k flag to select the parameterized versions for your model only if you desire.
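As a rough illustration of item 2 (the field names below are assumptions, not copied from tests/tool_use/utils.py, so adapt them to whatever the existing entries actually use; the server flags are the ones used elsewhere in this thread):

```python
# Purely illustrative sketch of an added config entry; the real structure in
# tests/tool_use/utils.py may use different keys, and the model is just an example.
CONFIGS = {
    "pythonic": {
        "model": "Team-ACE/ToolACE-8B",
        "arguments": [
            "--enable-auto-tool-choice",
            "--tool-call-parser", "pythonic",
        ],
    },
}
```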

@mergify mergify bot added the documentation label (Improvements or additions to documentation) Nov 1, 2024
@mdepinet mdepinet force-pushed the mike/pythonic-tool-calls branch from 0bcad39 to 683eb27 on November 1, 2024 at 23:07
@mdepinet
Contributor Author

mdepinet commented Nov 1, 2024

Hi @mdepinet, the parser looks good to me! Just a couple notes:

  1. Can you include documentation about the newly supported model(s) in the docs/source/serving/openai_compatible_server.md? You will see examples for several other model families, so you will just need to adapt it for this parser and maybe indicate which models it is used for.

Sure, done.

  1. Can you add configurations for the models that you intend for this parser to support to tests/tool_use/utils.py, instead of the custom tests that you have built out? I already built out a bunch of parameterized tests for tool-using LLMs and tool parsers, so all you need to do is add configs for the model(s) that this supports. This way, all tool parsers & tool calling models are checked against the same tests. You can find examples of this in the tests/tool_use/utils.py file itself, or you can see an example of it here in a PR for another model's tool parser.
    Once you've done that, you can run the tests locally using pytest tests/tool_use to make sure they pass, and you should be able to use the -k flag to select the parameterized versions for your model only if you desire.

Thanks, I hadn't seen these. I'm happy to add a config so these tests also exercise the new tool parser, but I think it's worthwhile to keep the tests I added as well, because:

  1. The new tests are much faster to run, which is quite helpful when iterating on the implementation code. They're also easier to get running since they have fewer dependencies (no separate server).
  2. The new tests don't depend on any particular model or chat template. It makes sense to tie everything together when a strict JSON schema implies that tight coupling, but I think it's reasonable to define (and test) how the parser works in isolation for a parser that may have broader applicability.
  3. The new tests cover more cases than what would be reasonable in the existing tests.

Put differently, I view the existing tests more as integration tests and the tests I added more as unit tests. I think it'd be a mistake to entirely do away with either. Does that seem reasonable to you?
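To make the unit-test distinction concrete, here is the kind of model-free test being described, written against the hypothetical parse_pythonic_tool_calls() helper sketched in the PR description above rather than the real parser class (whose API may differ): strings in, structured calls out, no server or model required.

```python
# Illustrative unit-test sketch; relies on the hypothetical
# parse_pythonic_tool_calls() helper from the earlier sketch, not vLLM's real API.
import pytest


@pytest.mark.parametrize("output, expected_names", [
    ('[get_weather(city="Berlin")]', ["get_weather"]),
    ('[get_weather(city="Berlin"), get_time(timezone="UTC")]',
     ["get_weather", "get_time"]),
])
def test_tool_names_are_extracted(output, expected_names):
    calls = parse_pythonic_tool_calls(output)
    assert [call["name"] for call in calls] == expected_names
```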

@mdepinet mdepinet force-pushed the mike/pythonic-tool-calls branch from c18bdf4 to b64ca59 on November 4, 2024 at 17:42
@K-Mistele
Contributor

  • The new tests are much faster to run, which is quite helpful when iterating on the implementation code. They're also easier to get running since they have fewer dependencies (no separate server).

For sure, having extra ones is great! I just wanted to make sure that we didn't skip adding this into the existing integration tests for tool use. Checking it out now for testing :)

cc @maxdebayser: it sounds like this might handle the Llama 3.2 1B and 3B tool-calling format that we were having issues with when using the llama3_json parser.

Possibly closes #9991? Will test.

@K-Mistele
Contributor

Not an issue with this PR, but Team-ACE/ToolACE-8B requires a special chat template (or additional prompting techniques), which the model's repo doesn't provide, in order for the model to use tools properly.

A couple of options here:

  1. We can remove Team-ACE/ToolACE-8B from mentions of the pythonic tool parser until the chat template is added in examples (which I could create a separate PR for)
  2. We can add a chat template and indicate in the docs that people need to use it, akin to how we provided recommended chat templates for some other models.

I'll provide a chat template here in a bit if you're willing to add it to examples/tool_ace_chat_template.jinja and indicate that the model requires it to use tools properly, absent complicated prompting that must be done by the end user.

[Resolved review threads on docs/source/serving/openai_compatible_server.md (×2) and tests/tool_use/utils.py]
@K-Mistele
Contributor

Using vllm serve meta-llama/Llama-3.2-3B-Instruct --enable-auto-tool-choice --tool-call-parser pythonic --chat-template examples/tool_chat_template_llama3.2_pythonic.jinja, I had a really hard time getting a valid tool call out of the model even at temperature 0. It's unclear whether that's just because the model is really small or whether it needs better prompting on my end, but here are a few examples of what I got:

[Screenshot: example model output 1]
[Screenshot: example model output 2]

The first one looks like it could be a parser error, but I'm not sure.

@K-Mistele
Contributor

Just tried running the tests with pytest; everything looks good, the model just fails the tests with some frequency (pytest -v -s tests/tool_use -k pythonic will select only the tests for this tool parser).

@mdepinet, can you see if it's possible to get them passing with some additional prompting? I think temperature=0 is the default for these tests, so that should help some.


mergify bot commented Nov 7, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @mdepinet.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Nov 7, 2024
@mergify mergify bot removed the needs-rebase label Nov 7, 2024
@mdepinet
Contributor Author

mdepinet commented Nov 8, 2024

@K-Mistele I think this is ready for you now. The smaller Llama models aren't especially reliable. The tests are passing for me, but I'd be inclined to remove the Llama3.2 entry in favor of ToolACE if it's still flaky for you. (I'm actually most interested in fixie-ai/ultravox-v0_4-ToolACE-8B anyway.)

@K-Mistele
Contributor

Thanks for the heads-up, I will give it another pass!

@K-Mistele
Contributor

Yeah, so overall the ToolACE implementation is definitely good to go. The Llama 3.2 models, I think, need the <|python_tag|> issue I called out above handled, but I could be wrong. I see valid tool calls about half the time. The ones that aren't valid usually look like this:

[Screenshot: <|python_tag|> prefix but no array wrapping the calls]

or like this:
[Screenshot: the model narrates which tools to call instead of emitting the calls]
(Weirdly, the model is telling you what tools to call. I saw this with 3.1 some, which led me to create my own chat template with a custom system prompt that fixes a lot of these issues; feel free to borrow/adapt the prompt if you want: https://gist.github.com/K-Mistele/820d142b4dab50bd8ef0c7bbcad4515c)

For what it's worth, some parsers, like the hermes one, handle cases where the model generates text and a tool call; I think the way to handle that here for streaming is to look for \n\n[ since that's what the model does whenever it wants to talk and call a tool: there's text, then a double newline, then the array. The hermes parser is a good example of streaming both text and tool deltas depending on what's being generated.
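To illustrate the \n\n[ idea (a sketch only, not how the hermes parser or this PR's streaming actually works): split anything before the marker off as plain content and treat the remainder as the pythonic call array.

```python
# Sketch of the "text, blank line, then the call array" split described above.
# Non-streaming for simplicity; a streaming parser would do this incrementally
# on the growing output and emit text deltas before any tool-call deltas.
def split_content_and_calls(model_output: str) -> tuple[str, str | None]:
    marker = "\n\n["
    idx = model_output.find(marker)
    if idx == -1:
        return model_output, None  # no tool-call array after the prose
    # Keep the "[" so the remainder is still a parseable pythonic call list.
    return model_output[:idx], model_output[idx + 2:]


print(split_content_and_calls("Let me check.\n\n[get_weather(city='Paris')]"))
# ('Let me check.', "[get_weather(city='Paris')]")
```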

I think there are a couple of ways to move forward here:

  • move forward with just ToolACE and the fixie-ai model, removing Llama 3.2 1B and 3B
  • include the Llama 3.2 models in the PR, but note the low quality and omit them from CI since they are likely to fail
  • include the Llama 3.2 models in the PR, but try to fix their behavior for CI by adding system prompts in tests/tool_use/utils.py that tell them to generate only tool calls, with no accompanying text, when they decide to call a tool (the default system prompts and behavior of the Llama 3.1 and 3.2 models are notoriously bad, so this would probably go a long way toward fixing them)
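For the third option, the kind of system prompt being suggested might read something like this (illustrative wording only, not the prompt from the gist above):

```python
# Illustrative only: a system prompt nudging small Llama models to emit either
# plain text or a bare pythonic tool-call list, never both in one response.
PYTHONIC_TOOL_SYSTEM_PROMPT = (
    "You are a helpful assistant with access to tools. "
    "When you decide to call one or more tools, reply with ONLY a Python-style "
    'list of calls, e.g. [get_weather(city="Paris")], and no other text. '
    "Otherwise, reply in plain text and do not emit any function calls."
)
```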

What do you think @mdepinet @DarkLight1337 @mgoin ?

@DarkLight1337
Member

I think there are a couple of ways to move forward here:

  • move forward with just ToolACE and the fixie-ai model, removing Llama 3.2 1B and 3B
  • include the Llama 3.2 models in the PR, but note the low quality and omit them from CI since they are likely to fail
  • include the Llama 3.2 models in the PR, but try to fix their behavior for CI by adding system prompts in tests/tool_use/utils.py that tell them to generate only tool calls, with no accompanying text, when they decide to call a tool (the default system prompts and behavior of the Llama 3.1 and 3.2 models are notoriously bad, so this would probably go a long way toward fixing them)

Unless the author of #9991 @SinanAkkoyun is ok with the first option, I'd go with the second option, and perhaps also log a warning in the code for Llama 3.2 models.

@SinanAkkoyun

Hi!
@DarkLight1337 #9991 (comment) Ollama seems to handle Llama 3.2 function calling well.

However, I don't really care whether Llama 3.2 tool calling is supported here as long as I can verify that, for example, the hermes tool-calling format works, but sadly I can't even verify that. Maybe I'm missing some configs?

@mdepinet mdepinet requested a review from K-Mistele November 8, 2024 17:33
@K-Mistele
Contributor

Related #10164

Member

@DarkLight1337 DarkLight1337 left a comment


Thanks for adding this!

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) November 14, 2024 02:59
@github-actions github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Nov 14, 2024
@DarkLight1337 DarkLight1337 merged commit f67ce05 into vllm-project:main Nov 14, 2024
60 checks passed
omer-dayan pushed a commit to omer-dayan/vllm that referenced this pull request Nov 14, 2024
sumitd2 pushed a commit to sumitd2/vllm that referenced this pull request Nov 14, 2024
@mdepinet mdepinet deleted the mike/pythonic-tool-calls branch November 14, 2024 17:35
KuntaiDu pushed a commit to KuntaiDu/vllm that referenced this pull request Nov 20, 2024
mfournioux pushed a commit to mfournioux/vllm that referenced this pull request Nov 20, 2024
rickyyx pushed a commit to rickyyx/vllm that referenced this pull request Nov 20, 2024
tlrmchlsmth pushed a commit to neuralmagic/vllm that referenced this pull request Nov 23, 2024
sleepwalker2017 pushed a commit to sleepwalker2017/vllm that referenced this pull request Dec 13, 2024
Labels
documentation (Improvements or additions to documentation), frontend, ready (ONLY add when PR is ready to merge/full CI is needed)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: Llama3.2 tool calling OpenAI API not working
4 participants