[Frontend] Pythonic tool parser #9859

mdepinet · 2024-10-30T23:19:12Z

This PR adds a tool parser for models that output tools formatted as Python function calls, such as Llama 3.2 and ToolACE-8B.
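To make the format concrete: a pythonic tool call is a Python list literal of function calls with keyword arguments. The sketch below shows one minimal way such output could be parsed with the standard `ast` module. It is not the parser added in this PR; the function name `parse_pythonic_tool_calls`, the example tools `get_weather` and `get_time`, and the output dict layout are all assumptions made for illustration.

```python
import ast
import json
from typing import Dict, List


def parse_pythonic_tool_calls(text: str) -> List[Dict[str, str]]:
    """Parse '[f(a=1), g(b="x")]' into a list of {name, arguments} dicts.

    Hypothetical helper: 'arguments' is a JSON string, similar in spirit to
    OpenAI-style tool calls. Only literal keyword arguments are accepted.
    """
    tree = ast.parse(text.strip(), mode="eval")
    if not isinstance(tree.body, ast.List):
        raise ValueError("expected a list of function calls")

    calls = []
    for node in tree.body.elts:
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
            raise ValueError("expected a plain function call")
        if node.args:
            raise ValueError("positional arguments are not supported")
        arguments = {}
        for kw in node.keywords:
            if kw.arg is None:  # rejects **kwargs expansion
                raise ValueError("keyword expansion is not supported")
            # literal_eval only accepts literals, so no model output is executed
            arguments[kw.arg] = ast.literal_eval(kw.value)
        calls.append({"name": node.func.id, "arguments": json.dumps(arguments)})
    return calls


example = '[get_weather(city="San Francisco", unit="celsius"), get_time(tz="UTC")]'
for call in parse_pythonic_tool_calls(example):
    print(call["name"], call["arguments"])
```

Parsing with `ast` instead of `eval` means the model's output is inspected as syntax and never run.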

FIX #9991

Signed-off-by: Mike Depinet <[email protected]>

github-actions · 2024-10-30T23:19:23Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add ready label to the PR
Enable auto-merge.

🚀

DarkLight1337 · 2024-10-31T01:23:21Z

cc @K-Mistele

K-Mistele · 2024-10-31T16:49:34Z

> cc @K-Mistele

thanks, taking a look!

K-Mistele · 2024-11-01T07:07:36Z

Hi @mdepinet, the parser looks good to me! Just a couple notes:

Can you include documentation about the newly supported model(s) in the docs/source/serving/openai_compatible_server.md? You will see examples for several other model families, so you will just need to adapt it for this parser and maybe indicate which models it is used for.
Can you add configurations for the models that you intend for this parser to support to tests/tool_use/utils.py, instead of the custom tests that you have built out? I already built out a bunch of parameterized tests for tool-using LLMs and tool parsers, so all you need to do is add configs for the model(s) that this supports. This way, all tool parsers & tool calling models are checked against the same tests. You can find examples of this in the tests/tool_use/utils.py file itself, or you can see an example of it here in a PR for another model's tool parser.
Once you've done that, you can run the tests locally using pytest tests/tool_use to make sure they pass, and you should be able to use the -k flag to select the parameterized versions for your model only if you desire.
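For illustration, a config entry for this parser might look roughly like the sketch below. The `ServerConfig` shape, the `CONFIGS` name, and the `llama3.2_pythonic` key are assumptions made for this sketch and may not match the actual definitions in tests/tool_use/utils.py; the server flags are the ones used elsewhere in this thread.

```python
from typing import Dict, List, Optional, TypedDict


class ServerConfig(TypedDict, total=False):
    """Hypothetical per-model server settings for the shared tool-use tests."""
    model: str
    arguments: List[str]
    system_prompt: Optional[str]


# Assumed layout: one entry per model/parser pair, keyed by a short name
# that can then be selected with pytest's -k flag.
CONFIGS: Dict[str, ServerConfig] = {
    "llama3.2_pythonic": {
        "model": "meta-llama/Llama-3.2-3B-Instruct",
        "arguments": [
            "--enable-auto-tool-choice",
            "--tool-call-parser", "pythonic",
            "--chat-template",
            "examples/tool_chat_template_llama3.2_pythonic.jinja",
        ],
    },
}

for name, config in CONFIGS.items():
    print(name, config["model"], " ".join(config["arguments"]))
```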

mdepinet · 2024-11-01T23:26:12Z

> Hi @mdepinet, the parser looks good to me! Just a couple notes:
>
> Can you include documentation about the newly supported model(s) in the docs/source/serving/openai_compatible_server.md? You will see examples for several other model families, so you will just need to adapt it for this parser and maybe indicate which models it is used for.

Sure, done.

> Can you add configurations for the models that you intend for this parser to support to tests/tool_use/utils.py, instead of the custom tests that you have built out? I already built out a bunch of parameterized tests for tool-using LLMs and tool parsers, so all you need to do is add configs for the model(s) that this supports. This way, all tool parsers & tool calling models are checked against the same tests. You can find examples of this in the tests/tool_use/utils.py file itself, or you can see an example of it here in a PR for another model's tool parser.
> Once you've done that, you can run the tests locally using pytest tests/tool_use to make sure they pass, and you should be able to use the -k flag to select the parameterized versions for your model only if you desire.

Thanks, I didn't see these. I'm happy to add a config so these tests exercise the new tool parser also, but I think it's worthwhile to keep the tests I added also because:

The new tests are much faster to run, which is quite helpful when iterating on the implementation code. They're also easier to get running since they have fewer dependencies (no separate server).
The new tests don't depend on any particular model or chat template. It makes sense to tie everything together when a strict JSON schema implies that tight coupling, but I think it's reasonable to define (and test) how the parser works in isolation for a parser that may have broader applicability.
The new tests cover more cases than what would be reasonable in the existing tests.

Put differently, I view the existing tests more as integration tests and the tests I added more as unit tests. I think it'd be a mistake to entirely do away with either. Does that seem reasonable to you?

K-Mistele · 2024-11-07T05:17:03Z

> The new tests are much faster to run, which is quite helpful when iterating on the implementation code. They're also easier to get running since they have fewer dependencies (no separate server).

For sure, having extra ones is great! I just wanted to make sure that we didn't skip adding this into the existing integration tests for tool use. Checking it out now for testing :)

cc @maxdebayser it sounds like this might handle the Llama 3.2 1B and 3B tool calling format that we were having issues with when using the llama3_json parser.

Possibly closes #9991? Will test.

K-Mistele · 2024-11-07T05:32:30Z

Not an issue with this PR, but Team-ACE/ToolACE-8B requires a special chat template that the model's repo doesn't provide (or additional prompting techniques), in order for the model to properly use tools.

Couple options here:

We can remove Team-ACE/ToolACE-8B from mentions of the pythonic tool parser until the chat template is added in examples (which i could create a separate PR for)
We can add a chat template and indicate in the docs that people need to use it, akin to how we provided recommended chat templates for some other models.

I'll provide a chat template here in a bit if you're willing to add it to examples/tool_ace_chat_template.jinja and indicate that the model requires it to use tools properly, absent complicated prompting that must be done by the end user.

K-Mistele · 2024-11-07T05:55:06Z

Using vllm serve meta-llama/Llama-3.2-3B-Instruct --enable-auto-tool-choice --tool-call-parser pythonic --chat-template examples/tool_chat_template_llama3.2_pythonic.jinja, I had a really hard time getting a valid tool call out of the model even at temperature 0. It's unclear whether that's just because the model is really small or because it needs better prompting on my end, but here are a few examples of what I got:

The first one looks like it could be a parser error, but I'm not sure.

K-Mistele · 2024-11-07T05:58:39Z

Just tried running tests with pytest - everything looks good, the model just fails the tests with some frequency (pytest -v -s tests/tool_use -k pythonic will select only the tests for this tool parser).

@mdepinet can you see if it's possible to get them passing with some additional prompting? I think temperature=0 is default for these tests so that should help some.

mergify · 2024-11-07T15:10:59Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @mdepinet.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mdepinet · 2024-11-08T00:29:47Z

@K-Mistele I think this is ready for you now. The smaller Llama models aren't especially reliable. The tests are passing for me, but I'd be inclined to remove the Llama3.2 entry in favor of ToolACE if it's still flaky for you. (I'm actually most interested in fixie-ai/ultravox-v0_4-ToolACE-8B anyway.)

K-Mistele · 2024-11-08T03:04:26Z

Thanks for the heads-up, I will give it another pass!

K-Mistele · 2024-11-08T03:46:39Z

Yeah so overall, ToolACE implementation is definitely good to go. The llama 3.2 models I think need the <|python_tag|> issue I called out above handled, but I could be wrong. I see valid tool calls about half the time. The ones that aren't usually look like this:

(<|python_tag|> but no array wrapping the calls)

or like this:

(weirdly, the model is telling you what tools to call. I saw this with 3.1 some, which led to me creating my own chat template with a custom system prompt that fixes a lot of these issues; feel free to borrow/adapt the prompt if you want: https://gist.github.com/K-Mistele/820d142b4dab50bd8ef0c7bbcad4515c)

For what it's worth, some parsers like hermes do handle cases where the model generates both text and a tool call. I think the way to handle this one for streaming is to look for \n\n[, since that's what the model emits whenever it wants to talk and call a tool: there's text, then a double newline and the array. Hermes's parser is a good example of streaming both text and tool deltas depending on what's being generated.
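As a rough, non-streaming sketch of that idea, the helper below splits a complete model output into leading text and a trailing tool call array at the first `\n\n[`. The function name `split_text_and_calls` and the example tool `get_weather` are hypothetical, and this is not how the hermes parser or this PR's parser is implemented; a streaming version would need to buffer deltas until it can tell which case it is in.

```python
from typing import Optional, Tuple


def split_text_and_calls(output: str) -> Tuple[str, Optional[str]]:
    """Return (text, tool_call_source); tool_call_source is None if absent.

    Assumes the model either emits only an array of calls, or plain text
    followed by a blank line and then the array.
    """
    cleaned = output.strip()
    if cleaned.startswith("[") and cleaned.endswith("]"):
        return "", cleaned  # the whole output is a tool call array

    text, separator, rest = cleaned.partition("\n\n[")
    if separator and rest.endswith("]"):
        # Re-attach the "[" that partition() consumed with the separator.
        return text, "[" + rest

    return cleaned, None  # ordinary text response, no tool calls


text, calls = split_text_and_calls(
    'Sure, let me check.\n\n[get_weather(city="Dallas")]'
)
print(repr(text))
print(repr(calls))
```

The sketch deliberately does nothing about a leading <|python_tag|> or calls that are not wrapped in an array; those cases would need their own handling.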

I think there are a couple ways to move forwards here:

move forwards with just ToolACE and the fixie-ai model; removing Llama 3.2 1B and 3B
include the llama 3.2 models in the PR, but note the low quality and omit them from CI since they are likely to fail
include the llama 3.2 models in the PR, but try to fix behavior with system prompts for when they're run in CI in tests/tool_use/utils.py by telling them to only generate tool calls and don't generate text if they're generating a tool call (the default system prompts & behavior from the llama 3.1 and 3.2 models is notoriously bad and this would probably go a long way to fixing them)

What do you think @mdepinet @DarkLight1337 @mgoin ?

DarkLight1337 · 2024-11-08T04:02:45Z

> I think there are a couple ways to move forwards here:
>
> move forwards with just ToolACE and the fixie-ai model; removing Llama 3.2 1B and 3B
>
> include the llama 3.2 models in the PR, but note the low quality and omit them from CI since they are likely to fail
>
> include the llama 3.2 models in the PR, but try to fix behavior with system prompts for when they're run in CI in tests/tool_use/utils.py by telling them to only generate tool calls and don't generate text if they're generating a tool call (the default system prompts & behavior from the llama 3.1 and 3.2 models is notoriously bad and this would probably go a long way to fixing them)

Unless the author of #9991 @SinanAkkoyun is ok with the first option, I'd go with the second option, and perhaps also log a warning in the code for Llama 3.2 models.
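A warning along those lines could be as small as the sketch below. The function name `warn_if_unreliable`, the logger name, the substring check on the model name, and the message wording are all hypothetical; this is not the code that was added to vLLM.

```python
import logging

logger = logging.getLogger("tool_parser_sketch")


def warn_if_unreliable(model_name: str, tool_parser: str) -> bool:
    """Log a warning when the pythonic parser is paired with a Llama 3.2 model.

    Returns True if a warning was logged, which keeps the check easy to test.
    """
    is_llama_3_2 = "llama-3.2" in model_name.lower()
    if tool_parser == "pythonic" and is_llama_3_2:
        logger.warning(
            "%s may produce malformed tool calls with the pythonic parser; "
            "expect some requests to fall back to plain text.",
            model_name,
        )
        return True
    return False


logging.basicConfig(level=logging.WARNING)
warn_if_unreliable("meta-llama/Llama-3.2-3B-Instruct", "pythonic")
```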

SinanAkkoyun · 2024-11-08T09:29:36Z

Hi!
@DarkLight1337 As noted in #9991 (comment), Ollama seems to handle Llama 3.2 function calling well.

However, I don't really care if llama3.2 tool calling is supported here as long as I can verify that for example the hermes tool calling format works, but I can't even verify that sadly. Maybe I am missing configs?

K-Mistele · 2024-11-11T04:48:54Z

Related #10164

DarkLight1337

Thanks for adding this!

pythonic tool parser

1aac0b8

mdepinet requested review from DarkLight1337, robertgshaw2-neuralmagic and simon-mo as code owners October 30, 2024 23:19

mergify bot added the frontend label Oct 30, 2024

mergify bot added the documentation label Nov 1, 2024

Add an entry to openai_compatible_server.md

683eb27

mdepinet force-pushed the mike/pythonic-tool-calls branch from 0bcad39 to 683eb27 Compare November 1, 2024 23:07

Integration test attempt (can't run on my poor 4070)

b64ca59

mdepinet force-pushed the mike/pythonic-tool-calls branch from c18bdf4 to b64ca59 Compare November 4, 2024 17:42

checkpoint: fix most tool_use tests

9553df6

K-Mistele reviewed Nov 7, 2024: docs/source/serving/openai_compatible_server.md (two comments, outdated), tests/tool_use/utils.py (outdated)

K-Mistele mentioned this pull request Nov 7, 2024 in [Bug]: Llama3.2 tool calling OpenAI API not working #9991 (closed)

mergify bot added the needs-rebase label Nov 7, 2024

mdepinet added 2 commits November 7, 2024 22:04

Get remaining tool_use tests passing

4db5b39

Merge remote-tracking branch 'origin/main' into mike/pythonic-tool-calls

4044bdf

mergify bot removed the needs-rebase label Nov 7, 2024

mdepinet added 2 commits November 7, 2024 23:34

Add ToolACE template and tool_use test entry

644f8be

update docs

29d62ac

K-Mistele reviewed Nov 8, 2024: docs/source/serving/openai_compatible_server.md

Warn about Llama3.2 models, add TODO for future work

a579569

mdepinet requested a review from K-Mistele November 8, 2024 17:33

DarkLight1337 reviewed Nov 9, 2024: docs/source/serving/openai_compatible_server.md (outdated)

DarkLight1337 reviewed Nov 9, 2024: vllm/entrypoints/openai/serving_chat.py (outdated)

K-Mistele mentioned this pull request Nov 11, 2024 in [Bugfix][Frontend] Update Llama Chat Templates to also support Non-Tool use #10164 (merged)

PR comments

0eaf0a6

mdepinet requested a review from DarkLight1337 November 13, 2024 19:54

Alter warning block based on other "note" block in this file

29a9704

DarkLight1337 approved these changes Nov 14, 2024

DarkLight1337 enabled auto-merge (squash) November 14, 2024 02:59

github-actions bot added the ready label Nov 14, 2024

DarkLight1337 merged commit f67ce05 into vllm-project:main Nov 14, 2024
60 checks passed

mdepinet deleted the mike/pythonic-tool-calls branch November 14, 2024 17:35
