
OpenAI Tools / function calling #2488

Closed

Conversation

FlorianJoncour
Contributor

As expected after the refactoring of the OpenAI API, here come the function calls!
This follows the first PR: #2210.

The current implementation has been largely modified to use Jinja templates; there is no more static text in the Python code.
The default template can be reassigned using the tools-template argument.

Among the recent comments, there were also issues with tool_choice, which was not fully handled during the request.
Now, if tool_choice is None the behavior is the same as "auto"; if it is anything else, we check whether the requested function is present, and the template then encourages the model to call that function.
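
For illustration, here is a rough sketch of that dispatch in Python; the names below (such as resolve_tool_choice) are placeholders for this explanation, not the PR's actual code:

def resolve_tool_choice(request):
    # None behaves like "auto": the model decides whether to call a tool.
    if request.tool_choice is None or request.tool_choice == "auto":
        return "auto", None
    # Otherwise tool_choice names a specific function: check that it exists,
    # so the template can encourage the model to call it.
    wanted = request.tool_choice.function.name
    for tool in request.tools or []:
        if tool.function.name == wanted:
            return "forced", tool
    raise ValueError(f"tool_choice references unknown function '{wanted}'")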

A final addition is the dev-mode parameter.
If it is activated, you can send a GET request to '/resetAPI'.
Thanks to the recent refactoring, the called function will reload the internal part of the API without reloading the vLLM engine.
This will also display a lot of warnings, because this should absolutely not be in a public API!

The idea is to facilitate the development of the API and templates: the API will also display the prompt at each generation, after it has passed through the different templates (including the tokenizer), and the templates will be reloaded, allowing them to be modified without having to restart vLLM.
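
As an aside, triggering the reload only takes a simple GET request; a minimal sketch, assuming the server runs locally on port 8000:

import requests

# Only available when the server was started with dev mode enabled.
response = requests.get("http://localhost:8000/resetAPI")
print(response.status_code)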

Maybe this will be controversial, I don't know.
If needed I can remove it, but I can assure you it has been useful for creating the tools template!

@simon-mo
Collaborator

Thanks for making this PR. A few high-level comments:

  • dev mode is totally fine. I would say renaming it to privileged mode would be better. There are use cases for hot-reloading these in production, as well as, in the future, using it to load/unload models.
  • please fix the camel-case naming in method names
  • overall I'm concerned about the robustness of the templating and the complexity of this approach.
    • can we replace tool definition and schema templating by appending user messages?
    • can we replace the output capture and parsing with Outlines' guided generation?
      • In particular, the tool choice can be defined as a Choice in Outlines
      • JSON schema support is also native in Outlines

@FlorianJoncour
Contributor Author

FlorianJoncour commented Jan 22, 2024

can we replace tool definition and schema templating by appending user messages?

I'm not sure I understand. Do you mean that we should be able to define the template via a POST to chat/completions, or define an API entry (like privileged) to change the template globally?

It is technically possible, and even simpler in the first case, but that means there is no template left: the implementation is simpler and the user has complete freedom over how to present the list of functions and instructions. However, we lose compatibility with the reference API.

In the second case, if we keep a template system but send it via a POST request rather than a file, it's possible. However, there is some Jinja logic in the template since there's a loop, so I need to find a way to make it simple with JSON.

can we replace the output capture and parsing with Outlines's guided generation?

I had also followed the discussion on #2105.
I am not familiar with guided generation but it's promising.
However, something bothers me: the model needs to signal when it wants to call a function, and to do that it first needs to generate a keyword, which we capture.
To use Outlines, we need to send the schema using SamplingParams.logits_processors
(https://github.com/outlines-dev/outlines/blob/main/outlines/serve/serve.py),
which means stopping the generation as soon as we capture !function_call, changing SamplingParams, and then starting a new generation.
It's not insurmountable, but it deserves some thought to do it properly, especially since we also need to adapt the template system to integrate it all (the functions list but also the instructions).
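
To make the two-phase idea concrete, here is a rough sketch; the Outlines import path and the JSONLogitsProcessor signature are assumptions based on the serve example linked above, not code from this PR:

from vllm import SamplingParams
from outlines.serve.vllm import JSONLogitsProcessor  # assumed import path

async def generate_with_tools(engine, prompt, tool_schema, request_id):
    # Phase 1: free-form generation, stopped as soon as the model emits the
    # keyword the template uses to announce a function call.
    free = SamplingParams(max_tokens=512, stop=["!function_call"])
    final = None
    async for output in engine.generate(prompt, free, request_id):
        final = output
    text = final.outputs[0].text
    # (A real implementation would check here whether the stop keyword was
    # actually produced before switching to constrained decoding.)

    # Phase 2: restart with a schema-constrained logits processor so the
    # function arguments are guaranteed to match the tool's JSON schema.
    processor = JSONLogitsProcessor(tool_schema, engine)  # assumed signature
    constrained = SamplingParams(max_tokens=512,
                                 logits_processors=[processor])
    async for output in engine.generate(prompt + text + "!function_call",
                                        constrained, request_id + "-call"):
        final = output
    return final.outputs[0].text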

@FlorianJoncour
Contributor Author

FlorianJoncour commented Jan 23, 2024

I've got an idea: what do you think about a plugin system?

It's something I've been thinking about since last week.
The first use case is managing API keys.

If the server admin defines a structure like this (maybe inheriting from a pre-defined class in the vLLM tree):

import vllm.entrypoints.openai.plugin_system as vllm_plugin_system


class MyPluginForAPIKeys:

    def exec(self, args: dict) -> bool:
        return self.validate_key(args["api_key"])

    def validate_key(self, key: str) -> bool:
        return True


def PluginsInit():  # Called by vLLM at startup
    vllm_plugin_system.register(plugin=MyPluginForAPIKeys(),
                                context=vllm_plugin_system.OpenAIPlugins.API_KEYS)

Then, on the server side, we can hot-load this script (specified by a given argument), and the server admin can manage API keys, templates, and more...
It could also be a basis for future development, since assistants and code executors (currently in beta at OpenAI) are important features but lie outside the vLLM framework.

As a first step, we could use a system like this to manage function-calling templates.
The template becomes a Python script.

EDIT: Something like this could also be used to reset SamplingParams during inference and thus permit guided generation with Outlines. I don't know for now; I have to learn more about vLLM internals.

@simon-mo
Collaborator

Hi @FlorianJoncour,

Regarding privileged, I just mean changing the name from dev mode to privileged so it's more general. I do see a future of using the same flag for other purposes.

Regarding the template, please give me a day or two to think through this. As you clearly explained, this is not as simple as plain constrained output; it is a case of multi-turn conversation with different kinds of output.

Regarding plugins, the API key use case is simpler and I just merged #1106 using FastAPI middleware. But we are open to other proposals that are well formed and designed!

@AguirreNicolas
Contributor

AguirreNicolas commented Jan 24, 2024

@simon-mo
I propose a PR on @FlorianJoncour's fork to expand some use cases. Do you think this is a valuable feature to consider merging into this PR?

@esmeetu
Collaborator

esmeetu commented Jan 25, 2024

Hi @FlorianJoncour, it would be better to generate a specific prompt that the model already supports, like InternLM/InternLM#634.
That model has action_start and action_end indicators. Does the tool function template support that?
There are also many other models with their own specific prompt templates, and we might apply them like the chat template.

@FlorianJoncour
Contributor Author

FlorianJoncour commented Jan 25, 2024

Regarding privileged, I just mean changing the name from dev mode to privileged so it's more general. I do see the future of using the same flag for other purpose.

The problem with not systematically using LLMs for translation is that it can lead to this kind of misunderstanding :D

@esmeetu You can already do that by modifying the template under CONTEXT == CALLS_NOTIF and CONTEXT == TOOL_RESPONSE.
This allows you to modify the way function responses are presented to the LLM.

There are Jinja comments in the template, but Jinja scripts are difficult to read!
This is the main reason why I'm not against the idea of changing the template system.

type=str,
default=None,
help="The file path to alternative tools template")
parser.add_argument("--enable-api-tools",
Collaborator

Can we remove this arg and enable it by default?

@@ -89,6 +101,12 @@ def parse_args():
type=str,
default=None,
help="The file path to the SSL cert file")
parser.add_argument(
"--dev-mode",
Collaborator

Maybe you could submit a separate PR for these features unrelated to function calling.

request.messages = text_inject + request.messages
elif isinstance(request.messages,
List) and len(request.messages) >= 1:
request.messages[
Collaborator

I think this can be replaced by adding another system message:
request.messages.insert(0, ChatCompletionSystemMessage(role='system', content=text_inject))

Contributor

I don't think this is a good idea. Some models don't have a system role; plus, some other templates (like Llama's) don't allow two system messages during the conversation (they preserve the first system message and ignore any others).

if len(v_call):
try:
call_dict = json.loads(v_call)
if "call" in call_dict:
Collaborator

Why is this check needed?

Contributor Author

Hmm, probably a check from an early stage of development :p
Removed in the new version.

self, call_id: int) -> Union[ChatCompletionMessageToolCall, None]:
if len(self.calls_list) and call_id < len(self.calls_list):
call = self.calls_list[call_id]
arguments = call["arguments"] if "arguments" in call else None
Collaborator

If the model returns a function with only parameters, this may fail. Can you check both arguments and parameters?

Contributor Author

Done

function_call = Function(name=call["call"],
arguments=json.dumps(arguments)
if arguments is not None else "")
return ChatCompletionMessageToolCall(id="call_" + call["call"] +
Collaborator

Wouldn't call['name'] be better? 'name' is more general.

Contributor Author

Done

@esmeetu
Collaborator

esmeetu commented Feb 1, 2024

Hi @FlorianJoncour,
I left some comments here. After making those changes, I can use function calling! Thanks.

@xiechengmude

How is it going here? Very much appreciated.

@br3no
Contributor

br3no commented Feb 5, 2024

This is great work @FlorianJoncour!

In line with this comment from @simon-mo, I believe this would profit from a consolidation of the efforts on guided generation and function calling, as the two form a kind of duality: injecting the tooling prompts is one part; guaranteeing that the generated output conforms to the requirements of the functions is the other.

Regarding the mechanics of how to integrate the detection of a function call and your comment:

To use outlines, we need to send the schema using SamplingParams.logits_processors
(https://github.com/outlines-dev/outlines/blob/main/outlines/serve/serve.py),
which means stopping the generation as soon as we capture !function_call, changing SamplingParams and then start a new generation.

Maybe there is an easier intermediate step? In conformance with https://platform.openai.com/docs/api-reference/chat/create#chat-create-tool_choice, if tool_choice is none, Outlines is not added to the mix. If a function is passed, then Outlines is added to the mix. For auto mode to be supported, there would need to be some sort of token detection, as you correctly point out. But maybe solving that problem can be postponed to a later issue?

@simon-mo
Collaborator

Structured generation is almost there!! #2819

@simon-mo
Collaborator

simon-mo commented Mar 1, 2024

Structured generation is merged. You can use choice, regex, json_schema to get rid of most of the templating and parsing.
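
For example, a minimal client-side sketch, assuming the guided_choice / guided_json parameter names from #2819 and a locally served model (the model and tool names below are placeholders):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Force the reply to be one of the listed tool names (guided_choice), or pass a
# JSON schema via guided_json to constrain the generated arguments instead.
completion = client.chat.completions.create(
    model="my-model",
    messages=[{"role": "user", "content": "What is the weather in Paris?"}],
    extra_body={"guided_choice": ["get_weather", "get_time"]},
)
print(completion.choices[0].message.content)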

@FlorianJoncour
Contributor Author

Hello everyone,

I have been following the development of #2819, but I was working on my own project.

I will look into adapting this PR in the coming week to use guided generation and to address the various comments given above.
As @br3no suggested, I will look into using the current implementation only in the case where tool_choice equals "auto", and using guided generation if the user specifies a function name.

@RonanKMcGovern
Contributor

@FlorianJoncour is this being moved to a new PR then?

Just one thought for when/if it goes ahead: to the extent the docs can make very clear exactly how the functions/tools are being formatted (specifically, what helper text is being added), that will help model developers ensure support.

@FlorianJoncour
Contributor Author

#3237

@RonanKMcGovern: Documentation is probably needed. I can make a rough draft, but since English is not my native language (I write in French and use an LLM for translation), it will probably require proofreading.
