
OpenAI Tools / function calling #2488

Closed

Conversation

FlorianJoncour
Contributor

As expected after the refactoring of the OpenAI API, here come the function calls!
This follows the first PR: #2210.

The current implementation has been largely modified to use Jinja templates; there is no more static text in the Python code.
The default template can be reassigned using the tools-template argument.

Among the recent comments, there were also issues with tool_choice, which was not fully handled during the request.
Now, if tool_choice is None the behavior is the same as "auto"; if it is anything else, we check whether the requested function is present, and the template then encourages the model to call that function.
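
For illustration, here is a rough sketch of that dispatch in Python; the names below (such as resolve_tool_choice) are placeholders for this explanation, not the PR's actual code:

def resolve_tool_choice(request):
    # None behaves like "auto": the model decides whether to call a tool.
    if request.tool_choice is None or request.tool_choice == "auto":
        return "auto", None
    # Otherwise tool_choice names a specific function: check that it exists,
    # so the template can encourage the model to call it.
    wanted = request.tool_choice.function.name
    for tool in request.tools or []:
        if tool.function.name == wanted:
            return "forced", tool
    raise ValueError(f"tool_choice references unknown function '{wanted}'")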

A final addition is the dev-mode parameter.
If it is activated, you can send a GET request to '/resetAPI'.
Thanks to the recent refactoring, the called function will reload the internal part of the API without reloading the vLLM engine.
This will also display a lot of warnings, because this should absolutely not be in a public API!

The idea is to facilitate the development of the API and templates: the API will also display the prompt at each generation, after it has passed through the different templates (including the tokenizer), and the templates will be reloaded, allowing them to be modified without having to restart vLLM.
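
As an aside, triggering the reload only takes a simple GET request; a minimal sketch, assuming the server runs locally on port 8000:

import requests

# Only available when the server was started with dev mode enabled.
response = requests.get("http://localhost:8000/resetAPI")
print(response.status_code)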

Maybe this will be controversial, I don't know.
If needed I can remove it, but I can assure you it has been useful for creating the tools template!

@simon-mo
Collaborator

Thanks for making this PR. A few high-level comments:

  • dev mode is totally fine. I would say renaming it to privileged mode would be better. There are use cases for hot-reloading these in production, as well as, in the future, using it to load/unload models.
  • please fix the camel-case naming in method names
  • overall I'm concerned about the robustness of the templating and the complexity of this approach.
    • can we replace tool definition and schema templating by appending user messages?
    • can we replace the output capture and parsing with Outlines' guided generation?
      • In particular, the tool choice can be defined as a Choice in Outlines
      • JSON schema support is also native in Outlines

@FlorianJoncour
Contributor Author

FlorianJoncour commented Jan 22, 2024

can we replace tool definition and schema templating by appending user messages?

I'm not sure I understand. Do you mean that we should be able to define the template via a POST to chat/completions, or define an API entry (like privileged) to change the template globally?

It is technically possible, and even simpler in the first case, but that means there is no template left: the implementation is simpler and the user has complete freedom over how to present the list of functions and instructions. However, we lose compatibility with the reference API.

In the second case, if we keep a template system but send it via a POST request rather than a file, it's possible. However, there is some Jinja logic in the template since there's a loop, so I need to find a way to make it simple with JSON.

can we replace the output capture and parsing with Outlines's guided generation?

I had also followed the discussion on #2105.
I am not familiar with guided generation but it's promising.
However, something bothers me: the model needs to signal when it wants to call a function, and to do that it first needs to generate a keyword, which we capture.
To use Outlines, we need to send the schema using SamplingParams.logits_processors
(https://github.com/outlines-dev/outlines/blob/main/outlines/serve/serve.py),
which means stopping the generation as soon as we capture !function_call, changing SamplingParams, and then starting a new generation.
It's not insurmountable, but it deserves some thought to do it properly, especially since we also need to adapt the template system to integrate it all (the functions list but also the instructions).
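
To make the two-phase idea concrete, here is a rough sketch; the Outlines import path and the JSONLogitsProcessor signature are assumptions based on the serve example linked above, not code from this PR:

from vllm import SamplingParams
from outlines.serve.vllm import JSONLogitsProcessor  # assumed import path

async def generate_with_tools(engine, prompt, tool_schema, request_id):
    # Phase 1: free-form generation, stopped as soon as the model emits the
    # keyword the template uses to announce a function call.
    free = SamplingParams(max_tokens=512, stop=["!function_call"])
    final = None
    async for output in engine.generate(prompt, free, request_id):
        final = output
    text = final.outputs[0].text
    # (A real implementation would check here whether the stop keyword was
    # actually produced before switching to constrained decoding.)

    # Phase 2: restart with a schema-constrained logits processor so the
    # function arguments are guaranteed to match the tool's JSON schema.
    processor = JSONLogitsProcessor(tool_schema, engine)  # assumed signature
    constrained = SamplingParams(max_tokens=512,
                                 logits_processors=[processor])
    async for output in engine.generate(prompt + text + "!function_call",
                                        constrained, request_id + "-call"):
        final = output
    return final.outputs[0].text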

@FlorianJoncour
Contributor Author

FlorianJoncour commented Jan 23, 2024

I've got an idea: what do you think about a plugin system?

It's something I've been thinking about since last week.
The first use case is managing API keys.

If the server admin defines a structure like this (maybe inheriting from a pre-defined class in the vLLM tree):

import vllm.entrypoints.openai.plugin_system as vllm_plugin_system


class MyPluginForAPIKeys:

    def exec(self, args: dict) -> bool:
        return self.validate_key(args["api_key"])

    def validate_key(self, key: str) -> bool:
        return True


def PluginsInit():  # Called by vLLM at startup
    vllm_plugin_system.register(plugin=MyPluginForAPIKeys(),
                                context=vllm_plugin_system.OpenAIPlugins.API_KEYS)

Then, on the server side, we can hot-load this script (specified by a given argument), and the server admin can manage API keys, templates, and more...
It could also be a basis for future development, since assistants and code executors (currently in beta at OpenAI) are important features but lie outside the vLLM framework.

As a first step, we could use a system like this to manage function-calling templates.
The template becomes a Python script.

EDIT: Something like this could also be used to reset SamplingParams during inference and thus permit guided generation with Outlines. I don't know for now; I have to learn more about vLLM internals.

@simon-mo
Collaborator

Hi @FlorianJoncour,

Regarding privileged, I just mean changing the name from dev mode to privileged so it's more general. I do see a future of using the same flag for other purposes.

Regarding the template, please give me a day or two to think through this. As you clearly explained, this is not as simple as plain constrained output; it is a case of multi-turn conversation with different kinds of output.

Regarding plugins, the API key use case is simpler and I just merged #1106 using FastAPI middleware. But we are open to other proposals that are well formed and designed!

@AguirreNicolas
Contributor

AguirreNicolas commented Jan 24, 2024

@simon-mo
I propose a PR on @FlorianJoncour's fork to expand some use cases. Do you think this is a valuable feature to consider merging into this PR?

@esmeetu
Collaborator

esmeetu commented Jan 25, 2024

Hi @FlorianJoncour, it would be better to generate a specific prompt that the model already supports, like InternLM/InternLM#634.
That model has action_start and action_end indicators. Does the tool function template support that?
There are also many other models with their own specific prompt templates, and we might apply them like the chat template.

@FlorianJoncour
Contributor Author

FlorianJoncour commented Jan 25, 2024

Regarding privileged, I just mean changing the name from dev mode to privileged so it's more general. I do see the future of using the same flag for other purpose.

The problem with not systematically using LLMs for translation is that it can lead to this kind of misunderstanding :D

@esmeetu You can already do that by modifying the template under CONTEXT == CALLS_NOTIF and CONTEXT == TOOL_RESPONSE.
This allows you to modify the way function responses are presented to the LLM.

There are Jinja comments in the template, but Jinja scripts are difficult to read!
This is the main reason why I'm not against the idea of changing the template system.

type=str,
default=None,
help="The file path to alternative tools template")
parser.add_argument("--enable-api-tools",
Collaborator

Can we remove this arg and enable it by default?

@@ -89,6 +101,12 @@ def parse_args():
type=str,
default=None,
help="The file path to the SSL cert file")
parser.add_argument(
"--dev-mode",
Collaborator

Maybe you could submit a separate PR for these features unrelated to function calling.

request.messages = text_inject + request.messages
elif isinstance(request.messages,
List) and len(request.messages) >= 1:
request.messages[
Collaborator

I think this can be replaced by adding another system message:
request.messages.insert(0, ChatCompletionSystemMessage(role='system', content=text_inject))

Contributor

I don't think this is a good idea. Some models don't have a system role; plus, some other templates (like Llama's) don't allow two system messages during the conversation (they preserve the first system message and ignore any others).

if len(v_call):
try:
call_dict = json.loads(v_call)
if "call" in call_dict:
Collaborator

Why is this check needed?

Contributor Author

Hmm, probably a check from an early stage of development :p
Removed in the new version.

self, call_id: int) -> Union[ChatCompletionMessageToolCall, None]:
if len(self.calls_list) and call_id < len(self.calls_list):
call = self.calls_list[call_id]
arguments = call["arguments"] if "arguments" in call else None
Collaborator

If the model returns a function with only parameters, this may fail. Can you check both arguments and parameters?

Contributor Author

Done

function_call = Function(name=call["call"],
arguments=json.dumps(arguments)
if arguments is not None else "")
return ChatCompletionMessageToolCall(id="call_" + call["call"] +
Collaborator

Wouldn't call['name'] be better? 'name' is more general.

Contributor Author

Done

@esmeetu
Collaborator

esmeetu commented Feb 1, 2024

Hi @FlorianJoncour,
I left some comments here. After making those changes, I can use function calling! Thanks.

@xiechengmude

How is it going here? Very much appreciated.

@br3no
Contributor

br3no commented Feb 5, 2024

This is great work @FlorianJoncour!

In line with this comment from @simon-mo, I believe this would profit from a consolidation of the efforts on guided generation and function calling, as the two form a kind of duality: injecting the tooling prompts is one part; guaranteeing that the generated output conforms to the requirements of the functions is the other.

Regarding the mechanics of how to integrate the detection of a function call and your comment:

To use outlines, we need to send the schema using SamplingParams.logits_processors
(https://github.com/outlines-dev/outlines/blob/main/outlines/serve/serve.py),
which means stopping the generation as soon as we capture !function_call, changing SamplingParams and then start a new generation.

Maybe there is an easier intermediate step? In conformance with https://platform.openai.com/docs/api-reference/chat/create#chat-create-tool_choice, if tool_choice is none, Outlines is not added to the mix. If a function is passed, then Outlines is added to the mix. For auto mode to be supported, there would need to be some sort of token detection, as you correctly point out. But maybe solving that problem can be postponed to a later issue?

@simon-mo
Collaborator

Structured generation is almost there!! #2819

@simon-mo
Collaborator

simon-mo commented Mar 1, 2024

Structured generation is merged. You can use choice, regex, json_schema to get rid of most of the templating and parsing.
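
For example, a minimal client-side sketch, assuming the guided_choice / guided_json parameter names from #2819 and a locally served model (the model and tool names below are placeholders):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Force the reply to be one of the listed tool names (guided_choice), or pass a
# JSON schema via guided_json to constrain the generated arguments instead.
completion = client.chat.completions.create(
    model="my-model",
    messages=[{"role": "user", "content": "What is the weather in Paris?"}],
    extra_body={"guided_choice": ["get_weather", "get_time"]},
)
print(completion.choices[0].message.content)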

@FlorianJoncour
Contributor Author

Hello everyone,

I have been following the development of #2819, but I was working on my own project.

I will look into adapting this PR in the coming week to use guided generation and to address the various comments given above.
As @br3no suggested, I will look into using the current implementation only in the case where tool_choice equals "auto", and using guided generation if the user specifies a function name.

@RonanKMcGovern
Contributor

@FlorianJoncour is this being moved to a new PR then?

Just one thought for when/if it goes ahead: to the extent the docs can make very clear exactly how the functions/tools are being formatted (specifically, what helper text is being added), that will help model developers ensure support.

@FlorianJoncour
Contributor Author

#3237

@RonanKMcGovern: Documentation is probably needed. I can make a rough draft, but since English is not my native language (I write in French and use an LLM for translation), it will probably require proofreading.
