OpenAI Tools / function calling #2488
Conversation
Thanks for making this PR. A few high-level comments:
I'm not sure I understand. Do you mean that we should be able to define the template with [...]? It is technically possible, and even simpler in the first case, but that means there is no template left: the implementation is simpler, and the user has complete freedom over how to present the list of functions and instructions. However, we lose compatibility with the reference API. In the second case, we stay on a template system, but one sent from a [...]
I had also followed the discussion on #2105.
I got an idea; it's something I have been thinking about since last week. If the server admin defines a structure like this (maybe inherited from a pre-defined structure in the vLLM tree):

    import vllm.entrypoints.openai.plugin_system as vllm_plugin_system

    class MyPluginForAPIKeys:
        def __init__(self):
            pass

        def exec(self, args: dict):
            return self.validate_key(args["api_key"])

        def validate_key(self, key: str) -> bool:
            return True

    def PluginsInit():  # Called by vLLM
        vllm_plugin_system.register(plugin=MyPluginForAPIKeys(),
                                    context=OpenAIPlugins.API_KEYS)

Then on the server side we can hot-load this script (pointed to by a given argument), and the server admin can manage API keys, templates, and more (as sketched below). As a first step, we could use a system like this to manage function-calling templates.

EDIT: Something like this could also be used to reset SamplingParams during inference and thus permit guided generation with Outlines. I don't know yet; I have to learn more about vLLM internals.
Hi @FlorianJoncour, Regarding the template, please give me a day or two to think this through. As you clearly explained, this is not as simple as plain constrained output; this is the case of a multi-turn conversation with different kinds of output. Regarding the plugin, the API key use case is simpler, and I just merged #1106 using FastAPI middleware. But we are open to other proposals that are well formed and designed!
@simon-mo
Hi @FlorianJoncour, it would be better to generate a specific prompt that the model already supports, like InternLM/InternLM#634.
Not systematically using an LLM for translation can lead to this kind of misunderstanding :D (edited) @esmeetu You can already do that by modifying the template in [...]. There are Jinja comments in the template, but Jinja scripts are difficult to read!
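For context, this is the kind of thing such a template does; a minimal sketch with `jinja2`, using a made-up template and function list rather than the one shipped in this PR:

```python
from jinja2 import Template

# Hypothetical, simplified tools template; note the Jinja comment syntax {# ... #}.
TOOLS_TEMPLATE = Template(
    "{# List every callable function for the model #}\n"
    "You have access to the following functions:\n"
    "{% for fn in functions %}- {{ fn.name }}: {{ fn.description }}\n{% endfor %}"
)

print(TOOLS_TEMPLATE.render(functions=[
    {"name": "get_weather", "description": "Return the weather for a city"},
]))
```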
        type=str,
        default=None,
        help="The file path to alternative tools template")
    parser.add_argument("--enable-api-tools",
Can we remove this arg and enable it by default?
    @@ -89,6 +101,12 @@ def parse_args():
        type=str,
        default=None,
        help="The file path to the SSL cert file")
    parser.add_argument(
        "--dev-mode",
Maybe you can submit another PR for these features unrelated to function calling.
vllm/entrypoints/openai/tools.py
Outdated
        request.messages = text_inject + request.messages
    elif isinstance(request.messages,
                    List) and len(request.messages) >= 1:
        request.messages[
I think this can be replaced by adding another system message: `request.messages.insert(0, ChatCompletionSystemMessage(role='system', content=text_inject))`
I don't think this is a good idea. Some models don't have a system role; plus, some other templates (like Llama's) don't allow two system messages in a conversation (they keep the first system message and ignore any others).
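One way to stay compatible with such templates is to merge the injected text into an existing leading system message and only insert a new one as a fallback. A sketch assuming plain dict messages; the PR's actual message classes differ:

```python
def inject_tools_prompt(messages: list[dict], text_inject: str) -> None:
    """Prepend tool instructions without creating a second system message."""
    if messages and messages[0].get("role") == "system":
        # Templates that keep only one system message: extend it in place.
        messages[0]["content"] = text_inject + "\n" + messages[0]["content"]
    else:
        messages.insert(0, {"role": "system", "content": text_inject})
```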
vllm/entrypoints/openai/tools.py
Outdated
    if len(v_call):
        try:
            call_dict = json.loads(v_call)
            if "call" in call_dict:
Why is this check needed?
Hmm, probably a check from an early stage of development :p
Removed in the new version.
vllm/entrypoints/openai/tools.py
Outdated
            self, call_id: int) -> Union[ChatCompletionMessageToolCall, None]:
        if len(self.calls_list) and call_id < len(self.calls_list):
            call = self.calls_list[call_id]
            arguments = call["arguments"] if "arguments" in call else None
If the model returns a function with only `parameters`, this may fail. Can you check both `arguments` and `parameters`?
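For illustration, the tolerant lookup could be a one-liner along these lines (a sketch, not the PR's actual code):

```python
def extract_arguments(call: dict):
    # Some models emit "parameters" instead of "arguments"; accept both.
    return call.get("arguments", call.get("parameters"))
```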
Done
vllm/entrypoints/openai/tools.py
Outdated
            function_call = Function(name=call["call"],
                                     arguments=json.dumps(arguments)
                                     if arguments is not None else "")
            return ChatCompletionMessageToolCall(id="call_" + call["call"] +
Would `call['name']` be better? 'name' is more general.
Done
Hi @FlorianJoncour, how is it going here? Very appreciated.
This is great work @FlorianJoncour! In line with this comment from @simon-mo, I believe this would profit from consolidating the efforts on guided generation and function calling, as the two form a kind of duality: injecting the tooling prompts is one part; guaranteeing that the generated output conforms to the requirements of the functions is the other. Regarding the mechanics of how to integrate the detection of a function call, and your comment:
Maybe there is an easier intermediary step? In conformance with https://platform.openai.com/docs/api-reference/chat/create#chat-create-tool_choice, if [...]
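The idea being: once `tool_choice` names a specific function, the shape of the output is fully determined, so a schema for guided generation can be derived mechanically. A sketch, assuming OpenAI-format tool definitions and a hypothetical `{"name": ..., "arguments": ...}` output wrapper:

```python
def schema_for_forced_call(tool: dict) -> dict:
    """Build a JSON schema for guided decoding when tool_choice forces one function."""
    fn = tool["function"]
    return {
        "type": "object",
        "properties": {
            "name": {"const": fn["name"]},                          # exact function name
            "arguments": fn.get("parameters", {"type": "object"}),  # its parameter schema
        },
        "required": ["name", "arguments"],
    }
```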
Structured generation is almost there!! #2819
Structured generation is merged. You can use [...]
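For reference, with the OpenAI Python client the merged guided decoding can be driven through vLLM's extra request fields, roughly like this (assuming a vLLM server on localhost:8000 and the post-#2819 `guided_json` field; the model name is a placeholder):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="your-served-model",
    messages=[{"role": "user", "content": "Name a city and its country."}],
    # vLLM-specific extension: constrain the output to this JSON schema.
    extra_body={"guided_json": {
        "type": "object",
        "properties": {"city": {"type": "string"}, "country": {"type": "string"}},
        "required": ["city", "country"],
    }},
)
print(response.choices[0].message.content)
```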
Hello everyone, I have been following the development of #2819, but I was working on my own project. I will look into adapting this PR in the coming week to use guided generation and to address the various comments given above.
@FlorianJoncour is this being moved to a new PR then? Just one thought for when/if it goes ahead: to the extent that the docs can make very clear exactly how the functions/tools are being formatted - specifically, what helper text is being added - that will help model developers ensure support.
@RonanKMcGovern: Documentation is probably needed. I can make a rough draft, but since English is not my native language (I write in French and use an LLM for translation), it will probably require a re-read.
As expected after the refactoring of the OpenAI API, here come the function calls!
This follows the first PR: #2210.
The current implementation has been largely modified to use Jinja templates; there is no more static text in the Python code.
The default template can be reassigned using the `tools-template` argument.
Among the recent comments, there were also troubles with `tool_choice`, which was not fully handled during the request. Now, if `tool_choice` is None, the behavior is the same as "auto"; if it is something else, we check whether the function is present, and the template then encourages the model to call that function (see the example request below).
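For example, a request in the reference format that forces a specific function looks like this (standard OpenAI request shape; the model and function are placeholders):

```python
request = {
    "model": "your-served-model",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    # Omitted or "auto": the model decides; a named function is encouraged by the template.
    "tool_choice": {"type": "function", "function": {"name": "get_weather"}},
}
```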
A last addition is the `dev-mode` parameter. If it is activated, you can send a GET request to '/resetAPI'.
Thanks to the recent refactoring, the called function will reload the internal part of the API without reloading the vLLM engine.
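Reloading during development is then a single request (assuming the server runs on localhost:8000):

```python
import requests

# Hot-reload the API layer and templates without restarting the vLLM engine.
requests.get("http://localhost:8000/resetAPI")
```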
Dev mode will also display a lot of warnings, because this should absolutely not be enabled on a public API!
The idea is to facilitate development of the API and templates: the API will display the prompt at each generation, after it has passed through the different templates (including the tokenizer), and the templates will be reloaded, allowing them to be modified without restarting vLLM.
Maybe this will be controversial, I don't know.
If needed I can remove it, but I can assure you it has been useful for creating the tools template!