gpt-oss: implement harmony parsing #15181
base: master
Conversation
Thanks. This finally makes it much easier to use tools in Cherry Studio, and it generates thinking boxes properly.
With the PR it's better and easily more usable, but there might still be some issues around tool calling.
@dagbs try setting function calling to
Force-pushed from d65e556 to 981886f.
I tried this PR yesterday and compared it to #15158 (+ my own fixes on top of that PR). There were a couple of issues with this PR (that I was gonna share this morning), but since da67163 was pushed, it seems to finally work better than that PR. In my (albeit limited) testing, tool calling and its formatting are working a lot better. Thanks a ton for this patch @aldehir! All the unit tests pass as well, compared to the other PR, and the code organization at a glance seems better too, though granted I'm no cpp expert, just a generalist.
Hmm, seems to still be breaking sometimes; I tried to understand why, but to no avail. Most of the time it works perfectly fine, but some edge case seems to break it. Running da67163 right now. If I repeatedly use the same weather example maybe 10 times, I end up getting a badly parsed response (on llama.cpp's side) maybe once. A good run looks like this:

```
ChatCompletionResponse {
choices: [
Choice {
message: ResponseMessage {
content: Some(
"Here are the current conditions for the three cities, sorted by temperature (highest\u{202f}→\u{202f}lowest):\n\n- **Barcelona**: ☀\u{fe0f}\u{202f}+25\u{202f}°C \n- **Lima**: ⛅\u{fe0f}\u{202f}+16\u{202f}°C \n- **Stockholm**: ☀\u{fe0f}\u{202f}+13\u{202f}°C \n\n*(Temperatures are taken from the latest weather data at the time of the query.)*",
),
reasoning_content: Some(
"The user asks: \"What is the current weather in Barcelona, Stockholm, and Lima? And also, display them in a list sorted by their temperatures, highest first.\"\n\nWe have fetched weather for each location via the get_weather function. The function returns a JSON string with \"result\": \"Barcelona: ☀\u{fe0f} +25°C\\n\". Similarly for Stockholm: \"Stockholm: ☀\u{fe0f} +13°C\\n\". Lima: \"Lima: ⛅\u{fe0f} +16°C\\n\". We need to parse these results, extract the temperature values, sort them descending, and display them in a list.\n\nWe need to produce a final answer that includes the weather for each location sorted by temperature highest first. The user wants a list sorted by temperature, highest first. So we need to sort: Barcelona +25°C, Lima +16°C, Stockholm +13°C.\n\nThus the sorted list: Barcelona: ☀\u{fe0f} +25°C, Lima: ⛅\u{fe0f} +16°C, Stockholm: ☀\u{fe0f} +13°C.\n\nWe should present them as a list, maybe bullet points.\n\nWe need to ensure we include the weather icons and temperature values as given.\n\nThus answer: \n\n- Barcelona: ☀\u{fe0f} +25°C\n- Lima: ⛅\u{fe0f} +16°C\n- Stockholm: ☀\u{fe0f} +13°C\n\nWe could also include the original strings.\n\nThus final answer: a list sorted by temperature highest first.\n\nWe should also note that the data is from the function calls.\n\nThus answer: \"Here are the current weather conditions for the three cities, sorted by temperature (highest first): ...\"\n\nWe should also mention that the temperatures are approximate and may change.\n\nThus final answer.",
),
tool_calls: [],
},
},
],
}
```

Meanwhile, a bad run ends up with:

```
ChatCompletionResponse {
choices: [
Choice {
message: ResponseMessage {
content: Some(
" to=functions.get_weather\u{a0}\u{200b}\u{200b}\u{a0}\u{a0}\n\n\n\n",
),
reasoning_content: None,
tool_calls: [],
},
},
],
}
```

Full logs from bad run:
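For context, here is a rough sketch, as C++ string constants, of what a well-formed harmony tool-call segment looks like next to the fragment that leaks into `content` in the bad run. The well-formed line is an assumption based on the published harmony format, not something taken from this PR's test data:

```cpp
// Assumed shape of a harmony tool-call segment (per the gpt-oss harmony
// format); the exact token sequence is an assumption, not from this PR.
const char * well_formed =
    "<|channel|>commentary to=functions.get_weather <|constrain|>json"
    "<|message|>{\"location\": \"Lima\"}<|call|>";

// In the bad run above, only the recipient fragment survives into content,
// padded with U+00A0 / U+200B spaces, and no tool_calls entry is produced.
const char * leaked = " to=functions.get_weather\u00A0\u200B\u200B\u00A0\u00A0\n\n\n\n";
```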
Seems to happen more often when
@victorb maybe use temperature = 0 and/or top-k = 1? If inference is the issue, making it deterministic would fix it.
Running with these inference parameters, for example:

```
{
temperature: 0.0,
top_p: 1.0,
min_p: 0.0,
top_k: 0,
samplers: [
"top_k",
"top_p",
"min_p",
"temperature",
],
}
```

This seems to correctly give me deterministic responses: once I get a good response it always works, and the ones that break always break, so it's useful for testing at the very least. Here's one example of broken parsing I'm currently getting, even with these settings:

```
ChatCompletionResponse {
choices: [
Choice {
message: ResponseMessage {
content: Some(
" to=function\u{a0}\u{a0}...",
),
reasoning_content: None,
tool_calls: [],
},
},
],
}
```
Tried setting
@victorb thank you for that extensive testing. I can't seem to reproduce this on
That will help me better understand the problem. It appears the model is emitting unicode space characters, but I wasn't aware the
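One possible mitigation, purely a hypothetical helper and not something from this PR, is to normalize the unicode spaces seen in the bad runs (U+00A0 no-break space, U+200B zero-width space) before matching harmony tokens, assuming UTF-8 input:

```cpp
#include <string>

// Remove U+00A0 and U+200B (as UTF-8 byte sequences) from a string so they
// cannot break token matching. Hypothetical sketch, not the PR's approach.
static std::string strip_odd_spaces(std::string s) {
    for (const std::string & pat : { std::string("\xC2\xA0"),         // U+00A0
                                     std::string("\xE2\x80\x8B") }) { // U+200B
        size_t pos = 0;
        while ((pos = s.find(pat, pos)) != std::string::npos) {
            s.erase(pos, pat.size());
        }
    }
    return s;
}
```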
I managed to get a repro. Looks like I missed a scenario where the model outputs the recipient (`to=functions.get_weather`).
I have yet to see the

I updated the parsing and grammar rule to handle this. It should at least parse the tool calls now. I found performance degrades by the third call. I get queries for "Lima??", "Lima?", or some variation with garbage at the end. However, if I pass

Give cf9a0d6 a shot.
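Here is a minimal sketch of accepting the recipient in either header position. The header shapes are assumptions based on the harmony format, and the function is illustrative only, not the PR's actual parser or grammar rule:

```cpp
#include <optional>
#include <string>

// Extract "to=<recipient>" from a harmony header, wherever it appears:
//   "<|start|>assistant to=functions.get_weather<|channel|>commentary"
//   "<|start|>assistant<|channel|>commentary to=functions.get_weather"
// Assumed header shapes; sketch only.
static std::optional<std::string> parse_recipient(const std::string & header) {
    const std::string key = "to=";
    const size_t pos = header.find(key);
    if (pos == std::string::npos) {
        return std::nullopt;
    }
    const size_t start = pos + key.size();
    const size_t end   = header.find_first_of(" <", start); // stop at space or next special token
    return header.substr(start, end == std::string::npos ? std::string::npos : end - start);
}
```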
This is my attempt at implementing a harmony parser for gpt-oss.
Implementation
- For `reasoning_format`, `auto` and `none` are supported. When `none`, `<|channel|>analysis<|message|>{reasoning content}<|end|>` is added to the content.
- When `parse_tool_calls == false`, tool calls are added to the content verbatim, which aligns with other implementations.

Remaining Work
- Reasoning should ideally be sent back to the model in `reasoning_content`. However, none of the clients I tested send it. A simple workaround is to use `reasoning_format = none`, or add the reasoning to the content in tool calls.
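A rough sketch of the `reasoning_format = none` behavior described above, written as a simplified assumption rather than the PR's actual code:

```cpp
#include <string>

// With reasoning_format = none, reasoning is not split out into
// reasoning_content; instead it is re-wrapped in harmony syntax and kept in
// the visible content, so clients that never send reasoning_content back
// still round-trip the reasoning. Simplified sketch.
static std::string inline_reasoning(const std::string & reasoning,
                                    const std::string & content) {
    if (reasoning.empty()) {
        return content;
    }
    return "<|channel|>analysis<|message|>" + reasoning + "<|end|>" + content;
}
```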