
gpt-oss: implement harmony parsing #15181


Open · wants to merge 4 commits into master from feature/harmony-parser

Conversation

@aldehir commented Aug 8, 2025

This is my attempt at implementing a harmony parser for gpt-oss.

Implementation

  • Reasoning format support - both auto and none are supported. When none, <|channel|>analysis<|message|>{reasoning content}<|end|> is added to the content (see the example after this list).
  • Tool parsing - parsing and a tool-call grammar are implemented. If parse_tool_calls == false, tool calls are added to the content verbatim, which aligns with other implementations.
  • Commentary preamble - the harmony format allows a preamble in the commentary channel before tool calls. If present, it is added to the content.
  • Tests added - perhaps too many test cases, but I wanted to ensure proper parsing of partial messages.
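
For context, a typical harmony response interleaves channels like this (a hand-written illustration following the raw content visible in the logs later in this thread, not actual model output):

<|channel|>analysis<|message|>The user asks about the weather...<|end|><|start|>assistant<|channel|>final<|message|>Here are the current conditions...

With reasoning_format = auto, the analysis text is extracted into reasoning_content and the final text into content; with reasoning_format = none, the raw <|channel|>analysis<|message|>...<|end|> block is left in content instead.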

Remaining Work

  • The harmony format specifies that the reasoning content from the assistant's last tool call should be included in the next prompt. This implementation assumes it comes back from the client in reasoning_content; however, none of the clients I tested send it. A simple workaround is to use reasoning_format = none, or to add the reasoning to the content of tool-call messages (a sketch of such a message follows below).
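
For client authors, the assistant tool-call message would need to carry the reasoning back. A hand-written sketch of what one such message in the chat request could look like (the reasoning_content field name follows this PR; the id value is made up):

{
    "role": "assistant",
    "content": "",
    "reasoning_content": "We need the weather for Barcelona first...",
    "tool_calls": [
        {
            "id": "call-1",
            "type": "function",
            "function": {
                "name": "get_weather",
                "arguments": "{\"location\": \"Barcelona\"}"
            }
        }
    ]
}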

@aldehir requested a review from ngxson as a code owner on August 8, 2025 18:51
@github-actions bot added the testing, examples, and server labels on Aug 8, 2025
@abc-nix commented Aug 8, 2025

Thanks. This finally makes it much easier to use tools in Cherry Studio, and it generates thinking boxes properly.

@dagbs commented Aug 8, 2025

Without the PR:
[screenshot]

With the PR, using gpt-oss-20b:f16 from unsloth with the updated gguf:
[screenshot]

It's better and easily more usable, but there may still be some issues around tool calling.

@aldehir (Author) commented Aug 9, 2025

@dagbs try setting function calling to native in open-webui:
[screenshot]

@aldehir force-pushed the feature/harmony-parser branch from d65e556 to 981886f on August 9, 2025 03:18
@victorb commented Aug 9, 2025

I tried this PR yesterday and compared it to #15158 (plus my own fixes on top of that PR). There were a couple of issues with this PR that I was going to share this morning, but since da67163 was pushed, it now seems to work better than that PR. In my (albeit limited) testing, tool calling and its formatting are working a lot better. Thanks a ton for this patch @aldehir!

All the unit tests pass as well, compared to the other PR, and the code organization seems better at a glance too, though granted I'm no C++ expert, just a generalist.

@victorb commented Aug 9, 2025

Hmm, it still seems to break sometimes; I tried to understand why, but to no avail. Most of the time it works perfectly fine, but some edge case seems to break it. Running da67163 right now.

If I repeatedly run the same weather example, say 10 times, I end up with a badly parsed response (on llama.cpp's side) maybe once.

A good run looks like this:

ChatCompletionResponse {
    choices: [
        Choice {
            message: ResponseMessage {
                content: Some(
                    "Here are the current conditions for the three cities, sorted by temperature (highest\u{202f}\u{202f}lowest):\n\n- **Barcelona**: ☀\u{fe0f}\u{202f}+25\u{202f}°C  \n- **Lima**: ⛅\u{fe0f}\u{202f}+16\u{202f}°C  \n- **Stockholm**: ☀\u{fe0f}\u{202f}+13\u{202f}°C  \n\n*(Temperatures are taken from the latest weather data at the time of the query.)*",
                ),
                reasoning_content: Some(
                    "The user asks: \"What is the current weather in Barcelona, Stockholm, and Lima? And also, display them in a list sorted by their temperatures, highest first.\"\n\nWe have fetched weather for each location via the get_weather function. The function returns a JSON string with \"result\": \"Barcelona: ☀\u{fe0f}   +25°C\\n\". Similarly for Stockholm: \"Stockholm: ☀\u{fe0f}   +13°C\\n\". Lima: \"Lima: ⛅\u{fe0f}  +16°C\\n\". We need to parse these results, extract the temperature values, sort them descending, and display them in a list.\n\nWe need to produce a final answer that includes the weather for each location sorted by temperature highest first. The user wants a list sorted by temperature, highest first. So we need to sort: Barcelona +25°C, Lima +16°C, Stockholm +13°C.\n\nThus the sorted list: Barcelona: ☀\u{fe0f} +25°C, Lima: ⛅\u{fe0f} +16°C, Stockholm: ☀\u{fe0f} +13°C.\n\nWe should present them as a list, maybe bullet points.\n\nWe need to ensure we include the weather icons and temperature values as given.\n\nThus answer: \n\n- Barcelona: ☀\u{fe0f} +25°C\n- Lima: ⛅\u{fe0f} +16°C\n- Stockholm: ☀\u{fe0f} +13°C\n\nWe could also include the original strings.\n\nThus final answer: a list sorted by temperature highest first.\n\nWe should also note that the data is from the function calls.\n\nThus answer: \"Here are the current weather conditions for the three cities, sorted by temperature (highest first): ...\"\n\nWe should also mention that the temperatures are approximate and may change.\n\nThus final answer.",
                ),
                tool_calls: [],
            },
        },
    ],
}
sending:
[
    ChatMessage {
        role: "system",
        content: Some(
            "You are a helpful assistant. Help the user with whatever they need.\n",
        ),
        channel: None,
        recipient: None,
        tool_calls: None,
        tool_call_id: None,
    },
    ChatMessage {
        role: "user",
        content: Some(
            "What is the current weather in Barcelona, Stockholm, and Lima? And also, display them in a list sorted by their temperatures, highest first.",
        ),
        channel: None,
        recipient: None,
        tool_calls: None,
        tool_call_id: None,
    },
    ChatMessage {
        role: "assistant",
        content: Some(
            "",
        ),
        channel: Some(
            "commentary",
        ),
        recipient: None,
        tool_calls: Some(
            [
                ToolCall {
                    id: "ItCkpCeXs6jXspSwbFLidTHuATWM8MIj",
                    type: "function",
                    function: ToolCallFunction {
                        name: "get_weather",
                        arguments: "{\"location\":\"Barcelona\"}",
                    },
                },
            ],
        ),
        tool_call_id: None,
    },
    ChatMessage {
        role: "tool",
        content: Some(
            "{\"result\":\"Barcelona: ☀\u{fe0f}   +25°C\\n\"}",
        ),
        channel: Some(
            "commentary",
        ),
        recipient: None,
        tool_calls: None,
        tool_call_id: Some(
            "ItCkpCeXs6jXspSwbFLidTHuATWM8MIj",
        ),
    },
    ChatMessage {
        role: "assistant",
        content: Some(
            "",
        ),
        channel: Some(
            "commentary",
        ),
        recipient: None,
        tool_calls: Some(
            [
                ToolCall {
                    id: "d92fjsjS8L5xBMTxSCmSWcNyhgISwo4u",
                    type: "function",
                    function: ToolCallFunction {
                        name: "get_weather",
                        arguments: "{\"location\":\"Stockholm\"}",
                    },
                },
            ],
        ),
        tool_call_id: None,
    },
    ChatMessage {
        role: "tool",
        content: Some(
            "{\"result\":\"Stockholm: ☀\u{fe0f}   +13°C\\n\"}",
        ),
        channel: Some(
            "commentary",
        ),
        recipient: None,
        tool_calls: None,
        tool_call_id: Some(
            "d92fjsjS8L5xBMTxSCmSWcNyhgISwo4u",
        ),
    },
    ChatMessage {
        role: "assistant",
        content: Some(
            "",
        ),
        channel: Some(
            "commentary",
        ),
        recipient: None,
        tool_calls: Some(
            [
                ToolCall {
                    id: "0rIM7Xm598gzrRALjB4yMGZnuKRjOrSh",
                    type: "function",
                    function: ToolCallFunction {
                        name: "get_weather",
                        arguments: "{\"location\":\"Lima\"}",
                    },
                },
            ],
        ),
        tool_call_id: None,
    },
    ChatMessage {
        role: "tool",
        content: Some(
            "{\"result\":\"Lima: ⛅\u{fe0f}  +16°C\\n\"}",
        ),
        channel: Some(
            "commentary",
        ),
        recipient: None,
        tool_calls: None,
        tool_call_id: Some(
            "0rIM7Xm598gzrRALjB4yMGZnuKRjOrSh",
        ),
    },
]
[src/lib.rs:38:9] &val = Object {
    "choices": Array [
        Object {
            "finish_reason": String("stop"),
            "index": Number(0),
            "message": Object {
                "role": String("assistant"),
                "reasoning_content": String("The user asks: \"What is the current weather in Barcelona, Stockholm, and Lima? And also, display them in a list sorted by their temperatures, highest first.\"\n\nWe have fetched weather for each location via the get_weather function. The function returns a JSON string with \"result\": \"Barcelona: ☀\u{fe0f}   +25°C\\n\". Similarly for Stockholm: \"Stockholm: ☀\u{fe0f}   +13°C\\n\". Lima: \"Lima: ⛅\u{fe0f}  +16°C\\n\". We need to parse these results, extract the temperature values, sort them descending, and display them in a list.\n\nWe need to produce a final answer that includes the weather for each location sorted by temperature highest first. The user wants a list sorted by temperature, highest first. So we need to sort: Barcelona +25°C, Lima +16°C, Stockholm +13°C.\n\nThus the sorted list: Barcelona: ☀\u{fe0f} +25°C, Lima: ⛅\u{fe0f} +16°C, Stockholm: ☀\u{fe0f} +13°C.\n\nWe should present them as a list, maybe bullet points.\n\nWe need to ensure we include the weather icons and temperature values as given.\n\nThus answer: \n\n- Barcelona: ☀\u{fe0f} +25°C\n- Lima: ⛅\u{fe0f} +16°C\n- Stockholm: ☀\u{fe0f} +13°C\n\nWe could also include the original strings.\n\nThus final answer: a list sorted by temperature highest first.\n\nWe should also note that the data is from the function calls.\n\nThus answer: \"Here are the current weather conditions for the three cities, sorted by temperature (highest first): ...\"\n\nWe should also mention that the temperatures are approximate and may change.\n\nThus final answer."),
                "content": String("Here are the current conditions for the three cities, sorted by temperature (highest\u{202f}→\u{202f}lowest):\n\n- **Barcelona**: ☀\u{fe0f}\u{202f}+25\u{202f}°C  \n- **Lima**:  ⛅\u{fe0f}\u{202f}+16\u{202f}°C  \n- **Stockholm**: ☀\u{fe0f}\u{202f}+13\u{202f}°C  \n\n*(Temperatures are taken from the latest weather data at the time of the query.)*"),
            },
        },
    ],
    "created": Number(1754730237),
    "model": String("gpt-oss-20b-MXFP4.gguf"),
    "system_fingerprint": String("b6124-da671637"),
    "object": String("chat.completion"),
    "usage": Object {
        "completion_tokens": Number(440),
        "prompt_tokens": Number(361),
        "total_tokens": Number(801),
    },
    "id": String("chatcmpl-efjEpQIpXzIGe9j4F4gnC1X39B7mHOa3"),
    "__verbose": Object {
        "index": Number(0),
        "content": String("<|channel|>analysis<|message|>The user asks: \"What is the current weather in Barcelona, Stockholm, and Lima? And also, display them in a list sorted by their temperatures, highest first.\"\n\nWe have fetched weather for each location via the get_weather function. The function returns a JSON string with \"result\": \"Barcelona: ☀\u{fe0f}   +25°C\\n\". Similarly for Stockholm: \"Stockholm: ☀\u{fe0f}   +13°C\\n\". Lima: \"Lima: ⛅\u{fe0f}  +16°C\\n\". We need to parse these results, extract the temperature values, sort them descending, and display them in a list.\n\nWe need to produce a final answer that includes the weather for each location sorted by temperature highest first. The user wants a list sorted by temperature, highest first. So we need to sort: Barcelona +25°C, Lima +16°C, Stockholm +13°C.\n\nThus the sorted list: Barcelona: ☀\u{fe0f} +25°C, Lima: ⛅\u{fe0f} +16°C, Stockholm: ☀\u{fe0f} +13°C.\n\nWe should present them as a list, maybe bullet points.\n\nWe need to ensure we include the weather icons and temperature values as given.\n\nThus answer: \n\n- Barcelona: ☀\u{fe0f} +25°C\n- Lima: ⛅\u{fe0f} +16°C\n- Stockholm: ☀\u{fe0f} +13°C\n\nWe could also include the original strings.\n\nThus final answer: a list sorted by temperature highest first.\n\nWe should also note that the data is from the function calls.\n\nThus answer: \"Here are the current weather conditions for the three cities, sorted by temperature (highest first): ...\"\n\nWe should also mention that the temperatures are approximate and may change.\n\nThus final answer.\n\n<|end|><|start|>assistant<|channel|>final<|message|>Here are the current conditions for the three cities, sorted by temperature (highest\u{202f}→\u{202f}lowest):\n\n- **Barcelona**: ☀\u{fe0f}\u{202f}+25\u{202f}°C  \n- **Lima**: ⛅\u{fe0f}\u{202f}+16\u{202f}°C  \n- **Stockholm**: ☀\u{fe0f}\u{202f}+13\u{202f}°C  \n\n*(Temperatures are taken from the latest weather data at the time of the query.)*"),
        "tokens": Array [],
        "id_slot": Number(0),
        "stop": Bool(true),
        "model": String("gpt-oss-20b-MXFP4.gguf"),
        "tokens_predicted": Number(440),
        "tokens_evaluated": Number(361),
        "generation_settings": Object {
            "n_predict": Number(4096),
            "seed": Number(4294967295),
            "temperature": Number(1.0),
            "dynatemp_range": Number(0.0),
            "dynatemp_exponent": Number(1.0),
            "top_k": Number(40),
            "top_p": Number(1.0),
            "min_p": Number(1.0),
            "top_n_sigma": Number(-1.0),
            "xtc_probability": Number(0.0),
            "xtc_threshold": Number(0.10000000149011612),
            "typical_p": Number(1.0),
            "repeat_last_n": Number(64),
            "repeat_penalty": Number(1.0),
            "presence_penalty": Number(0.0),
            "frequency_penalty": Number(0.0),
            "dry_multiplier": Number(0.0),
            "dry_base": Number(1.75),
            "dry_allowed_length": Number(2),
            "dry_penalty_last_n": Number(131072),
            "dry_sequence_breakers": Array [
                String("\n"),
                String(":"),
                String("\""),
                String("*"),
            ],
            "mirostat": Number(0),
            "mirostat_tau": Number(5.0),
            "mirostat_eta": Number(0.10000000149011612),
            "stop": Array [],
            "max_tokens": Number(4096),
            "n_keep": Number(0),
            "n_discard": Number(0),
            "ignore_eos": Bool(false),
            "stream": Bool(false),
            "logit_bias": Array [],
            "n_probs": Number(0),
            "min_keep": Number(0),
            "grammar": String("add-args ::= \"{\" space add-args-a-kv \",\" space add-args-b-kv \"}\" space\nadd-args-a-kv ::= \"\\\"a\\\"\" space \":\" space number\nadd-args-b-kv ::= \"\\\"b\\\"\" space \":\" space number\nadd-call ::= \"add\" space \"<|constrain|>\"? \"json\" space \"<|message|>\" add-args\nchar ::= [^\"\\\\\\x7F\\x00-\\x1F] | [\\\\] ([\"\\\\bfnrt] | \"u\" [0-9a-fA-F]{4})\ndecimal-part ::= [0-9]{1,16}\nget-weather-args ::= \"{\" space get-weather-args-location-kv \"}\" space\nget-weather-args-location-kv ::= \"\\\"location\\\"\" space \":\" space string\nget-weather-call ::= \"get_weather\" space \"<|constrain|>\"? \"json\" space \"<|message|>\" get-weather-args\nintegral-part ::= [0] | [1-9] [0-9]{0,15}\nmultiply-args ::= \"{\" space multiply-args-a-kv \",\" space multiply-args-b-kv \"}\" space\nmultiply-args-a-kv ::= \"\\\"a\\\"\" space \":\" space number\nmultiply-args-b-kv ::= \"\\\"b\\\"\" space \":\" space number\nmultiply-call ::= \"multiply\" space \"<|constrain|>\"? \"json\" space \"<|message|>\" multiply-args\nnumber ::= (\"-\"? integral-part) (\".\" decimal-part)? ([eE] [-+]? integral-part)? space\nroot ::= \"<|channel|>commentary to=functions.\" tool-call\nspace ::= | \" \" | \"\\n\"{1,2} [ \\t]{0,20}\nstring ::= \"\\\"\" char* \"\\\"\" space\ntool-call ::= add-call | multiply-call | get-weather-call\n"),
            "grammar_lazy": Bool(true),
            "grammar_triggers": Array [
                Object {
                    "type": Number(2),
                    "value": String("<\\|channel\\|>commentary to"),
                },
            ],
            "preserved_tokens": Array [
                Number(200003),
                Number(200005),
                Number(200006),
                Number(200007),
                Number(200008),
            ],
            "chat_format": String("GPT-OSS"),
            "reasoning_format": String("auto"),
            "reasoning_in_content": Bool(false),
            "thinking_forced_open": Bool(false),
            "samplers": Array [
                String("top_p"),
                String("min_p"),
                String("temperature"),
            ],
            "speculative.n_max": Number(16),
            "speculative.n_min": Number(0),
            "speculative.p_min": Number(0.75),
            "timings_per_token": Bool(false),
            "post_sampling_probs": Bool(false),
            "lora": Array [],
        },
        "prompt": String("<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.\nKnowledge cutoff: 2024-06\nCurrent date: 2025-08-09\n\nReasoning: high\n\n# Valid channels: analysis, commentary, final. Channel must be included for every message.\nCalls to these tools must go to the commentary channel: 'functions'.<|end|><|start|>developer<|message|># Instructions\n\nYou are a helpful assistant. Help the user with whatever they need.\n\n\n# Tools\n\n## functions\n\nnamespace functions {\n\n// adds two numbers\ntype add = (_: {\na: number,\nb: number\n}) => any;\n\n// multiplies two numbers\ntype multiply = (_: {\na: number,\nb: number\n}) => any;\n\n// Get the weather for the specified location\ntype get_weather = (_: {\nlocation: string\n}) => any;\n\n} // namespace functions<|end|><|start|>user<|message|>What is the current weather in Barcelona, Stockholm, and Lima? And also, display them in a list sorted by their temperatures, highest first.<|end|><|start|>assistant to=functions.get_weather<|channel|>commentary json<|message|>{\"location\": \"Barcelona\"}<|call|><|start|>functions.get_weather to=assistant<|channel|>commentary<|message|>\"{\\\"result\\\":\\\"Barcelona: ☀\u{fe0f}   +25°C\\\\n\\\"}\"<|end|><|start|>assistant to=functions.get_weather<|channel|>commentary json<|message|>{\"location\": \"Stockholm\"}<|call|><|start|>functions.get_weather to=assistant<|channel|>commentary<|message|>\"{\\\"result\\\":\\\"Stockholm: ☀\u{fe0f}   +13°C\\\\n\\\"}\"<|end|><|start|>assistant to=functions.get_weather<|channel|>commentary json<|message|>{\"location\": \"Lima\"}<|call|><|start|>functions.get_weather to=assistant<|channel|>commentary<|message|>\"{\\\"result\\\":\\\"Lima: ⛅\u{fe0f}  +16°C\\\\n\\\"}\"<|end|><|start|>assistant"),
        "has_new_line": Bool(true),
        "truncated": Bool(false),
        "stop_type": String("eos"),
        "stopping_word": String(""),
        "tokens_cached": Number(800),
        "timings": Object {
            "prompt_n": Number(49),
            "prompt_ms": Number(80.661),
            "prompt_per_token_ms": Number(1.6461428571428571),
            "prompt_per_second": Number(607.4806907923283),
            "predicted_n": Number(440),
            "predicted_ms": Number(2565.854),
            "predicted_per_token_ms": Number(5.831486363636364),
            "predicted_per_second": Number(171.48286691292645),
        },
    },
    "timings": Object {
        "prompt_n": Number(49),
        "prompt_ms": Number(80.661),
        "prompt_per_token_ms": Number(1.6461428571428571),
        "prompt_per_second": Number(607.4806907923283),
        "predicted_n": Number(440),
        "predicted_ms": Number(2565.854),
        "predicted_per_token_ms": Number(5.831486363636364),
        "predicted_per_second": Number(171.48286691292645),
    },
}
got:
ChatCompletionResponse {
    choices: [
        Choice {
            message: ResponseMessage {
                content: Some(
                    "Here are the current conditions for the three cities, sorted by temperature (highest\u{202f}→\u{202f}lowest):\n\n- **Barcelona**: ☀\u{fe0f}\u{202f}+25\u{202f}°C  \n- **Lima**: ⛅\u{fe0f}\u{202f}+16\u{202f}°C  \n- **Stockholm**: ☀\u{fe0f}\u{202f}+13\u{202f}°C  \n\n*(Temperatures are taken from the latest weather data at the time of the query.)*",
                ),
                reasoning_content: Some(
                    "The user asks: \"What is the current weather in Barcelona, Stockholm, and Lima? And also, display them in a list sorted by their temperatures, highest first.\"\n\nWe have fetched weather for each location via the get_weather function. The function returns a JSON string with \"result\": \"Barcelona: ☀\u{fe0f}   +25°C\\n\". Similarly for Stockholm: \"Stockholm: ☀\u{fe0f}   +13°C\\n\". Lima: \"Lima: ⛅\u{fe0f}  +16°C\\n\". We need to parse these results, extract the temperature values, sort them descending, and display them in a list.\n\nWe need to produce a final answer that includes the weather for each location sorted by temperature highest first. The user wants a list sorted by temperature, highest first. So we need to sort: Barcelona +25°C, Lima +16°C, Stockholm +13°C.\n\nThus the sorted list: Barcelona: ☀\u{fe0f} +25°C, Lima: ⛅\u{fe0f} +16°C, Stockholm: ☀\u{fe0f} +13°C.\n\nWe should present them as a list, maybe bullet points.\n\nWe need to ensure we include the weather icons and temperature values as given.\n\nThus answer: \n\n- Barcelona: ☀\u{fe0f} +25°C\n- Lima: ⛅\u{fe0f} +16°C\n- Stockholm: ☀\u{fe0f} +13°C\n\nWe could also include the original strings.\n\nThus final answer: a list sorted by temperature highest first.\n\nWe should also note that the data is from the function calls.\n\nThus answer: \"Here are the current weather conditions for the three cities, sorted by temperature (highest first): ...\"\n\nWe should also mention that the temperatures are approximate and may change.\n\nThus final answer.",
                ),
                tool_calls: [],
            },
        },
    ],
}
############# SHOULD BE RETURNING NOW< ALL DONE

Assistant: Here are the current conditions for the three cities, sorted by temperature (highest → lowest):

- **Barcelona**: ☀️ +25 °C
- **Lima**: ⛅️ +16 °C
- **Stockholm**: ☀️ +13 °C

*(Temperatures are taken from the latest weather data at the time of the query.)*

Meanwhile, a bad run ends up with:

ChatCompletionResponse {
    choices: [
        Choice {
            message: ResponseMessage {
                content: Some(
                    " to=functions.get_weather\u{a0}\u{200b}\u{200b}\u{a0}\u{a0}\n\n\n\n",
                ),
                reasoning_content: None,
                tool_calls: [],
            },
        },
    ],
}

Full logs from the bad run:

sending:
[
    ChatMessage {
        role: "system",
        content: Some(
            "You are a helpful assistant. Help the user with whatever they need.\n",
        ),
        channel: None,
        recipient: None,
        tool_calls: None,
        tool_call_id: None,
    },
    ChatMessage {
        role: "user",
        content: Some(
            "What is the current weather in Barcelona, Stockholm, and Lima? And also, display them in a list sorted by their temperatures, highest first.",
        ),
        channel: None,
        recipient: None,
        tool_calls: None,
        tool_call_id: None,
    },
    ChatMessage {
        role: "assistant",
        content: Some(
            "",
        ),
        channel: Some(
            "commentary",
        ),
        recipient: None,
        tool_calls: Some(
            [
                ToolCall {
                    id: "uoYcwKVzv9haFDLHzVI9PcnAcICFcXmy",
                    type: "function",
                    function: ToolCallFunction {
                        name: "get_weather",
                        arguments: "{\"location\":\"Barcelona\"}",
                    },
                },
            ],
        ),
        tool_call_id: None,
    },
    ChatMessage {
        role: "tool",
        content: Some(
            "{\"result\":\"Barcelona: ☀\u{fe0f}   +25°C\\n\"}",
        ),
        channel: Some(
            "commentary",
        ),
        recipient: None,
        tool_calls: None,
        tool_call_id: Some(
            "uoYcwKVzv9haFDLHzVI9PcnAcICFcXmy",
        ),
    },
    ChatMessage {
        role: "assistant",
        content: Some(
            "",
        ),
        channel: Some(
            "commentary",
        ),
        recipient: None,
        tool_calls: Some(
            [
                ToolCall {
                    id: "qiY0di8Ec9BxfVuJa5Nw4flvAsEhs9DY",
                    type: "function",
                    function: ToolCallFunction {
                        name: "get_weather",
                        arguments: "{\"location\":\"Stockholm\"}",
                    },
                },
            ],
        ),
        tool_call_id: None,
    },
    ChatMessage {
        role: "tool",
        content: Some(
            "{\"result\":\"Stockholm: ☀\u{fe0f}   +13°C\\n\"}",
        ),
        channel: Some(
            "commentary",
        ),
        recipient: None,
        tool_calls: None,
        tool_call_id: Some(
            "qiY0di8Ec9BxfVuJa5Nw4flvAsEhs9DY",
        ),
    },
]
[src/lib.rs:38:9] &val = Object {
    "choices": Array [
        Object {
            "finish_reason": String("stop"),
            "index": Number(0),
            "message": Object {
                "role": String("assistant"),
                "content": String(" to=functions.get_weather\u{a0}\u{200b}\u{200b}\u{a0}\u{a0}\n\n\n\n"),
            },
        },
    ],
    "created": Number(1754730110),
    "model": String("gpt-oss-20b-MXFP4.gguf"),
    "system_fingerprint": String("b6124-da671637"),
    "object": String("chat.completion"),
    "usage": Object {
        "completion_tokens": Number(12),
        "prompt_tokens": Number(310),
        "total_tokens": Number(322),
    },
    "id": String("chatcmpl-MKwVwT9hOE93A4IvYowdcn7f7mvFOvaR"),
    "__verbose": Object {
        "index": Number(0),
        "content": String(" to=functions.get_weather\u{a0}\u{200b}\u{200b}\u{a0}\u{a0}\n\n\n\n"),
        "tokens": Array [],
        "id_slot": Number(0),
        "stop": Bool(true),
        "model": String("gpt-oss-20b-MXFP4.gguf"),
        "tokens_predicted": Number(12),
        "tokens_evaluated": Number(310),
        "generation_settings": Object {
            "n_predict": Number(4096),
            "seed": Number(4294967295),
            "temperature": Number(1.0),
            "dynatemp_range": Number(0.0),
            "dynatemp_exponent": Number(1.0),
            "top_k": Number(40),
            "top_p": Number(1.0),
            "min_p": Number(1.0),
            "top_n_sigma": Number(-1.0),
            "xtc_probability": Number(0.0),
            "xtc_threshold": Number(0.10000000149011612),
            "typical_p": Number(1.0),
            "repeat_last_n": Number(64),
            "repeat_penalty": Number(1.0),
            "presence_penalty": Number(0.0),
            "frequency_penalty": Number(0.0),
            "dry_multiplier": Number(0.0),
            "dry_base": Number(1.75),
            "dry_allowed_length": Number(2),
            "dry_penalty_last_n": Number(131072),
            "dry_sequence_breakers": Array [
                String("\n"),
                String(":"),
                String("\""),
                String("*"),
            ],
            "mirostat": Number(0),
            "mirostat_tau": Number(5.0),
            "mirostat_eta": Number(0.10000000149011612),
            "stop": Array [],
            "max_tokens": Number(4096),
            "n_keep": Number(0),
            "n_discard": Number(0),
            "ignore_eos": Bool(false),
            "stream": Bool(false),
            "logit_bias": Array [],
            "n_probs": Number(0),
            "min_keep": Number(0),
            "grammar": String("add-args ::= \"{\" space add-args-a-kv \",\" space add-args-b-kv \"}\" space\nadd-args-a-kv ::= \"\\\"a\\\"\" space \":\" space number\nadd-args-b-kv ::= \"\\\"b\\\"\" space \":\" space number\nadd-call ::= \"add\" space \"<|constrain|>\"? \"json\" space \"<|message|>\" add-args\nchar ::= [^\"\\\\\\x7F\\x00-\\x1F] | [\\\\] ([\"\\\\bfnrt] | \"u\" [0-9a-fA-F]{4})\ndecimal-part ::= [0-9]{1,16}\nget-weather-args ::= \"{\" space get-weather-args-location-kv \"}\" space\nget-weather-args-location-kv ::= \"\\\"location\\\"\" space \":\" space string\nget-weather-call ::= \"get_weather\" space \"<|constrain|>\"? \"json\" space \"<|message|>\" get-weather-args\nintegral-part ::= [0] | [1-9] [0-9]{0,15}\nmultiply-args ::= \"{\" space multiply-args-a-kv \",\" space multiply-args-b-kv \"}\" space\nmultiply-args-a-kv ::= \"\\\"a\\\"\" space \":\" space number\nmultiply-args-b-kv ::= \"\\\"b\\\"\" space \":\" space number\nmultiply-call ::= \"multiply\" space \"<|constrain|>\"? \"json\" space \"<|message|>\" multiply-args\nnumber ::= (\"-\"? integral-part) (\".\" decimal-part)? ([eE] [-+]? integral-part)? space\nroot ::= \"<|channel|>commentary to=functions.\" tool-call\nspace ::= | \" \" | \"\\n\"{1,2} [ \\t]{0,20}\nstring ::= \"\\\"\" char* \"\\\"\" space\ntool-call ::= add-call | multiply-call | get-weather-call\n"),
            "grammar_lazy": Bool(true),
            "grammar_triggers": Array [
                Object {
                    "type": Number(2),
                    "value": String("<\\|channel\\|>commentary to"),
                },
            ],
            "preserved_tokens": Array [
                Number(200003),
                Number(200005),
                Number(200006),
                Number(200007),
                Number(200008),
            ],
            "chat_format": String("GPT-OSS"),
            "reasoning_format": String("auto"),
            "reasoning_in_content": Bool(false),
            "thinking_forced_open": Bool(false),
            "samplers": Array [
                String("top_p"),
                String("min_p"),
                String("temperature"),
            ],
            "speculative.n_max": Number(16),
            "speculative.n_min": Number(0),
            "speculative.p_min": Number(0.75),
            "timings_per_token": Bool(false),
            "post_sampling_probs": Bool(false),
            "lora": Array [],
        },
        "prompt": String("<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.\nKnowledge cutoff: 2024-06\nCurrent date: 2025-08-09\n\nReasoning: low\n\n# Valid channels: analysis, commentary, final. Channel must be included for every message.\nCalls to these tools must go to the commentary channel: 'functions'.<|end|><|start|>developer<|message|># Instructions\n\nYou are a helpful assistant. Help the user with whatever they need.\n\n\n# Tools\n\n## functions\n\nnamespace functions {\n\n// adds two numbers\ntype add = (_: {\na: number,\nb: number\n}) => any;\n\n// multiplies two numbers\ntype multiply = (_: {\na: number,\nb: number\n}) => any;\n\n// Get the weather for the specified location\ntype get_weather = (_: {\nlocation: string\n}) => any;\n\n} // namespace functions<|end|><|start|>user<|message|>What is the current weather in Barcelona, Stockholm, and Lima? And also, display them in a list sorted by their temperatures, highest first.<|end|><|start|>assistant to=functions.get_weather<|channel|>commentary json<|message|>{\"location\": \"Barcelona\"}<|call|><|start|>functions.get_weather to=assistant<|channel|>commentary<|message|>\"{\\\"result\\\":\\\"Barcelona: ☀\u{fe0f}   +25°C\\\\n\\\"}\"<|end|><|start|>assistant to=functions.get_weather<|channel|>commentary json<|message|>{\"location\": \"Stockholm\"}<|call|><|start|>functions.get_weather to=assistant<|channel|>commentary<|message|>\"{\\\"result\\\":\\\"Stockholm: ☀\u{fe0f}   +13°C\\\\n\\\"}\"<|end|><|start|>assistant"),
        "has_new_line": Bool(true),
        "truncated": Bool(false),
        "stop_type": String("eos"),
        "stopping_word": String(""),
        "tokens_cached": Number(321),
        "timings": Object {
            "prompt_n": Number(50),
            "prompt_ms": Number(78.391),
            "prompt_per_token_ms": Number(1.5678200000000002),
            "prompt_per_second": Number(637.8283221288157),
            "predicted_n": Number(12),
            "predicted_ms": Number(64.481),
            "predicted_per_token_ms": Number(5.3734166666666665),
            "predicted_per_second": Number(186.1013321753695),
        },
    },
    "timings": Object {
        "prompt_n": Number(50),
        "prompt_ms": Number(78.391),
        "prompt_per_token_ms": Number(1.5678200000000002),
        "prompt_per_second": Number(637.8283221288157),
        "predicted_n": Number(12),
        "predicted_ms": Number(64.481),
        "predicted_per_token_ms": Number(5.3734166666666665),
        "predicted_per_second": Number(186.1013321753695),
    },
}
got:
ChatCompletionResponse {
    choices: [
        Choice {
            message: ResponseMessage {
                content: Some(
                    " to=functions.get_weather\u{a0}\u{200b}\u{200b}\u{a0}\u{a0}\n\n\n\n",
                ),
                reasoning_content: None,
                tool_calls: [],
            },
        },
    ],
}
############# SHOULD BE RETURNING NOW< ALL DONE

Assistant: to=functions.get_weather ​​

This seems to happen more often when reasoning_effort is set to low compared to high, though I'm not 100% sure I'm not imagining it. If it's real, could it be an inference problem with the model itself, where it gets the syntax wrong? I'm really not sure what's going on here.

@Mushoz commented Aug 9, 2025

@victorb maybe use temperature = 0 and/or top-k = 1? If inference is the issue, making it deterministic would fix it.

@victorb commented Aug 9, 2025

@Mushoz

> maybe use temperature = 0 and/or top-k = 1? If inference is the issue, making it deterministic would fix it.

Running with these inference parameters for example:

{
        temperature: 0.0,
        top_p: 1.0,
        min_p: 0.0,
        top_k: 0,
        samplers: [
            "top_k",
            "top_p",
            "min_p",
            "temperature",
        ],
}

This seems to give me deterministic responses: once I get a good response, it always works, and the runs that break always break, so it's useful for testing at the very least. Here's one example of broken parsing I'm currently getting, even with temperature = 0 and top-k set to various values:

ChatCompletionResponse {
    choices: [
        Choice {
            message: ResponseMessage {
                content: Some(
                    " to=function\u{a0}\u{a0}...",
                ),
                reasoning_content: None,
                tool_calls: [],
            },
        },
    ],
}
sending:
[
    ChatMessage {
        role: "system",
        content: Some(
            "You are a helpful assistant. Help the user with whatever they need.\n",
        ),
        channel: None,
        recipient: None,
        tool_calls: None,
        tool_call_id: None,
    },
    ChatMessage {
        role: "user",
        content: Some(
            "What is the current weather in Barcelona, Stockholm, and Beijing? And also, display them in a list sorted by their temperatures, highest first.",
        ),
        channel: None,
        recipient: None,
        tool_calls: None,
        tool_call_id: None,
    },
    ChatMessage {
        role: "assistant",
        content: Some(
            "",
        ),
        channel: Some(
            "commentary",
        ),
        recipient: None,
        tool_calls: Some(
            [
                ToolCall {
                    id: "h4fZmZGG2zWXlE6IOqBzTnzdrUFavQFu",
                    type: "function",
                    function: ToolCallFunction {
                        name: "get_weather",
                        arguments: "{\"location\":\"Barcelona\"}",
                    },
                },
            ],
        ),
        tool_call_id: None,
    },
    ChatMessage {
        role: "tool",
        content: Some(
            "{\"result\":\"Barcelona: ☀\u{fe0f}  19°C (mocked)\"}",
        ),
        channel: Some(
            "commentary",
        ),
        recipient: None,
        tool_calls: None,
        tool_call_id: Some(
            "h4fZmZGG2zWXlE6IOqBzTnzdrUFavQFu",
        ),
    },
    ChatMessage {
        role: "assistant",
        content: Some(
            "",
        ),
        channel: Some(
            "commentary",
        ),
        recipient: None,
        tool_calls: Some(
            [
                ToolCall {
                    id: "lek3lo184KRjWObZv6y1rgkdKkBgoFj7",
                    type: "function",
                    function: ToolCallFunction {
                        name: "get_weather",
                        arguments: "{\"location\":\"Stockholm\"}",
                    },
                },
            ],
        ),
        tool_call_id: None,
    },
    ChatMessage {
        role: "tool",
        content: Some(
            "{\"result\":\"Stockholm: ☀\u{fe0f}  26°C (mocked)\"}",
        ),
        channel: Some(
            "commentary",
        ),
        recipient: None,
        tool_calls: None,
        tool_call_id: Some(
            "lek3lo184KRjWObZv6y1rgkdKkBgoFj7",
        ),
    },
]
[src/lib.rs:38:9] &val = Object {
    "choices": Array [
        Object {
            "finish_reason": String("stop"),
            "index": Number(0),
            "message": Object {
                "role": String("assistant"),
                "content": String(" to=function\u{a0}\u{a0}..."),
            },
        },
    ],
    "created": Number(1754735169),
    "model": String("gpt-oss-120b-MXFP4.gguf"),
    "system_fingerprint": String("b6124-da671637"),
    "object": String("chat.completion"),
    "usage": Object {
        "completion_tokens": Number(7),
        "prompt_tokens": Number(314),
        "total_tokens": Number(321),
    },
    "id": String("chatcmpl-kXPt4WpoM4AUGLhbku8VlKSwZkktJUDA"),
    "__verbose": Object {
        "index": Number(0),
        "content": String(" to=function\u{a0}\u{a0}..."),
        "tokens": Array [],
        "id_slot": Number(0),
        "stop": Bool(true),
        "model": String("gpt-oss-120b-MXFP4.gguf"),
        "tokens_predicted": Number(7),
        "tokens_evaluated": Number(314),
        "generation_settings": Object {
            "n_predict": Number(4096),
            "seed": Number(4294967295),
            "temperature": Number(0.0),
            "dynatemp_range": Number(0.0),
            "dynatemp_exponent": Number(1.0),
            "top_k": Number(0),
            "top_p": Number(1.0),
            "min_p": Number(0.0),
            "top_n_sigma": Number(-1.0),
            "xtc_probability": Number(0.0),
            "xtc_threshold": Number(0.10000000149011612),
            "typical_p": Number(1.0),
            "repeat_last_n": Number(64),
            "repeat_penalty": Number(1.0),
            "presence_penalty": Number(0.0),
            "frequency_penalty": Number(0.0),
            "dry_multiplier": Number(0.0),
            "dry_base": Number(1.75),
            "dry_allowed_length": Number(2),
            "dry_penalty_last_n": Number(131072),
            "dry_sequence_breakers": Array [
                String("\n"),
                String(":"),
                String("\""),
                String("*"),
            ],
            "mirostat": Number(0),
            "mirostat_tau": Number(5.0),
            "mirostat_eta": Number(0.10000000149011612),
            "stop": Array [],
            "max_tokens": Number(4096),
            "n_keep": Number(0),
            "n_discard": Number(0),
            "ignore_eos": Bool(false),
            "stream": Bool(false),
            "logit_bias": Array [],
            "n_probs": Number(0),
            "min_keep": Number(0),
            "grammar": String("add-args ::= \"{\" space add-args-a-kv \",\" space add-args-b-kv \"}\" space\nadd-args-a-kv ::= \"\\\"a\\\"\" space \":\" space number\nadd-args-b-kv ::= \"\\\"b\\\"\" space \":\" space number\nadd-call ::= \"add\" space \"<|constrain|>\"? \"json\" space \"<|message|>\" add-args\nchar ::= [^\"\\\\\\x7F\\x00-\\x1F] | [\\\\] ([\"\\\\bfnrt] | \"u\" [0-9a-fA-F]{4})\ndecimal-part ::= [0-9]{1,16}\nget-weather-args ::= \"{\" space get-weather-args-location-kv \"}\" space\nget-weather-args-location-kv ::= \"\\\"location\\\"\" space \":\" space string\nget-weather-call ::= \"get_weather\" space \"<|constrain|>\"? \"json\" space \"<|message|>\" get-weather-args\nintegral-part ::= [0] | [1-9] [0-9]{0,15}\nmultiply-args ::= \"{\" space multiply-args-a-kv \",\" space multiply-args-b-kv \"}\" space\nmultiply-args-a-kv ::= \"\\\"a\\\"\" space \":\" space number\nmultiply-args-b-kv ::= \"\\\"b\\\"\" space \":\" space number\nmultiply-call ::= \"multiply\" space \"<|constrain|>\"? \"json\" space \"<|message|>\" multiply-args\nnumber ::= (\"-\"? integral-part) (\".\" decimal-part)? ([eE] [-+]? integral-part)? space\nroot ::= \"<|channel|>commentary to=functions.\" tool-call\nspace ::= | \" \" | \"\\n\"{1,2} [ \\t]{0,20}\nstring ::= \"\\\"\" char* \"\\\"\" space\ntool-call ::= add-call | multiply-call | get-weather-call\n"),
            "grammar_lazy": Bool(true),
            "grammar_triggers": Array [
                Object {
                    "type": Number(2),
                    "value": String("<\\|channel\\|>commentary to"),
                },
            ],
            "preserved_tokens": Array [
                Number(200003),
                Number(200005),
                Number(200006),
                Number(200007),
                Number(200008),
            ],
            "chat_format": String("GPT-OSS"),
            "reasoning_format": String("auto"),
            "reasoning_in_content": Bool(false),
            "thinking_forced_open": Bool(false),
            "samplers": Array [
                String("top_k"),
                String("top_p"),
                String("min_p"),
                String("temperature"),
            ],
            "speculative.n_max": Number(16),
            "speculative.n_min": Number(0),
            "speculative.p_min": Number(0.75),
            "timings_per_token": Bool(false),
            "post_sampling_probs": Bool(false),
            "lora": Array [],
        },
        "prompt": String("<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.\nKnowledge cutoff: 2024-06\nCurrent date: 2025-08-09\n\nReasoning: low\n\n# Valid channels: analysis, commentary, final. Channel must be included for every message.\nCalls to these tools must go to the commentary channel: 'functions'.<|end|><|start|>developer<|message|># Instructions\n\nYou are a helpful assistant. Help the user with whatever they need.\n\n\n# Tools\n\n## functions\n\nnamespace functions {\n\n// adds two numbers\ntype add = (_: {\na: number,\nb: number\n}) => any;\n\n// multiplies two numbers\ntype multiply = (_: {\na: number,\nb: number\n}) => any;\n\n// Get the weather for the specified location\ntype get_weather = (_: {\nlocation: string\n}) => any;\n\n} // namespace functions<|end|><|start|>user<|message|>What is the current weather in Barcelona, Stockholm, and Beijing? And also, display them in a list sorted by their temperatures, highest first.<|end|><|start|>assistant to=functions.get_weather<|channel|>commentary json<|message|>{\"location\": \"Barcelona\"}<|call|><|start|>functions.get_weather to=assistant<|channel|>commentary<|message|>\"{\\\"result\\\":\\\"Barcelona: ☀\u{fe0f}  19°C (mocked)\\\"}\"<|end|><|start|>assistant to=functions.get_weather<|channel|>commentary json<|message|>{\"location\": \"Stockholm\"}<|call|><|start|>functions.get_weather to=assistant<|channel|>commentary<|message|>\"{\\\"result\\\":\\\"Stockholm: ☀\u{fe0f}  26°C (mocked)\\\"}\"<|end|><|start|>assistant"),
        "has_new_line": Bool(false),
        "truncated": Bool(false),
        "stop_type": String("eos"),
        "stopping_word": String(""),
        "tokens_cached": Number(320),
        "timings": Object {
            "prompt_n": Number(52),
            "prompt_ms": Number(82.733),
            "prompt_per_token_ms": Number(1.5910192307692308),
            "prompt_per_second": Number(628.5279151003831),
            "predicted_n": Number(7),
            "predicted_ms": Number(54.931),
            "predicted_per_token_ms": Number(7.8472857142857135),
            "predicted_per_second": Number(127.4325972583787),
        },
    },
    "timings": Object {
        "prompt_n": Number(52),
        "prompt_ms": Number(82.733),
        "prompt_per_token_ms": Number(1.5910192307692308),
        "prompt_per_second": Number(628.5279151003831),
        "predicted_n": Number(7),
        "predicted_ms": Number(54.931),
        "predicted_per_token_ms": Number(7.8472857142857135),
        "predicted_per_second": Number(127.4325972583787),
    },
}
got:
ChatCompletionResponse {
    choices: [
        Choice {
            message: ResponseMessage {
                content: Some(
                    " to=function\u{a0}\u{a0}...",
                ),
                reasoning_content: None,
                tool_calls: [],
            },
        },
    ],
}
############# SHOULD BE RETURNING NOW< ALL DONE

Assistant: to=function  ...

I tried setting top-k to 0, 1, and 100 and got the same results.

@aldehir (Author) commented Aug 9, 2025

@victorb thank you for that extensive testing. I can't seem to reproduce this on gpt-oss-20b. Can you provide the last entry in the server log where it begins parsing:

srv  update_chat_: Parsing chat message: <|channel|>analysis<|message|>We need to list sorted by temperature. Pr...

That will help me better understand the problem. It appears the model is emitting Unicode space characters, but I wasn't aware the space rule in the grammar would accept Unicode. Still digging into that.

@aldehir (Author) commented Aug 9, 2025

I managed to get gpt-oss-120b running, albeit slowly.

Looks like I missed a scenario where the model outputs the recipient (to=) in the role and the message in a commentary or analysis channel:

<|start|>assistant to=functions.get_weather<|channel|>commentary <|constrain|>json<|message|>{ ... }

I have yet to see the gpt-oss-20b model exhibit this behavior, but it is documented in the harmony docs.

I updated the parsing and grammar rule to handle this. It should at least parse the tool calls now.
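
For reference, the grammar shown in the logs above anchors only on the channel-first form (root ::= "<|channel|>commentary to=functions." tool-call). A relaxed root that also accepts the recipient-in-role form might look roughly like this. This is an illustrative GBNF sketch only, showing just get_weather and reusing the tool-call, get-weather-args, and space rules from the logged grammar; it is not the exact rule from the commit:

root ::= chan-first-call | role-first-call
chan-first-call ::= "<|channel|>commentary to=functions." tool-call
# recipient-in-role form: the tool name precedes the channel marker
role-first-call ::= " to=functions.get_weather" "<|channel|>commentary" space "<|constrain|>"? "json" space "<|message|>" get-weather-args

The lazy-grammar trigger would presumably need a matching update as well, since the existing trigger (<\|channel\|>commentary to) never fires on the role-first output seen in the bad runs above.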

I found that performance degrades by the third call: I get queries for "Lima??", "Lima?", or some variation with garbage at the end. However, if I pass reasoning_content back with every message, I get good results. I was able to extend the query to 5 cities by doing so.

Give cf9a0d6 a shot.
