
Conversation

@ServeurpersoCom (Collaborator) commented Sep 29, 2025

Close #16158
Close #16299

@ServeurpersoCom (Collaborator, Author)

Solve #16299

@allozaur added the server/webui and bugfix labels Sep 29, 2025
@allozaur (Collaborator)

@ServeurpersoCom I've added a fix for #16158 in ServeurpersoCom/pull/1 to your branch. Please review it, and if all is good on your end, please merge it so that we can use this PR to solve both issues at once.

@allozaur (Collaborator) commented Sep 29, 2025

@ServeurpersoCom CI has failed for the Storybook tests; here's a fix: ServeurpersoCom#2

@ServeurpersoCom (Collaborator, Author)

I've tested the changes across multiple models, including GPT-OSS, Seed-OSS, Qwen3*Thinking, and Llama-3.3-Nemotron-Super-49B. I didn't observe any regressions or CoT (Chain-of-Thought) rendering issues: everything looks stable.

@allozaur changed the title from "Fix thinking blocks with quotes" to "Fix thinking blocks with quotes + add handling [THINK]...[/THINK] blocks" Sep 29, 2025
@allozaur merged commit 5f7e166 into ggml-org:master Sep 29, 2025 (14 checks passed)
@vbooka1 commented Oct 1, 2025

This patch did not fix it for me; b6653 still does not show the thought process.

I'm running GLM 4.5 ("unsloth dynamic quant"). I tried with --jinja and without it, and tried checking and unchecking the "Show thought in progress" checkbox; nothing helped. The chat window shows only "Processing..." and the context token counter until the model starts outputting the final result.

@ServeurpersoCom (Collaborator, Author)

> This patch did not fix it for me; b6653 still does not show the thought process.
>
> I'm running GLM 4.5 ("unsloth dynamic quant"). I tried with --jinja and without it, and tried checking and unchecking the "Show thought in progress" checkbox; nothing helped. The chat window shows only "Processing..." and the context token counter until the model starts outputting the final result.

I already have a test instance running with this model (same for GLM 4.5 Air from Unsloth), and I can reproduce the same issue. I'll take a closer look as soon as possible.

@vbooka1 commented Oct 1, 2025

> This patch did not fix it for me; b6653 still does not show the thought process.
> I'm running GLM 4.5 ("unsloth dynamic quant"). I tried with --jinja and without it, and tried checking and unchecking the "Show thought in progress" checkbox; nothing helped. The chat window shows only "Processing..." and the context token counter until the model starts outputting the final result.
>
> I already have a test instance running with this model (same for GLM 4.5 Air from Unsloth), and I can reproduce the same issue. I'll take a closer look as soon as possible.

It seems that the GLM chat template is broken, because with Qwen 2.5 the thought process is displayed (although the "◁think▷" tags are not stripped).

Qwen 2.5 72B: (screenshot)

GLM 4.5 355B: (screenshot)

@allozaur (Collaborator) commented Oct 1, 2025

> This patch did not fix it for me; b6653 still does not show the thought process.
> I'm running GLM 4.5 ("unsloth dynamic quant"). I tried with --jinja and without it, and tried checking and unchecking the "Show thought in progress" checkbox; nothing helped. The chat window shows only "Processing..." and the context token counter until the model starts outputting the final result.
>
> I already have a test instance running with this model (same for GLM 4.5 Air from Unsloth), and I can reproduce the same issue. I'll take a closer look as soon as possible.
>
> It seems that the GLM chat template is broken, because with Qwen 2.5 the thought process is displayed (although the "◁think▷" tags are not stripped).
>
> Qwen 2.5 72B: (screenshot)
>
> GLM 4.5 355B: (screenshot)

#16364 addresses the ◁think▷ tags
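
For illustration, one general way to handle such tag variants is to normalize every reasoning tag pair to a single canonical form before parsing. Below is a minimal TypeScript sketch; the tag list here is an assumption based on the models discussed in this thread, not the actual code in #16364:

```ts
// Hypothetical tag pairs; the real WebUI may recognize a different set.
const REASONING_TAG_PAIRS: Array<[string, string]> = [
  ['[THINK]', '[/THINK]'],
  ['◁think▷', '◁/think▷'],
];

// Rewrite every known opening/closing reasoning tag to <think>/</think>
// so a single downstream parser can extract the reasoning content.
function normalizeReasoningTags(raw: string): string {
  let out = raw;
  for (const [open, close] of REASONING_TAG_PAIRS) {
    out = out.split(open).join('<think>');
    out = out.split(close).join('</think>');
  }
  return out;
}
```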

@ServeurpersoCom (Collaborator, Author) commented Oct 1, 2025

```sh
curl -s .../ia/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "MoE-GLM-4.5-Air-106B",
    "messages": [
      {"role": "user", "content": "Bonjour le monde"}
    ]
  }'
```

```json
{"choices":[{"finish_reason":"stop","index":0,"message":{"role":"assistant","content":"\n<think>Nous avons reçu un message en français : \"Bonjour le monde\". C'est la traduction de \"Hello World\", un programme classique pour les débutants en programmation.\n L'utilisateur pourrait vouloir un programme qui affiche \"Bonjour le monde\" dans un certain contexte, peut-être en Python ou un autre langage.\n Cependant, la demande est très simple et ne précise pas de langage. Puisque l'utilisateur a écrit en français, nous pouvons répondre en français.\n Nous allons supposer que l'utilisateur veut un exemple de code en Python, car c'est un langage courant pour les débutants.\n Nous pourrions fournir un code simple en Python pour afficher \"Bonjour le monde\".</think>Bonjour ! 😊 Voici un programme simple en Python pour afficher \"Bonjour le monde\" :\n\n```python\n# Programme classique \"Hello World\" en français\nprint(\"Bonjour le monde\")\n```\n\n### Explication :\n- La fonction `print()` envoie le texte à l'écran.\n- Le texte `\"Bonjour le monde\"` est mis entre guillemets pour être traité comme une chaîne de caractères.\n\n### Résultat à l'exécution :\n```\nBonjour le monde\n```\n\n### Autres langages (pour référence) :\n**JavaScript** :\n```javascript\nconsole.log(\"Bonjour le monde\");\n```\n\n**HTML** :\n```html\n<!DOCTYPE html>\n<html>\n<body>\n  <h1>Bonjour le monde</h1>\n</body>\n</html>\n```\n\n**C++** :\n```cpp\n#include <iostream>\nint main() {\n    std::cout << \"Bonjour le monde\" << std::endl;\n    return 0;\n}\n```\n\nBesoin d'autres détails ou d'exemples dans un autre langage ? 😊"}}],"created":1759329471,"model":"MoE-GLM-4.5-Air-106B","system_fingerprint":"b6658-2a9b6338","object":"chat.completion","usage":{"completion_tokens":370,"prompt_tokens":8,"total_tokens":378},"id":"chatcmpl-DFZD1Qmk4Ora8oU3JOEJeNiWhsipiRNw","timings":{"cache_n":2,"prompt_n":6,"prompt_ms":176.336,"prompt_per_token_ms":29.389333333333337,"prompt_per_second":34.02595045821613,"predicted_n":370,"predicted_ms":18191.783,"predicted_per_token_ms":49.16698108108108,"predicted_per_second":20.338852986537933}}
```

Oh, so it's just the missing line break after <think>! 😲
The code should be robust enough to handle both cases: <think> inline and <think>\n....
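
A minimal sketch of an extractor tolerant to both variants (hypothetical TypeScript, not the actual patch):

```ts
interface ParsedMessage {
  reasoning: string | null; // captured <think> content, if any
  content: string;          // visible assistant message
}

// Split a completed assistant message into reasoning and visible content.
// The \s* after <think> absorbs the optional line break, so both the GLM
// style (<think>inline...) and the Qwen style (<think>\n...) are handled.
function splitThinkBlock(raw: string): ParsedMessage {
  const match = raw.match(/<think>\s*([\s\S]*?)<\/think>\s*/);
  if (!match) return { reasoning: null, content: raw };
  return {
    reasoning: match[1].trim(),
    content: raw.replace(match[0], '').trimStart(),
  };
}
```

For the GLM response above, splitThinkBlock would return the French reasoning text in reasoning and the visible answer starting at "Bonjour !" in content.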

@ServeurpersoCom (Collaborator, Author) commented Oct 1, 2025

Tested with GLM-4.5 inline <think> streaming (no line breaks, unlike Qwen).
The patch correctly captures reasoning content and keeps the assistant message clean.


It still needs review and no-regression testing across all models:
64156f5

(I set up a dedicated machine available for live WebUI tests at https://www.serveurperso.com/ia/; it is running GLM 4.5 Air.)
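
Since the reasoning arrives token by token, capturing it during streaming also has to cope with a tag split across chunks. A hedged TypeScript sketch of that technique follows (an illustration of the general approach, not the WebUI's actual implementation):

```ts
// Incrementally route streamed chunks to either the reasoning buffer or the
// visible message, while <think>/</think> may arrive split across chunks.
class ThinkStreamSplitter {
  private buffer = '';
  private inThink = false;
  reasoning = '';
  content = '';

  push(chunk: string): void {
    this.buffer += chunk;
    // Consume every complete tag boundary currently in the buffer.
    for (;;) {
      const tag = this.inThink ? '</think>' : '<think>';
      const idx = this.buffer.indexOf(tag);
      if (idx === -1) break;
      const before = this.buffer.slice(0, idx);
      if (this.inThink) this.reasoning += before;
      else this.content += before;
      this.buffer = this.buffer.slice(idx + tag.length);
      this.inThink = !this.inThink;
    }
    // Hold back a possible tag prefix (e.g. "<thi") at the end of the
    // buffer so it is not flushed to the wrong side before it completes.
    const flushUpTo = this.buffer.length - this.longestTagPrefix();
    if (flushUpTo > 0) {
      const flushed = this.buffer.slice(0, flushUpTo);
      if (this.inThink) this.reasoning += flushed;
      else this.content += flushed;
      this.buffer = this.buffer.slice(flushUpTo);
    }
  }

  private longestTagPrefix(): number {
    const tag = this.inThink ? '</think>' : '<think>';
    for (let n = Math.min(tag.length - 1, this.buffer.length); n > 0; n--) {
      if (this.buffer.endsWith(tag.slice(0, n))) return n;
    }
    return 0;
  }
}
```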

@allozaur (Collaborator) commented Oct 1, 2025

> Tested with GLM-4.5 inline <think> streaming (no line breaks, unlike Qwen). The patch correctly captures reasoning content and keeps the assistant message clean.
>
> It still needs review and no-regression testing across all models: 64156f5
>
> (I set up a dedicated machine available for live WebUI tests at https://www.serveurperso.com/ia/; it is running GLM 4.5 Air.)

I've merged this commit into #16364. @ServeurpersoCom, let me know once you've tested this thoroughly and whether I can be of any more help with this.


Successfully merging this pull request may close these issues:

- Misc. bug: WebUI <think> block content disappears when it contains quotes
- Misc. bug: [THINK][/THINK] not rendered as reasoning in the new web ui