
Conversation

@ServeurpersoCom (Collaborator) commented Sep 29, 2025

Close #16158
Close #16299

@ServeurpersoCom (Collaborator, Author)

Solve #16299

@allozaur added the server/webui and bugfix labels Sep 29, 2025
@allozaur (Collaborator)

@ServeurpersoCom I've added a fix for #16158 in ServeurpersoCom/pull/1 to your branch. Please review it, and if all is good on your end, please merge it so that we can use this PR to solve both issues at once.

@allozaur (Collaborator) commented Sep 29, 2025

@ServeurpersoCom CI has failed for the Storybook tests; here's a fix: ServeurpersoCom#2

@ServeurpersoCom (Collaborator, Author)

I've tested the changes across multiple models, including GPT-OSS, Seed-OSS, Qwen3*Thinking, and Llama-3.3-Nemotron-Super-49B. I didn't observe any regressions or CoT (Chain-of-Thought) rendering issues: everything looks stable.

@allozaur changed the title from "Fix thinking blocks with quotes" to "Fix thinking blocks with quotes + add handling [THINK]...[/THINK] blocks" Sep 29, 2025
@allozaur merged commit 5f7e166 into ggml-org:master Sep 29, 2025 (14 checks passed)
@vbooka1 commented Oct 1, 2025

This patch did not fix it for me; b6653 still does not show the thought process.

I'm running GLM 4.5 ("unsloth dynamic quant"). I tried with --jinja and without it, and tried checking and unchecking the "Show thought in progress" checkbox; nothing helped. The chat window shows only "Processing..." and the context token counter until the model starts outputting the final result.

@ServeurpersoCom (Collaborator, Author)

> This patch did not fix it for me; b6653 still does not show the thought process.
>
> I'm running GLM 4.5 ("unsloth dynamic quant"). I tried with --jinja and without it, and tried checking and unchecking the "Show thought in progress" checkbox; nothing helped. The chat window shows only "Processing..." and the context token counter until the model starts outputting the final result.

I already have a test instance running with this model (same for GLM 4.5 Air from Unsloth), and I can reproduce the same issue. I'll take a closer look as soon as possible.

@vbooka1 commented Oct 1, 2025

> This patch did not fix it for me; b6653 still does not show the thought process.
> I'm running GLM 4.5 ("unsloth dynamic quant"). I tried with --jinja and without it, and tried checking and unchecking the "Show thought in progress" checkbox; nothing helped. The chat window shows only "Processing..." and the context token counter until the model starts outputting the final result.
>
> I already have a test instance running with this model (same for GLM 4.5 Air from Unsloth), and I can reproduce the same issue. I'll take a closer look as soon as possible.

It seems that the GLM chat template is broken, because with Qwen 2.5 the thought process is displayed (although the "◁think▷" tags are not stripped).

Qwen 2.5 72B: (screenshot)

GLM 4.5 355B: (screenshot)

@allozaur (Collaborator) commented Oct 1, 2025

> This patch did not fix it for me; b6653 still does not show the thought process.
> I'm running GLM 4.5 ("unsloth dynamic quant"). I tried with --jinja and without it, and tried checking and unchecking the "Show thought in progress" checkbox; nothing helped. The chat window shows only "Processing..." and the context token counter until the model starts outputting the final result.
>
> I already have a test instance running with this model (same for GLM 4.5 Air from Unsloth), and I can reproduce the same issue. I'll take a closer look as soon as possible.
>
> It seems that the GLM chat template is broken, because with Qwen 2.5 the thought process is displayed (although the "◁think▷" tags are not stripped).
>
> Qwen 2.5 72B: (screenshot)
>
> GLM 4.5 355B: (screenshot)

#16364 addresses the ◁think▷ tags
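
For illustration, one general way to handle such tag variants is to normalize every reasoning tag pair to a single canonical form before parsing. Below is a minimal TypeScript sketch; the tag list here is an assumption based on the models discussed in this thread, not the actual code in #16364:

```ts
// Hypothetical tag pairs; the real WebUI may recognize a different set.
const REASONING_TAG_PAIRS: Array<[string, string]> = [
  ['[THINK]', '[/THINK]'],
  ['◁think▷', '◁/think▷'],
];

// Rewrite every known opening/closing reasoning tag to <think>/</think>
// so a single downstream parser can extract the reasoning content.
function normalizeReasoningTags(raw: string): string {
  let out = raw;
  for (const [open, close] of REASONING_TAG_PAIRS) {
    out = out.split(open).join('<think>');
    out = out.split(close).join('</think>');
  }
  return out;
}
```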

@ServeurpersoCom (Collaborator, Author) commented Oct 1, 2025

```sh
curl -s .../ia/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "MoE-GLM-4.5-Air-106B",
    "messages": [
      {"role": "user", "content": "Bonjour le monde"}
    ]
  }'
```

```json
{"choices":[{"finish_reason":"stop","index":0,"message":{"role":"assistant","content":"\n<think>Nous avons reçu un message en français : \"Bonjour le monde\". C'est la traduction de \"Hello World\", un programme classique pour les débutants en programmation.\n L'utilisateur pourrait vouloir un programme qui affiche \"Bonjour le monde\" dans un certain contexte, peut-être en Python ou un autre langage.\n Cependant, la demande est très simple et ne précise pas de langage. Puisque l'utilisateur a écrit en français, nous pouvons répondre en français.\n Nous allons supposer que l'utilisateur veut un exemple de code en Python, car c'est un langage courant pour les débutants.\n Nous pourrions fournir un code simple en Python pour afficher \"Bonjour le monde\".</think>Bonjour ! 😊 Voici un programme simple en Python pour afficher \"Bonjour le monde\" :\n\n```python\n# Programme classique \"Hello World\" en français\nprint(\"Bonjour le monde\")\n```\n\n### Explication :\n- La fonction `print()` envoie le texte à l'écran.\n- Le texte `\"Bonjour le monde\"` est mis entre guillemets pour être traité comme une chaîne de caractères.\n\n### Résultat à l'exécution :\n```\nBonjour le monde\n```\n\n### Autres langages (pour référence) :\n**JavaScript** :\n```javascript\nconsole.log(\"Bonjour le monde\");\n```\n\n**HTML** :\n```html\n<!DOCTYPE html>\n<html>\n<body>\n  <h1>Bonjour le monde</h1>\n</body>\n</html>\n```\n\n**C++** :\n```cpp\n#include <iostream>\nint main() {\n    std::cout << \"Bonjour le monde\" << std::endl;\n    return 0;\n}\n```\n\nBesoin d'autres détails ou d'exemples dans un autre langage ? 😊"}}],"created":1759329471,"model":"MoE-GLM-4.5-Air-106B","system_fingerprint":"b6658-2a9b6338","object":"chat.completion","usage":{"completion_tokens":370,"prompt_tokens":8,"total_tokens":378},"id":"chatcmpl-DFZD1Qmk4Ora8oU3JOEJeNiWhsipiRNw","timings":{"cache_n":2,"prompt_n":6,"prompt_ms":176.336,"prompt_per_token_ms":29.389333333333337,"prompt_per_second":34.02595045821613,"predicted_n":370,"predicted_ms":18191.783,"predicted_per_token_ms":49.16698108108108,"predicted_per_second":20.338852986537933}}
```

Oh, so it's just the missing line break after <think>! 😲
The code should be robust enough to handle both cases: <think> inline and <think>\n....
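
A minimal sketch of an extractor tolerant to both variants (hypothetical TypeScript, not the actual patch):

```ts
interface ParsedMessage {
  reasoning: string | null; // captured <think> content, if any
  content: string;          // visible assistant message
}

// Split a completed assistant message into reasoning and visible content.
// The \s* after <think> absorbs the optional line break, so both the GLM
// style (<think>inline...) and the Qwen style (<think>\n...) are handled.
function splitThinkBlock(raw: string): ParsedMessage {
  const match = raw.match(/<think>\s*([\s\S]*?)<\/think>\s*/);
  if (!match) return { reasoning: null, content: raw };
  return {
    reasoning: match[1].trim(),
    content: raw.replace(match[0], '').trimStart(),
  };
}
```

For the GLM response above, splitThinkBlock would return the French reasoning text in reasoning and the visible answer starting at "Bonjour !" in content.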

@ServeurpersoCom (Collaborator, Author) commented Oct 1, 2025

Tested with GLM-4.5 inline <think> streaming (no line breaks, unlike Qwen).
The patch correctly captures reasoning content and keeps the assistant message clean.


It still needs review and no-regression testing across all models:
64156f5

(I set up a dedicated machine available for live WebUI tests at https://www.serveurperso.com/ia/; it is running GLM 4.5 Air.)
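
Since the reasoning arrives token by token, capturing it during streaming also has to cope with a tag split across chunks. A hedged TypeScript sketch of that technique follows (an illustration of the general approach, not the WebUI's actual implementation):

```ts
// Incrementally route streamed chunks to either the reasoning buffer or the
// visible message, while <think>/</think> may arrive split across chunks.
class ThinkStreamSplitter {
  private buffer = '';
  private inThink = false;
  reasoning = '';
  content = '';

  push(chunk: string): void {
    this.buffer += chunk;
    // Consume every complete tag boundary currently in the buffer.
    for (;;) {
      const tag = this.inThink ? '</think>' : '<think>';
      const idx = this.buffer.indexOf(tag);
      if (idx === -1) break;
      const before = this.buffer.slice(0, idx);
      if (this.inThink) this.reasoning += before;
      else this.content += before;
      this.buffer = this.buffer.slice(idx + tag.length);
      this.inThink = !this.inThink;
    }
    // Hold back a possible tag prefix (e.g. "<thi") at the end of the
    // buffer so it is not flushed to the wrong side before it completes.
    const flushUpTo = this.buffer.length - this.longestTagPrefix();
    if (flushUpTo > 0) {
      const flushed = this.buffer.slice(0, flushUpTo);
      if (this.inThink) this.reasoning += flushed;
      else this.content += flushed;
      this.buffer = this.buffer.slice(flushUpTo);
    }
  }

  private longestTagPrefix(): number {
    const tag = this.inThink ? '</think>' : '<think>';
    for (let n = Math.min(tag.length - 1, this.buffer.length); n > 0; n--) {
      if (this.buffer.endsWith(tag.slice(0, n))) return n;
    }
    return 0;
  }
}
```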

@allozaur (Collaborator) commented Oct 1, 2025

> Tested with GLM-4.5 inline <think> streaming (no line breaks, unlike Qwen). The patch correctly captures reasoning content and keeps the assistant message clean.
>
> It still needs review and no-regression testing across all models: 64156f5
>
> (I set up a dedicated machine available for live WebUI tests at https://www.serveurperso.com/ia/; it is running GLM 4.5 Air.)

I've merged this commit into #16364. @ServeurpersoCom, let me know once you've tested this thoroughly and whether I can be of any more help with this.


Successfully merging this pull request may close these issues:

- Misc. bug: WebUI <think> block content disappears when it contains quotes
- Misc. bug: [THINK][/THINK] not rendered as reasoning in the new web ui