
server : set default top-k to 1 in the web ui #10935

Closed
wants to merge 1 commit from gg/webui-topk-1

Conversation

ggerganov (Owner):
It's better to default to greedy sampling in the web UI, since it is compatible with speculative decoding and has significantly more practical applications compared to other sampling configurations.
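For context, top_k=1 means greedy decoding: the single most likely token is always chosen, so a given prompt always produces the same output. Below is a minimal sketch of the request the web UI would send to the server's /completion endpoint under this default; the prompt and surrounding code are illustrative, not the actual UI code:

```js
// Greedy decoding: top_k = 1 keeps only the single most likely token at
// each step, so repeating the same request yields the same completion.
const response = await fetch("/completion", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    prompt: "Write a haiku about llamas.",
    n_predict: 64,
    top_k: 1, // the proposed default
  }),
});
const { content } = await response.json();
console.log(content);
```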

ngxson (Collaborator) commented Dec 21, 2024:

IMO having greedy sampling enabled by default will be quite confusing for new users who don't know much about the inner workings of llama.cpp.

By default, users expect the response to be a bit non-deterministic, so that the Regenerate button works like it does on ChatGPT.

ngxson (Collaborator) commented Dec 21, 2024:

For speculative decoding, we can also detect if it's enabled, then set top_k=1 accordingly (otherwise default to top_k=40). I can add this mechanism if it's needed.
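A rough sketch of what that mechanism could look like on the client side. Note that the speculative flag on the /props response and the pickDefaultTopK helper are assumptions for illustration only; the server does not necessarily expose this information:

```js
// Hypothetical: pick the default top_k based on whether the server reports
// speculative decoding as enabled. "speculative" is an assumed field, not
// a documented part of the /props response.
async function pickDefaultTopK() {
  const props = await fetch("/props").then((r) => r.json());
  return props.speculative ? 1 : 40; // greedy only when speculation benefits
}
```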

ggerganov (Owner, Author):
> For speculative decoding, we can also detect if it's enabled, then set top_k=1 accordingly (otherwise default to top_k=40). I can add this mechanism if it's needed.

It's not a good solution because speculative decoding can be made to work with non-greedy sampling in the future.

IMO the best response quality should be the default option. For me, ChatGPT is quite confusing because I cannot make it deterministic. It's better to have the highest-quality deterministic result by default and only degrade it if the user really knows what they are doing and wants to.

ngxson (Collaborator) commented Dec 21, 2024:

I agree that we should provide the best-quality response by default, but the problem is that top_k=1 only gives good quality for code generation; it is quite useless if the user wants the LLM for writing tasks (which I use quite a lot in my daily life, to rephrase my text in different ways).

Having top_k=1 will also make the Regenerate button less useful (as I said earlier). One idea could be to increase top_k whenever the user clicks that button, as in the sketch below.
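A minimal sketch of that idea, assuming a hypothetical onRegenerate handler and config shape (neither is part of the actual web UI code):

```js
// Hypothetical: start greedy, then widen the candidate pool each time the
// user asks for a regeneration, so alternative outputs become possible.
function onRegenerate(config) {
  return {
    ...config,
    top_k: Math.min(config.top_k * 4, 40), // e.g. 1 -> 4 -> 16 -> 40
  };
}
```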

Another option could be to provide some presets when creating a new conversation; for example, Bing has:

[screenshot: Bing's conversation-style presets]

But the downside is that this applies per-conversation, which requires a bit more work on our side.

@@ -55,7 +55,7 @@ const CONFIG_DEFAULT = {
   temperature: 0.8,
   dynatemp_range: 0.0,
   dynatemp_exponent: 1.0,
-  top_k: 40,
+  top_k: 1,
Review comment from a Collaborator on the changed line:

Since this is no longer in sync with common.h, I think we should add a comment too, just in case we later want to pull these default values from the /props endpoint.
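For reference, a sketch of what pulling defaults from /props might look like. The server does expose a /props endpoint with default generation settings, but treat the exact response shape used here as an assumption to verify:

```js
// Sketch: overlay server-side sampling defaults onto the UI config.
// Assumes /props returns { default_generation_settings: { params: { top_k, ... } } };
// check the actual response shape before relying on these field names.
async function loadServerDefaults(config) {
  const props = await fetch("/props").then((r) => r.json());
  const params = props.default_generation_settings?.params ?? {};
  return { ...config, top_k: params.top_k ?? config.top_k };
}
```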

slaren (Collaborator) commented Dec 21, 2024:

I agree with @ngxson. IMO the applications for greedy sampling are very limited, and it is not what most people expect from a chatbot. I don't think this change is compatible with the goal of making the web UI a user-friendly interface that everybody can use without deep knowledge of how LLMs work.

ggerganov (Owner, Author):
Ok, no problem. The settings persist in the browser cache, so I can adjust them for my needs.

ggerganov closed this on Dec 21, 2024
ggerganov deleted the gg/webui-topk-1 branch on Dec 21, 2024 at 14:51