
server : set default top-k to 1 in the web ui #10935

Closed
wants to merge 1 commit from gg/webui-topk-1

Conversation

ggerganov (Owner):
It's better to default to greedy sampling in the web UI, since it is compatible with speculative decoding and has significantly more practical applications compared to other sampling configurations.
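For context, top_k=1 means greedy decoding: the single most likely token is always chosen, so a given prompt always produces the same output. Below is a minimal sketch of the request the web UI would send to the server's /completion endpoint under this default; the prompt and surrounding code are illustrative, not the actual UI code:

```js
// Greedy decoding: top_k = 1 keeps only the single most likely token at
// each step, so repeating the same request yields the same completion.
const response = await fetch("/completion", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    prompt: "Write a haiku about llamas.",
    n_predict: 64,
    top_k: 1, // the proposed default
  }),
});
const { content } = await response.json();
console.log(content);
```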

ngxson (Collaborator) commented Dec 21, 2024:

IMO having greedy sampling enabled by default will be quite confusing for new users who don't know much about the inner workings of llama.cpp.

By default, users expect the response to be a bit non-deterministic, so that the Regenerate button works like it does on ChatGPT.

ngxson (Collaborator) commented Dec 21, 2024:

For speculative decoding, we can also detect if it's enabled, then set top_k=1 accordingly (otherwise default to top_k=40). I can add this mechanism if it's needed.
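A rough sketch of what that mechanism could look like on the client side. Note that the speculative flag on the /props response and the pickDefaultTopK helper are assumptions for illustration only; the server does not necessarily expose this information:

```js
// Hypothetical: pick the default top_k based on whether the server reports
// speculative decoding as enabled. "speculative" is an assumed field, not
// a documented part of the /props response.
async function pickDefaultTopK() {
  const props = await fetch("/props").then((r) => r.json());
  return props.speculative ? 1 : 40; // greedy only when speculation benefits
}
```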

ggerganov (Owner, Author):
> For speculative decoding, we can also detect if it's enabled, then set top_k=1 accordingly (otherwise default to top_k=40). I can add this mechanism if it's needed.

It's not a good solution because speculative decoding can be made to work with non-greedy sampling in the future.

IMO the best response quality should be the default option. For me, ChatGPT is quite confusing because I cannot make it deterministic. It's better to have the highest-quality deterministic result by default and only degrade it if the user really knows what they are doing and wants to.

ngxson (Collaborator) commented Dec 21, 2024:

I agree that we should provide the best-quality response by default, but the problem is that top_k=1 only gives good quality for code generation; it is quite useless if the user wants the LLM for writing tasks (which I use quite a lot in my daily life, to rephrase my text in different ways).

Having top_k=1 will also make the Regenerate button less useful (as I said earlier). One idea could be to increase top_k whenever the user clicks that button, as in the sketch below.
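A minimal sketch of that idea, assuming a hypothetical onRegenerate handler and config shape (neither is part of the actual web UI code):

```js
// Hypothetical: start greedy, then widen the candidate pool each time the
// user asks for a regeneration, so alternative outputs become possible.
function onRegenerate(config) {
  return {
    ...config,
    top_k: Math.min(config.top_k * 4, 40), // e.g. 1 -> 4 -> 16 -> 40
  };
}
```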

Another option could be to provide some presets when creating a new conversation; for example, Bing has:

[screenshot: Bing's conversation-style presets]

But the downside is that this applies per-conversation, which requires a bit more work on our side.

@@ -55,7 +55,7 @@ const CONFIG_DEFAULT = {
   temperature: 0.8,
   dynatemp_range: 0.0,
   dynatemp_exponent: 1.0,
-  top_k: 40,
+  top_k: 1,
Review comment from a Collaborator on the changed line:

Since this is no longer in sync with common.h, I think we should add a comment too, just in case we later want to pull these default values from the /props endpoint.
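For reference, a sketch of what pulling defaults from /props might look like. The server does expose a /props endpoint with default generation settings, but treat the exact response shape used here as an assumption to verify:

```js
// Sketch: overlay server-side sampling defaults onto the UI config.
// Assumes /props returns { default_generation_settings: { params: { top_k, ... } } };
// check the actual response shape before relying on these field names.
async function loadServerDefaults(config) {
  const props = await fetch("/props").then((r) => r.json());
  const params = props.default_generation_settings?.params ?? {};
  return { ...config, top_k: params.top_k ?? config.top_k };
}
```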

slaren (Collaborator) commented Dec 21, 2024:

I agree with @ngxson. IMO the applications for greedy sampling are very limited, and it is not what most people expect from a chatbot. I don't think this change is compatible with the goal of making the web UI a user-friendly interface that everybody can use without deep knowledge of how LLMs work.

ggerganov (Owner, Author):
Ok, no problem. The settings persist in the browser cache, so I can adjust them for my needs.

ggerganov closed this on Dec 21, 2024
ggerganov deleted the gg/webui-topk-1 branch on Dec 21, 2024 at 14:51