server : set default top-k to 1 in the web ui #10935
IMO having greedy sampling enabled will be quite confusing for new users who don't know much about the inner workings of llama.cpp. By default, users expect the response to be a bit non-deterministic, so the
For speculative decoding, we can also detect whether it's enabled and set top_k=1 accordingly (otherwise default to top_k=40). I can add this mechanism if it's needed.
It's not a good solution because speculative decoding can be made to work with non-greedy sampling in the future. IMO having the best response quality should be the default option. For me, ChatGPT is quite confusing because I cannot make it deterministic. It's better to have the highest-quality deterministic result and only choose to degrade it if the user really knows and wants to do so.
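The detection idea in the comment above could look roughly like the sketch below. It is illustrative only: `defaultTopK`, `buildDefaultConfig`, and the `speculativeEnabled` flag are hypothetical names, and how the web UI would actually learn whether speculative decoding is active is not specified in this thread.

```javascript
// Hypothetical sketch; none of these names come from the llama.cpp web UI.
const TOP_K_GREEDY = 1;   // greedy sampling: always take the most likely token
const TOP_K_SAMPLED = 40; // the non-greedy default mentioned in the comment

// `speculativeEnabled` is an assumed boolean that the UI would have to
// obtain from the server in some way not described here.
function defaultTopK(speculativeEnabled) {
  return speculativeEnabled ? TOP_K_GREEDY : TOP_K_SAMPLED;
}

// Build the default sampling config, varying only top_k.
function buildDefaultConfig(speculativeEnabled) {
  return {
    temperature: 0.8,
    dynatemp_range: 0.0,
    dynatemp_exponent: 1.0,
    top_k: defaultTopK(speculativeEnabled),
  };
}
```

With this shape, the rest of the defaults stay in one place and only `top_k` depends on the server's mode.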
@@ -55,7 +55,7 @@ const CONFIG_DEFAULT = {
   temperature: 0.8,
   dynatemp_range: 0.0,
   dynatemp_exponent: 1.0,
-  top_k: 40,
+  top_k: 1,
Since this is no longer in sync with common.h, I think we should add a comment too, in case we later want to pull these default values from the /props endpoint.
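A rough sketch of what pulling defaults from the server might involve, keeping the hard-coded values as a fallback. The helper names `mergeServerDefaults` and `loadDefaults` are hypothetical, and the assumption that the sampling values sit directly under a `default_generation_settings` field of the `/props` response should be checked against the server version in use.

```javascript
// Local fallback values. top_k: 1 follows this PR and intentionally
// differs from the value in common.h.
const CONFIG_DEFAULT = {
  temperature: 0.8,
  dynatemp_range: 0.0,
  dynatemp_exponent: 1.0,
  top_k: 1,
};

// Override local defaults with server values, but only for keys the UI
// already knows about and only when the value types match.
function mergeServerDefaults(localDefaults, serverSettings) {
  const merged = { ...localDefaults };
  for (const key of Object.keys(localDefaults)) {
    const value = serverSettings ? serverSettings[key] : undefined;
    if (typeof value === typeof localDefaults[key]) merged[key] = value;
  }
  return merged;
}

// Assumed response shape: { default_generation_settings: { top_k, ... } }.
// Falls back to the local defaults on any network or parsing error.
async function loadDefaults(baseUrl) {
  try {
    const res = await fetch(`${baseUrl}/props`);
    if (!res.ok) return { ...CONFIG_DEFAULT };
    const props = await res.json();
    return mergeServerDefaults(CONFIG_DEFAULT, props.default_generation_settings);
  } catch {
    return { ...CONFIG_DEFAULT };
  }
}
```

Keeping the merge step as a pure function makes it easy to test without a running server, and unknown or mistyped server fields never reach the UI config.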
I agree with @ngxson. IMO the applications for greedy sampling are very limited, and it is not what most people expect from a chat bot. I don't think this change is compatible with the goal of making the web UI a user-friendly interface that everybody can use without deep knowledge of how LLMs work.
Ok, no problem. The settings persist in the browser cache, so I can adjust them for my needs.
It's better to default to greedy sampling in the web ui, since it is compatible with speculative decoding and has significantly more practical applications compared to other sampling configurations.