Logprobs view – show even when aborted #1195
Yes, the logprobs are not actually sent during streaming. Kobold Lite does a separate request for them after streaming ends. If you abort the stream, this request is never made.
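For reference, a minimal sketch of that follow-up fetch (the endpoint name comes from this thread; the host/port and the exact response shape are assumptions):

```python
import requests

# Hypothetical local KoboldCpp instance; adjust host/port to your setup.
ENDPOINT = "http://localhost:5001/api/extra/last_logprobs"

resp = requests.get(ENDPOINT, timeout=10)
resp.raise_for_status()
data = resp.json()

# If the last generation never completed (e.g. it was aborted),
# this is expected to be null/None per the explanation above.
print(data.get("logprobs"))
```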
Yes, this is due to samplers. Try setting them to something more creative (e.g. top_p = 1, top_k = 300, temp = 1) and you should see plenty of alternatives. Right now it's limited to the top 5 probabilities; this is not currently configurable. top_k = 1, by definition, drops everything except the top token, so you will only see one option, because everything else has been discarded. It's a truncation sampler.
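To illustrate the truncation, here is a small standalone sketch of top-k (not KoboldCpp's actual sampler code):

```python
import math

def top_k_truncate(logits: dict[str, float], k: int) -> dict[str, float]:
    """Keep only the k highest-logit tokens, then renormalize via softmax."""
    kept = dict(sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:k])
    total = sum(math.exp(v) for v in kept.values())
    return {tok: math.exp(v) / total for tok, v in kept.items()}

# Made-up logits for one sampling step.
logits = {" the": 3.1, " a": 2.4, " an": 1.0, " this": 0.2}
print(top_k_truncate(logits, k=1))  # {' the': 1.0} -- the only visible option
print(top_k_truncate(logits, k=3))  # three alternatives with nonzero shares
```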
How about outputting the RAW logit values before sampling (for a configurable amount, like your 5) separately? This way, the client can join the probabilities map (token → percent) with the logits array (pairs of token + logit) to show the real effect of the sampler.
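For example, a joined per-token entry might look like this (all field names here are hypothetical; nothing like this exists in the API today):

```python
# Purely hypothetical shape for one generated token, sketching the proposal.
token_entry = {
    "chosen": " the",
    # Post-sampling probabilities, as shown in the current logprobs table.
    "probs": {" the": 0.62, " a": 0.29, " an": 0.09},
    # Raw pre-sampling logits for the top N candidates (N configurable).
    "raw_logits": [(" the", 3.1), (" a", 2.4), (" an", 1.0),
                   (" this", 0.2), (" that", -0.5)],
}
```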
Does it sound good to you? This would allow exploring the effect of samplers like XTC, which discards from the head rather than the tail. Also, I didn't understand why the logits are lost when aborting if you have a separate endpoint to get them. Is it an upstream limitation?
The logits are not "lost" per se, but they will be overwritten by the next request. You can still fetch them from the dedicated /api/extra/last_logprobs endpoint. The unmodified logits are not stored anywhere currently, since the probability calculation and storage only happen after sampling is complete.
So-o-o? Could Lite still (try to) fetch them even after an abort? I just want this feature to be as useful as I imagined it to be!
Also: can you keep the previous logprobs list while a new request is being generated? I mean, in Lite: always download the table just after a generation finishes, and keep it available until the next generation finishes. That way the user could look at the table while the next prompt (or the same one, if Retry was pressed) is being generated in the background. The workaround is to always duplicate the tab between turns…
Hmm, the reason why I clear it at the start of generation instead of at the end is to deal with all the edge cases where generation fails, gets aborted, gets stalled, logprobs fail to return, etc. I don't want cases where the logprobs shown are for the wrong/incomplete generation. Thus I clear it when a new generation starts.
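A sketch of the alternative being asked for, keeping the last completed table around (flagged as stale) instead of clearing it at generation start; this is illustrative client-side logic, not Lite's actual code:

```python
class LogprobsCache:
    """Keep the last successfully fetched logprobs table until a newer
    generation finishes cleanly, instead of clearing it on start."""

    def __init__(self):
        self.table = None   # last table from a completed generation
        self.stale = False  # True while a newer generation is in flight

    def on_generation_start(self):
        self.stale = True   # mark as outdated, but keep it viewable

    def on_generation_finished(self, fetched_table):
        if fetched_table is not None:
            self.table = fetched_table
            self.stale = False
        # On abort/stall/fetch failure, the previous table stays visible,
        # clearly flagged as belonging to an earlier generation.
```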
I'm playing with the new logprobs functionality. I see it mentioned in the 1.77 release notes that logprobs won't work with streaming. Indeed, the (View Logprobs) link correctly opens the pop-up with the table, but it is completely absent if I click [ABORT] during any generation. Since I want to explore the model's behavior against sampling settings and chosen tokens, I have to abort and Retry many times in a row! The only workable option is to set a very low number of tokens to generate (like the minimum default of 8, or even less) to be able to see the resulting logprobs.

Opening /api/extra/last_logprobs in another tab always gives me logprobs: null, no matter whether streaming is enabled or not. Is this normal?

The second question: for each row of the table, I can see percentage values like XX.XX%, but most of the rightmost cells are empty. Are they empty because of sampler settings (for example, they got discarded by min_p/top_p/top_k) or because the absolute value is too low? Why are there only 6 (or 5?) columns? Can you make the number of columns configurable (in the server initialization config is fine), defaulting to 6? Can you paint near-zero ones in gray instead of omitting them?

Also, is it technically possible to still print zero logprobs for tokens that got dropped by samplers, but in their correct order? For example, with top_k=1 I always get 100% for everything, and nothing else. Can you "not drop" the tail and output it with 0.00%, but with the actual tokens that were there?
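For reference, here is how the whole check can be reproduced outside the browser. The generate fields below are standard KoboldAI API parameters; whether the request needs an explicit flag to record logprobs is an assumption, so check the docs for your version:

```python
import requests

BASE = "http://localhost:5001"  # hypothetical local instance

# Generate with a tiny token budget, mirroring the workaround above.
gen = requests.post(f"{BASE}/api/v1/generate", json={
    "prompt": "Once upon a time",
    "max_length": 8,
}, timeout=120)
gen.raise_for_status()

# Then read back whatever the server stored for the last completed run.
probs = requests.get(f"{BASE}/api/extra/last_logprobs", timeout=10).json()
print(probs)  # the report above says this can be {"logprobs": null}
```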