Problem
Combined with issue #251, this means you constantly have to go to the console and restart it whenever you hit the limit of available pages, or it just hangs if you switch models...
Solution
As title.
Acknowledgements
I have looked for similar requests before submitting this one.
I understand that the developers have lives and my issue will be answered when possible.
I understand the developers of this program are human, and I will make my requests politely.
This has been discussed previously: the client is responsible for managing the length of the request (the tokenization endpoint was designed for this purpose) rather than letting tabby decide how to truncate the prompt. The rationale is that the client better understands which portions of the prompt are the least important and should be dropped when the request becomes too long.
For chat completions, in theory tabby could try to "smart" truncate beginning with messages right after the system prompt if it is present - although this may still cause a problem for certain prompt formats with role order restrictions such as Mistral's (in Mistral prompt format, it wouldn't work to have the first message after the system prompt be an assistant role message if the first user message is dropped for length). But there also isn't really a "smart" way to do it for raw completions, and the beginning of the prompt would need to be dropped - so this is far better left up to the client (i.e. a frontend like SillyTavern that pre-formats a prompt and sends the request as a raw completion).
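To illustrate why the role-order restriction matters, here is a minimal client-side sketch of that kind of "smart" truncation. It is not something tabby does: `truncate_chat`, `budget`, and `count_tokens` are hypothetical names, and counting tokens per message ignores whatever the prompt template adds around each turn.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def truncate_chat(
    messages: List[Message],
    budget: int,
    count_tokens: Callable[[str], int],
) -> List[Message]:
    """Drop the oldest non-system messages until the chat fits in `budget`.

    Hypothetical helper: `count_tokens` stands in for whatever tokenizer
    the client uses; template overhead per message is not counted.
    """
    # Keep the system prompt (if any) out of the truncation window.
    system = messages[:1] if messages and messages[0]["role"] == "system" else []
    rest = messages[len(system):]

    def total(msgs: List[Message]) -> int:
        return sum(count_tokens(m["content"]) for m in msgs)

    while rest and total(system + rest) > budget:
        rest = rest[1:]  # drop the oldest message
        # Formats with strict role order need a user message first,
        # so keep dropping until the window starts with one.
        while rest and rest[0]["role"] != "user":
            rest = rest[1:]

    dropped_everything = not rest and len(messages) > len(system)
    if dropped_everything or total(system + rest) > budget:
        raise ValueError("prompt cannot be made to fit within the budget")
    return system + rest
```

Note that dropping one over-long user message also forces out the assistant reply that followed it, which is exactly the kind of decision a client is better placed to make.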
I was told by users of the official OAI API that it also simply errors if the client sends a request that is too long. If this is inaccurate or has since changed, then auto-truncation could potentially be considered as a feature, although there are several problems with it, as mentioned above.
That's not what I mean. I'm only saying that if the prompt PLUS max_tokens exceeds the max pages, then only generate up to (max_pages - prompt length); if the prompt itself is too long, just error. Nothing gets truncated.
Well, you have two options here. You could either have the frontend take max_tokens into account and subtract the prompt length from the available sequence length (the recommended method), or you could just not pass max_tokens at all, in which case it defaults to the maximum that will fit.
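The first option could look roughly like this on the client side. This is only a sketch: `clamp_max_tokens` is a hypothetical helper, and it assumes the client already knows the prompt's token count (for example from the tokenization endpoint) and the model's sequence length, both measured in tokens.

```python
from typing import Optional


def clamp_max_tokens(
    prompt_tokens: int,
    max_seq_len: int,
    requested: Optional[int] = None,
) -> int:
    """Return a max_tokens value that fits in the remaining context.

    Hypothetical client-side helper; all arguments are token counts.
    """
    available = max_seq_len - prompt_tokens
    if available <= 0:
        # The prompt alone is too long: error out, truncate nothing.
        raise ValueError(
            f"prompt uses {prompt_tokens} tokens, limit is {max_seq_len}"
        )
    if requested is None:
        # Mirrors the "don't pass max_tokens" case: use whatever fits.
        return available
    return min(requested, available)
```

For example, a 3000-token prompt with a 4096-token limit and a requested max_tokens of 2000 would be sent with max_tokens set to 1096.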