[REQUEST] Don't error when max_tokens makes a request too long (causing "Job required pages too small"), just generate up to the available pages. #262

Originalimoc opened this issue Dec 15, 2024 · 4 comments

Comments

@Originalimoc

Originalimoc commented Dec 15, 2024

Problem

Combined with issue #251, this means you constantly have to go back to the console and restart the server whenever a request hits the available pages, or it just hangs if you switch models...

Solution

As in the title: instead of erroring, just generate up to the available pages.

Acknowledgements

  • I have looked for similar requests before submitting this one.
  • I understand that the developers have lives and my issue will be answered when possible.
  • I understand the developers of this program are human, and I will make my requests politely.
@DocShotgun
Member

DocShotgun commented Dec 15, 2024

It has previously been discussed that the client should be responsible for managing the length of the request (the tokenization endpoint was designed for this purpose), rather than letting tabby decide how to truncate the prompt. The rationale is that the client better understands which portions of the prompt are the least important and should be dropped when the request becomes too long.
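A minimal sketch of that client-side approach, assuming tabbyAPI exposes a token-encode endpoint that accepts a JSON body with a `text` field - the exact path, auth header, and response schema here are assumptions, so check your server version's docs:

```python
import requests

API_URL = "http://localhost:5000"  # assumed tabbyAPI base URL
API_KEY = "your-api-key"           # assumed auth header value

def count_tokens(text: str) -> int:
    """Ask the server how many tokens a prompt occupies.

    Endpoint path and response fields are assumptions based on the
    discussion above, not a documented contract.
    """
    resp = requests.post(
        f"{API_URL}/v1/token/encode",
        headers={"x-api-key": API_KEY},
        json={"text": text},
        timeout=30,
    )
    resp.raise_for_status()
    data = resp.json()
    # Fall back to counting the token list if no explicit length field.
    return data.get("length", len(data.get("tokens", [])))
```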

For chat completions, tabby could in theory try to "smart" truncate beginning with the messages right after the system prompt, if one is present - although this can still break prompt formats with role-order restrictions such as Mistral's (in the Mistral prompt format, the first message after the system prompt can't be an assistant-role message, which is what you'd get if the first user message were dropped for length). There also isn't really a "smart" way to do this for raw completions, where the beginning of the prompt would simply have to be dropped - so this is far better left to the client (i.e. a frontend like SillyTavern that pre-formats the prompt and sends the request as a raw completion).
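For illustration only, a rough sketch of what such "smart" truncation could look like on the client side - dropping the oldest non-system messages until the conversation fits a token budget. The helper names and budget value are placeholders, and this deliberately ignores the role-order restrictions mentioned above:

```python
def truncate_chat(messages, count_tokens, budget):
    """Drop the oldest non-system messages until the conversation fits.

    `messages` is a list of {"role": ..., "content": ...} dicts,
    `count_tokens` a callable returning a string's token cost, and
    `budget` the token budget left for the prompt. This is only an
    illustration of the idea discussed above, not tabby behavior.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    def total(msgs):
        return sum(count_tokens(m["content"]) for m in msgs)

    # Remove from the front (oldest turns) until we fit the budget.
    while rest and total(system + rest) > budget:
        rest.pop(0)

    # Caveat: formats with strict role ordering (e.g. Mistral) may now
    # be invalid if the remaining list starts with an assistant message.
    return system + rest
```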

I was told by users of the official OpenAI API that it also simply errors if the client sends a request that is too long. If that is inaccurate or has since changed, then auto-truncation could potentially be considered as a feature, although it has the problems mentioned above.

@Originalimoc
Author

That's not what I mean. I'm only saying that if the prompt PLUS max_tokens exceeds the maximum pages, then only generate up to (max_pages - prompt length); if the prompt by itself is too long, just error. Nothing is truncated.

@DocShotgun
Member

Well, you have two options here. You could either have the frontend take max_tokens into account and subtract it from the available sequence length (the recommended method), or you could simply not pass max_tokens at all, in which case it defaults to the maximum that will fit.
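A sketch of the recommended client-side budgeting, assuming the client knows the model's max sequence length and has a token-counting helper like the one sketched earlier (both names are illustrative):

```python
def budget_max_tokens(prompt: str, requested_max_tokens: int,
                      max_seq_len: int, count_tokens) -> int:
    """Clamp max_tokens so prompt + completion fits the context window.

    `max_seq_len` is the model's context length and `count_tokens` a
    helper that returns the prompt's token count; both are assumptions
    the caller must supply, not tabby API parameters.
    """
    prompt_len = count_tokens(prompt)
    available = max_seq_len - prompt_len
    if available <= 0:
        raise ValueError("Prompt alone exceeds the context window")
    return min(requested_max_tokens, available)
```

With this, the frontend sends `max_tokens=budget_max_tokens(...)` and the "Job required pages too small" error can't be triggered by the completion length.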

@Originalimoc
Author

Isn't there a default? 250 or 150..?
