Model: Dynamically scale generate_window
Allows adjusting the space reserved at the end of the context
before the cache is rolled. This reservation should scale up with a
model's max_seq_len.

Signed-off-by: kingbri <[email protected]>
bdashore3 committed Jan 24, 2024
1 parent a9a128c commit 243acfe
Showing 1 changed file with 3 additions and 1 deletion.
4 changes: 3 additions & 1 deletion backends/exllamav2/model.py
@@ -554,7 +554,9 @@ def generate_gen(self, prompt: str, **kwargs):
     token_healing = unwrap(kwargs.get("token_healing"), False)
     max_tokens = unwrap(kwargs.get("max_tokens"), 150)
     stream_interval = unwrap(kwargs.get("stream_interval"), 0)
-    generate_window = min(unwrap(kwargs.get("generate_window"), 512), max_tokens)
+    generate_window = max(
+        unwrap(kwargs.get("generate_window"), 512), max_tokens // 8
+    )

     # Sampler settings
     gen_settings = ExLlamaV2Sampler.Settings()
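The effect of the change can be sketched in isolation. This is an illustrative helper, not the repository's API: `compute_generate_window` and its parameters are hypothetical names, and `unwrap` is approximated with a default fallback. The old code clamped the window *down* to `max_tokens`; the new code grows it to at least one eighth of `max_tokens`, so large requests reserve proportionally more room before the context rolls.

```python
def compute_generate_window(requested, max_tokens, default=512):
    """Sketch of the old vs. new generate_window calculation.

    requested: value from kwargs (None means "not provided"),
    mirroring unwrap(kwargs.get("generate_window"), 512).
    """
    base = requested if requested is not None else default
    # Old behavior: never reserve more than max_tokens.
    old = min(base, max_tokens)
    # New behavior: reserve at least max_tokens // 8, scaling with the request.
    new = max(base, max_tokens // 8)
    return old, new

# With a large request (max_tokens=8192), the reservation grows:
# old = min(512, 8192) = 512, new = max(512, 1024) = 1024.
print(compute_generate_window(None, 8192))
```

For the small default request (`max_tokens=150`), the old formula shrank the window to 150 while the new one keeps the full 512, which matches the commit's intent of treating 512 as a floor rather than a ceiling.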
