Model: Skip empty token chunks
This helps make the generation loop more efficient by skipping chunks
that don't provide any tokens. The offset isn't affected.

Signed-off-by: kingbri <[email protected]>
kingbri1 committed Jul 22, 2024
1 parent 0eedc8c commit 21516bd
Showing 1 changed file with 6 additions and 4 deletions.
10 changes: 6 additions & 4 deletions backends/exllamav2/model.py
@@ -1185,13 +1185,15 @@ async def generate_gen(
 result_id = result.get("identifier")
 
 if stage == "streaming" and result_id == job_id:
-    chunk = unwrap(result.get("text"), "")
-    full_response += chunk
-
     chunk_tokens = result.get("token_ids")
-    if chunk_tokens is not None:
+    if chunk_tokens is None:
+        continue
+    else:
         generated_tokens += chunk_tokens.size(dim=0)
 
+    chunk = unwrap(result.get("text"), "")
+    full_response += chunk
+
     generation = {
         "text": chunk,
         "prompt_tokens": context_len,
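
To see the effect of the reordering in isolation, here is a minimal standalone sketch of the same pattern. It is not the code from `backends/exllamav2/model.py`: the `stream_chunks` function, the `stage` key lookup, and the shape of the yielded dict are assumptions made for illustration, `unwrap` is a stand-in for the project's helper, and token IDs are plain lists (counted with `len`) rather than tensors (counted with `.size(dim=0)`).

```python
from typing import Any, Dict, Iterable, Iterator, List, Optional


def unwrap(value: Optional[Any], default: Any) -> Any:
    """Stand-in for the project's helper: fall back to default when value is None."""
    return default if value is None else value


def stream_chunks(
    results: Iterable[Dict[str, Any]], job_id: str
) -> Iterator[Dict[str, Any]]:
    """Yield one generation dict per streaming result that carries tokens.

    Hypothetical simplification of the loop in the diff: results are plain
    dicts and "token_ids" is a list of ints instead of a tensor.
    """
    full_response = ""
    generated_tokens = 0

    for result in results:
        # Ignore results from other stages or other jobs.
        if result.get("stage") != "streaming" or result.get("identifier") != job_id:
            continue

        chunk_tokens: Optional[List[int]] = result.get("token_ids")
        if chunk_tokens is None:
            # No tokens in this result: skip it before doing any text handling.
            continue
        generated_tokens += len(chunk_tokens)

        # Text is only read once we know the result has tokens.
        chunk = unwrap(result.get("text"), "")
        full_response += chunk

        yield {
            "text": chunk,
            "generated_tokens": generated_tokens,
            "full_response": full_response,
        }
```

With this ordering, a result whose `token_ids` is `None` produces no output at all, so any text attached to such a result is skipped along with it. Before the change, the text was appended to the response first and the token check only guarded the counter.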