fix(llm): prevent base llm flow truncation (#522) #659
Fixes: #522
Previously, `BaseLlmFlow.run_async` would loop indefinitely whenever the LLM response was truncated due to hitting its `max_tokens` limit. The loop's break condition only checked for "no events" or `last_event.is_final_response()`, neither of which fires for a partial/truncated chunk, so the generator re-invoked `_run_one_step_async` endlessly.
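For context, this is roughly the shape of the pre-fix loop; the identifiers follow the description above, but the body is an illustrative sketch rather than a verbatim copy of the source:

```python
# Illustrative shape of the pre-fix loop (identifiers follow the PR
# description; the body is a simplified sketch, not the verbatim source).
async def run_async(self, invocation_context):
    while True:
        last_event = None
        async for event in self._run_one_step_async(invocation_context):
            last_event = event
            yield event
        # A truncated chunk reports partial=True but is_final_response() is
        # False, so neither check below fires and the loop re-enters forever.
        if not last_event or last_event.is_final_response():
            break
```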
With this change, we:

- add `or last_event.partial` to the break condition in `run_async`.

Now, when the LLM returns a truncated chunk (`partial=True`), `run_async` yields it exactly once and then terminates, avoiding infinite loops.
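The adjusted break condition then looks roughly like this (illustrative sketch, not a verbatim diff of the change):

```python
# A partial (truncated) last event now also terminates the loop.
if not last_event or last_event.is_final_response() or last_event.partial:
    break
```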
## Test Plan
`test_base_llm_flow_truncation.py` verifies that `run_async` emits exactly one event and then stops.
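A minimal, self-contained sketch of what that check exercises; the stand-in classes below are assumptions for illustration, not the project's real types or the actual test file:

```python
# Self-contained sketch of the regression check's intent, using hypothetical
# stand-ins (FakeEvent, FakeTruncatingFlow) rather than the real ADK classes.
import asyncio
from dataclasses import dataclass


@dataclass
class FakeEvent:
    partial: bool = False

    def is_final_response(self) -> bool:
        # A truncated chunk is not a final response.
        return not self.partial


class FakeTruncatingFlow:
    async def _run_one_step_async(self, ctx):
        # Simulate a response cut off by max_tokens: a single partial chunk.
        yield FakeEvent(partial=True)

    async def run_async(self, ctx):
        while True:
            last_event = None
            async for event in self._run_one_step_async(ctx):
                last_event = event
                yield event
            # Post-fix condition: a partial last event terminates the loop.
            if not last_event or last_event.is_final_response() or last_event.partial:
                break


async def main():
    events = [e async for e in FakeTruncatingFlow().run_async(ctx=None)]
    assert len(events) == 1, "run_async should yield the truncated event once and stop"


asyncio.run(main())
```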