[Bug]: ReActAgentWorker._arun_step_stream drops tokens that arrive before is_done=True #14307
In `_arun_step_stream`, the loop in question is:

```python
async for latest_chunk in chat_stream:
    full_response = latest_chunk
    is_done = self._infer_stream_chunk_is_final(latest_chunk)
    if is_done:
        break
```

In this loop, the method iterates over the chat stream and breaks out as soon as `_infer_stream_chunk_is_final` flags a chunk as final. If you want to include intermediate tokens in the output, you can set the `is_streaming` parameter accordingly.
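As an aside on the mechanics of this loop: breaking out of an `async for` leaves the underlying async generator paused rather than closed, which is what lets the worker later resume the same stream after deciding the answer is final. A generic Python sketch (no LlamaIndex code; all names here are mine):

```python
import asyncio

async def chunks():
    # Stand-in for an LLM chat stream yielding chunks one at a time.
    for c in ["a", "b", "c", "d"]:
        yield c

async def main() -> list[str]:
    stream = chunks()
    seen = []
    async for chunk in stream:
        seen.append(chunk)
        if chunk == "b":  # stand-in for the is_done check
            break
    # The generator is paused, not closed: iteration can resume where it left off.
    async for chunk in stream:
        seen.append(chunk)
    return seen

print(asyncio.run(main()))  # ['a', 'b', 'c', 'd']
```

This resumability is what `_async_add_back_chunk_to_stream` relies on: it can re-yield the chunk that triggered the break and then continue with the untouched remainder of the stream.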
@dosu I don't think the code is functioning as intended, since the tokens that are getting dropped are part of the actual final response. For instance, …
To ensure that the entire final response is captured without truncation when using streaming, the `is_streaming` flag is threaded through `ReActOutputParser`. Here is the relevant part of the code that shows how the parser handles it:

````python
class ReActOutputParser(BaseOutputParser):
    """ReAct Output parser."""

    def parse(self, output: str, is_streaming: bool = False) -> BaseReasoningStep:
        """Parse output from ReAct agent.

        We expect the output to be in one of the following formats:
        1. If the agent need to use a tool to answer the question:
            ```
            Thought: <thought>
            Action: <action>
            Action Input: <action_input>
            ```
        2. If the agent can answer the question without any tools:
            ```
            Thought: <thought>
            Answer: <answer>
            ```
        """
        if "Thought:" not in output:
            # NOTE: handle the case where the agent directly outputs the answer
            # instead of following the thought-answer format
            return ResponseReasoningStep(
                thought="(Implicit) I can answer without any more tools!",
                response=output,
                is_streaming=is_streaming,
            )

        if "Answer:" in output:
            thought, answer = extract_final_response(output)
            return ResponseReasoningStep(
                thought=thought, response=answer, is_streaming=is_streaming
            )

        if "Action:" in output:
            return parse_action_reasoning_step(output)

        raise ValueError(f"Could not parse output: {output}")
````

By setting `is_streaming=True`, the resulting `ResponseReasoningStep` is marked as streaming. Additionally, you can refer to the test cases provided in the test suite:

````python
def test_stream_chat_basic(
    add_tool: FunctionTool,
) -> None:
    mock_llm = MockStreamChatLLM(
        responses=[
            ChatMessage(
                content=MOCK_ACTION_RESPONSE,
                role=MessageRole.ASSISTANT,
            ),
            ChatMessage(
                content=MOCK_STREAM_FINAL_RESPONSE,
                role=MessageRole.ASSISTANT,
            ),
        ]
    )
    agent = ReActAgent.from_tools(
        tools=[add_tool],
        llm=mock_llm,
    )
    response = agent.stream_chat("What is 1 + 1?")
    assert isinstance(response, StreamingAgentChatResponse)

    # exhaust stream
    for delta in response.response_gen:
        continue

    expected_answer = MOCK_STREAM_FINAL_RESPONSE.split("Answer: ")[-1].strip()
    assert response.response == expected_answer

    assert agent.chat_history == [
        ChatMessage(
            content="What is 1 + 1?",
            role=MessageRole.USER,
        ),
        ChatMessage(
            content="2 is the final answer.",
            role=MessageRole.ASSISTANT,
        ),
    ]
````

This test ensures that the entire final response is captured correctly when using streaming [3][4].
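For illustration, the parser's dispatch logic can be sketched independently of LlamaIndex. This is a simplified stand-in (the `Answer:`-splitting is mine and the `Action:` branch is omitted; it is not the library's implementation):

```python
def parse(output: str) -> tuple[str, str]:
    # Simplified stand-in for ReActOutputParser.parse: returns (thought, response).
    if "Thought:" not in output:
        # Agent answered directly without following the thought-answer format.
        return ("(Implicit) I can answer without any more tools!", output)
    if "Answer:" in output:
        thought_part, answer = output.split("Answer:", 1)
        thought = thought_part.split("Thought:", 1)[1].strip()
        return (thought, answer.strip())
    raise ValueError(f"Could not parse output: {output}")

print(parse("Thought: I can answer.\nAnswer: 2"))  # ('I can answer.', '2')
print(parse("2 is the final answer."))
```

The key point for this issue is the first branch: output that never contains `Thought:` is treated wholesale as the final response.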
@dosu that parameter is already being set to `True` when you call …
To ensure that the entire final response is captured without truncation when using the async streaming path, here is the relevant `_arun_step_stream` implementation:

```python
async def _arun_step_stream(
    self,
    step: TaskStep,
    task: Task,
) -> TaskStepOutput:
    """Run step."""
    if step.input is not None:
        add_user_step_to_reasoning(
            step,
            task.extra_state["new_memory"],
            task.extra_state["current_reasoning"],
            verbose=self._verbose,
        )
    tools = self.get_tools(task.input)

    input_chat = self._react_chat_formatter.format(
        tools,
        chat_history=task.memory.get() + task.extra_state["new_memory"].get_all(),
        current_reasoning=task.extra_state["current_reasoning"],
    )

    chat_stream = await self._llm.astream_chat(input_chat)

    # iterate over stream, break out if is final answer after the "Answer: "
    full_response = ChatResponse(
        message=ChatMessage(content=None, role="assistant")
    )
    is_done = False
    async for latest_chunk in chat_stream:
        full_response = latest_chunk
        is_done = self._infer_stream_chunk_is_final(latest_chunk)
        if is_done:
            break

    if not is_done:
        reasoning_steps, is_done = await self._aprocess_actions(
            task, tools=tools, output=full_response, is_streaming=True
        )
        task.extra_state["current_reasoning"].extend(reasoning_steps)
        agent_response: AGENT_CHAT_RESPONSE_TYPE = self._get_response(
            task.extra_state["current_reasoning"], task.extra_state["sources"]
        )
    else:
        response_stream = self._async_add_back_chunk_to_stream(
            chunk=latest_chunk, chat_stream=chat_stream
        )
```

In this code, once `is_done` fires, the chunk that triggered it is added back onto the stream via `_async_add_back_chunk_to_stream`, and the remainder of the stream is returned to the caller.
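To see how this flow can drop leading tokens, here is a self-contained simulation (all names, the `Chunk` type, and the `is_final` heuristic are simplified stand-ins, not LlamaIndex code): a fake stream yields per-token deltas alongside the cumulative content, the loop breaks on the first chunk that looks final, and only that chunk plus the remainder of the stream is "added back":

```python
import asyncio
from dataclasses import dataclass

@dataclass
class Chunk:
    delta: str    # new tokens in this chunk
    content: str  # cumulative content so far

def is_final(content: str) -> bool:
    # Stand-in for _infer_stream_chunk_is_final: content longer than
    # "Thought" that doesn't start with "Thought" is assumed final.
    return len(content) > len("Thought") and not content.startswith("Thought")

async def fake_chat_stream():
    text = ""
    for token in ["The", " beach", " is", " magical."]:
        text += token
        yield Chunk(delta=token, content=text)

async def arun_step_stream() -> str:
    stream = fake_chat_stream()
    async for chunk in stream:
        if is_final(chunk.content):
            break
    # Mimic _async_add_back_chunk_to_stream: re-yield only the chunk that
    # triggered is_done, then the untouched rest of the stream.
    pieces = [chunk.delta]
    async for rest in stream:
        pieces.append(rest.delta)
    return "".join(pieces)

print(asyncio.run(arun_step_stream()))  # " beach is magical."
```

The leading "The" was consumed before `is_done` fired and never reaches the output, matching the truncated response reported earlier in this thread.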
@Tshimanga don't we want to drop those tokens? The idea is the stream shouldn't include all the agent's thoughts, just the final response.
@logan-markewich What you say makes sense, but it looks like I came to premature conclusions when debugging. The issue I've been encountering is that too much of the response gets dropped. I'm ending up with sentences like this (as a response to the reproducer above): " beach is a magical place where golden sands stretch underfoot and the ocean whispers in a constant, soothing murmur. It's an ideal spot for families and friends to gather, share moments, and indulge in the simple joys of nature, from watching the sunset to collecting seashells along the shore." The leading "The" got dropped. When I was initially stepping through the debugger and testing fixes, I didn't see any preceding text for the thought process in …
@logan-markewich ah ok, I think I found the actual issue. In `_infer_stream_chunk_is_final`:

```python
...
if len(latest_content) > len("Thought") and not latest_content.startswith(
    "Thought"
):
    return True
...
```

If the LLM doesn't follow the thought-action format, but the beginning of the response is a short token like "The", then this check returns `False` for that chunk (since `len("The") < len("Thought")`), and `is_done` only triggers on a later chunk, after the leading tokens have already been consumed. Does this seem like the actual issue here?
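To make the failure mode concrete, the quoted check can be exercised in isolation (the `is_final` wrapper name is mine; the body mirrors the snippet above):

```python
def is_final(latest_content: str) -> bool:
    # Mirror of the check quoted above (wrapper name is hypothetical).
    if len(latest_content) > len("Thought") and not latest_content.startswith(
        "Thought"
    ):
        return True
    return False

# A short leading token is not flagged as final...
print(is_final("The"))        # False: len("The") <= len("Thought")
# ...but the next cumulative chunk is, so is_done fires one chunk "late".
print(is_final("The beach"))  # True
# Well-formed ReAct output never trips this branch.
print(is_final("Thought: I need a tool."))  # False
```

So for a response that skips the thought-action format, `is_done` can only flip on the second or later chunk, by which point the first chunk has already been consumed by the loop.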
Bug Description
In `ReActAgentWorker._arun_step_stream`, any tokens generated before the `is_done` condition triggers are dropped.
Version
0.10.38 through 0.10.46
Steps to Reproduce
Relevant Logs/Tracebacks
No response