### Please read this first

- Have you read the docs? Agents SDK docs - Yes
- Have you searched for related issues? Others may have faced similar issues. - Yes
### Describe the bug

`ReasoningItem` events are emitted out of order. For example, we get **Tool Call Initiated**, then **Reasoning Item**, then **Tool Call Output**, which is not in line with the spec.
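Concretely, the run-item ordering we observe versus the ordering the spec implies (illustrative only; plain strings standing in for the event types, not SDK objects):

```python
# Illustrative only: event-type names as plain strings, not SDK objects.
observed = ["tool_call_item", "reasoning_item", "tool_call_output_item"]
expected = ["reasoning_item", "tool_call_item", "tool_call_output_item"]

# The same events arrive, just in the wrong order:
assert sorted(observed) == sorted(expected)
assert observed != expected
```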
### Debug information

- Agents SDK version: (e.g. `v0.0.3`)
- Python version (e.g. Python 3.10)
### Repro steps

Take the stock streaming example script from the library and add some additional print statements.
```python
import asyncio
import random

from agents import Agent, ItemHelpers, ModelSettings, Runner, function_tool
from openai.types.shared import Reasoning


@function_tool
def how_many_jokes() -> int:
    """Return a random integer of jokes to tell between 1 and 10 (inclusive)."""
    return random.randint(1, 10)


async def main():
    agent = Agent(
        name="Joker",
        model="gpt-5",
        model_settings=ModelSettings(
            reasoning=Reasoning(
                effort="high",
                summary="auto",
            )
        ),
        instructions="First call the `how_many_jokes` tool, then tell that many jokes.",
        tools=[how_many_jokes],
    )

    result = Runner.run_streamed(
        agent,
        input="Hello",
    )

    print("\n" + "=" * 60)
    print("🚀 AGENT STREAM STARTING")
    print("=" * 60 + "\n")

    previous_data_type = None
    reasoning_buffer = []

    async for event in result.stream_events():
        # Handle raw response events with progress indicators
        if event.type == "raw_response_event":
            data_type = event.data.type

            # Special handling for different event types
            if data_type == "reasoning":
                if data_type != previous_data_type:
                    print("🧠 [REASONING] ", end='', flush=True)
                print('▸', end='', flush=True)
            elif data_type == "tool_calls":
                if data_type != previous_data_type:
                    print("\n🔧 [TOOL CALLS] ", end='', flush=True)
                print('▸', end='', flush=True)
            elif data_type == "content":
                if data_type != previous_data_type:
                    print("\n💬 [CONTENT] ", end='', flush=True)
                print('▸', end='', flush=True)
            elif "response.output_item.done" in data_type.lower():
                if data_type != previous_data_type:
                    print("\n✅ [RESPONSE.OUTPUT_ITEM.DONE] ", end='', flush=True)
                    # Try to extract what kind of output item from the event
                    if hasattr(event, 'data') and hasattr(event.data, 'item'):
                        item_type = getattr(event.data.item, 'type', 'unknown')
                        print(f"({item_type}) ", end='', flush=True)
                print('▸', end='', flush=True)
            else:
                if data_type != previous_data_type:
                    print(f"\n📊 [{data_type.upper()}] ", end='', flush=True)
                print('▸', end='', flush=True)

            previous_data_type = data_type
            continue
        elif event.type == "agent_updated_stream_event":
            print(f"\n\n✅ Agent Updated: {event.new_agent.name}")
            print("-" * 40)
            continue
        elif event.type == "run_item_stream_event":
            if event.item.type == "tool_call_item":
                print("\n\n🛠️ TOOL CALL INITIATED")
                print("  └─ Function: ", end='')
                if hasattr(event.item, 'function_call'):
                    print(f"{event.item.function_call.name}")
                    if hasattr(event.item.function_call, 'arguments'):
                        print(f"  └─ Arguments: {event.item.function_call.arguments}")
                else:
                    print("(details pending)")
            elif event.item.type == "tool_call_output_item":
                print("\n📤 TOOL OUTPUT")
                print(f"  └─ Result: {event.item.output}")
            elif event.item.type == "message_output_item":
                message_text = ItemHelpers.text_message_output(event.item)
                print("\n\n📝 MESSAGE OUTPUT")
                print("  " + "─" * 37)
                for line in message_text.split('\n'):
                    print(f"  {line}")
                print("  " + "─" * 37)
            elif event.item.type == "reasoning_output_item":
                print("\n\n🤔 REASONING OUTPUT")
                if hasattr(event.item, 'reasoning'):
                    print(f"  └─ {event.item.reasoning}")
                else:
                    print("  └─ (reasoning content)")
            else:
                print(f"\n\n⚡ EVENT: {event.item.type}")
                if hasattr(event.item, '__dict__'):
                    for key, value in event.item.__dict__.items():
                        if not key.startswith('_'):
                            print(f"  └─ {key}: {value}")
        # Handle response output item done events
        elif event.type == "response_output_item_done" or "output_item_done" in str(event.type):
            # Try to determine the item type from various possible locations
            item_type = 'unknown'
            if hasattr(event, 'item') and hasattr(event.item, 'type'):
                item_type = event.item.type
            elif hasattr(event, 'data') and hasattr(event.data, 'item') and hasattr(event.data.item, 'type'):
                item_type = event.data.item.type
            elif hasattr(event, 'output_item') and hasattr(event.output_item, 'type'):
                item_type = event.output_item.type
            print(f"\n✔️ OUTPUT ITEM COMPLETE: {item_type}")
            # Provide specific details based on the output item type
            if "message" in item_type.lower():
                print("  └─ Message delivery completed")
            elif "tool" in item_type.lower() and "output" in item_type.lower():
                print("  └─ Tool execution result delivered")
            elif "tool" in item_type.lower() and "call" in item_type.lower():
                print("  └─ Tool call completed")
            elif "reasoning" in item_type.lower():
                print("  └─ Reasoning step completed")
            else:
                print(f"  └─ {item_type} completed")
        # Handle other event types
        else:
            print(f"\n📌 {event.type.upper()}")
            if hasattr(event, '__dict__'):
                for key, value in event.__dict__.items():
                    if not key.startswith('_') and key != 'type':
                        print(f"  └─ {key}: {str(value)[:100]}")  # Truncate long values

    print("\n\n" + "=" * 60)
    print("✨ AGENT STREAM COMPLETE")
    print("=" * 60 + "\n")


if __name__ == "__main__":
    asyncio.run(main())


# Example output:
#
# ============================================================
# 🚀 AGENT STREAM STARTING
# ============================================================
#
# 🧠 [REASONING] ▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸
# 🔧 [TOOL CALLS] ▸▸▸▸▸▸
#
# ✅ Agent Updated: Joker
# ----------------------------------------
#
# 🛠️ TOOL CALL INITIATED
#   └─ Function: how_many_jokes
#   └─ Arguments: {}
#
# 📤 TOOL OUTPUT
#   └─ Result: 4
#
# 💬 [CONTENT] ▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸
#
# 📝 MESSAGE OUTPUT
#   ─────────────────────────────────────
#   Sure, here are four jokes for you:
#
#   1. **Why don't skeletons fight each other?**
#      They don't have the guts!
#
#   2. **What do you call fake spaghetti?**
#      An impasta!
#
#   3. **Why did the scarecrow win an award?**
#      Because he was outstanding in his field!
#
#   4. **Why did the bicycle fall over?**
#      Because it was two-tired!
#   ─────────────────────────────────────
#
# ============================================================
# ✨ AGENT STREAM COMPLETE
# ============================================================
```
This is the actual output:

```text
============================================================
🚀 AGENT STREAM STARTING
============================================================

✅ Agent Updated: Joker
----------------------------------------
📊 [RESPONSE.CREATED] ▸
📊 [RESPONSE.IN_PROGRESS] ▸
📊 [RESPONSE.OUTPUT_ITEM.ADDED] ▸
📊 [RESPONSE.REASONING_SUMMARY_PART.ADDED] ▸
📊 [RESPONSE.REASONING_SUMMARY_TEXT.DELTA] ▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸
📊 [RESPONSE.REASONING_SUMMARY_TEXT.DONE] ▸
📊 [RESPONSE.REASONING_SUMMARY_PART.DONE] ▸
📊 [RESPONSE.REASONING_SUMMARY_PART.ADDED] ▸
📊 [RESPONSE.REASONING_SUMMARY_TEXT.DELTA] ▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸
📊 [RESPONSE.REASONING_SUMMARY_TEXT.DONE] ▸
📊 [RESPONSE.REASONING_SUMMARY_PART.DONE] ▸
✅ [RESPONSE.OUTPUT_ITEM.DONE] (reasoning) ▸
📊 [RESPONSE.OUTPUT_ITEM.ADDED] ▸
📊 [RESPONSE.FUNCTION_CALL_ARGUMENTS.DELTA] ▸
📊 [RESPONSE.FUNCTION_CALL_ARGUMENTS.DONE] ▸

🛠️ TOOL CALL INITIATED
  └─ Function: (details pending)
✅ [RESPONSE.OUTPUT_ITEM.DONE] (function_call) ▸
📊 [RESPONSE.COMPLETED] ▸

⚡ EVENT: reasoning_item
  └─ agent: Agent(name='Joker', handoff_description=None, tools=[FunctionTool(name='how_many_jokes', description='Return a random integer of jokes to tell between 1 and 10 (inclusive).', params_json_schema={'properties': {}, 'title': 'how_many_jokes_args', 'type': 'object', 'additionalProperties': False, 'required': []}, on_invoke_tool=<function function_tool.<locals>._create_function_tool.<locals>._on_invoke_tool at 0x10f7b2f20>, strict_json_schema=True, is_enabled=True)], mcp_servers=[], mcp_config={}, instructions='First call the `how_many_jokes` tool, then tell that many jokes.', prompt=None, handoffs=[], model='gpt-5', model_settings=ModelSettings(temperature=None, top_p=None, frequency_penalty=None, presence_penalty=None, tool_choice=None, parallel_tool_calls=None, truncation=None, max_tokens=None, reasoning=Reasoning(effort='high', generate_summary=None, summary='auto'), verbosity=None, metadata=None, store=None, include_usage=None, response_include=None, top_logprobs=None, extra_query=None, extra_body=None, extra_headers=None, extra_args=None), input_guardrails=[], output_guardrails=[], output_type=None, hooks=None, tool_use_behavior='run_llm_again', reset_tool_choice=True)
  └─ raw_item: ResponseReasoningItem(id='rs_68cb60cf85f88190b23bba85c89d5f0903d11b9d3562e6b1', summary=[Summary(text='**Planning to tell jokes**\n\nI need to follow the developer’s instruction: first, I should call the how_many_jokes tool, then share that many jokes after getting a random number between 1 and 10. The user greeted me with "Hello," but my focus is on executing the joke request properly. I need to avoid heavy formatting and just provide short, appropriate jokes. So, I’ll call the tool now and keep the jokes general and light-hearted!', type='summary_text'), Summary(text='**Preparing for the joke call**\n\nThe tool is set to return a number based on the instructions, which says it returns a random integer between 1 and 10. Once I get that number, I’ll parse the result to determine how many jokes to tell. I think it’s crucial to queue the call properly, so I’m ready to proceed with the tool now. This way, I can ensure everything runs smoothly for sharing those jokes!', type='summary_text')], type='reasoning', content=None, encrypted_content=None, status=None)
  └─ type: reasoning_item

📤 TOOL OUTPUT
  └─ Result: 8
📊 [RESPONSE.CREATED] ▸
📊 [RESPONSE.IN_PROGRESS] ▸
📊 [RESPONSE.OUTPUT_ITEM.ADDED] ▸
📊 [RESPONSE.REASONING_SUMMARY_PART.ADDED] ▸
📊 [RESPONSE.REASONING_SUMMARY_TEXT.DELTA] ▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸
📊 [RESPONSE.REASONING_SUMMARY_TEXT.DONE] ▸
📊 [RESPONSE.REASONING_SUMMARY_PART.DONE] ▸
📊 [RESPONSE.REASONING_SUMMARY_PART.ADDED] ▸
📊 [RESPONSE.REASONING_SUMMARY_TEXT.DELTA] ▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸
📊 [RESPONSE.REASONING_SUMMARY_TEXT.DONE] ▸
📊 [RESPONSE.REASONING_SUMMARY_PART.DONE] ▸
✅ [RESPONSE.OUTPUT_ITEM.DONE] (reasoning) ▸
📊 [RESPONSE.OUTPUT_ITEM.ADDED] ▸
📊 [RESPONSE.CONTENT_PART.ADDED] ▸
📊 [RESPONSE.OUTPUT_TEXT.DELTA] ▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸▸
📊 [RESPONSE.OUTPUT_TEXT.DONE] ▸
📊 [RESPONSE.CONTENT_PART.DONE] ▸
✅ [RESPONSE.OUTPUT_ITEM.DONE] (message) ▸
📊 [RESPONSE.COMPLETED] ▸

⚡ EVENT: reasoning_item
  └─ agent: Agent(name='Joker', handoff_description=None, tools=[FunctionTool(name='how_many_jokes', description='Return a random integer of jokes to tell between 1 and 10 (inclusive).', params_json_schema={'properties': {}, 'title': 'how_many_jokes_args', 'type': 'object', 'additionalProperties': False, 'required': []}, on_invoke_tool=<function function_tool.<locals>._create_function_tool.<locals>._on_invoke_tool at 0x10f7b2f20>, strict_json_schema=True, is_enabled=True)], mcp_servers=[], mcp_config={}, instructions='First call the `how_many_jokes` tool, then tell that many jokes.', prompt=None, handoffs=[], model='gpt-5', model_settings=ModelSettings(temperature=None, top_p=None, frequency_penalty=None, presence_penalty=None, tool_choice=None, parallel_tool_calls=None, truncation=None, max_tokens=None, reasoning=Reasoning(effort='high', generate_summary=None, summary='auto'), verbosity=None, metadata=None, store=None, include_usage=None, response_include=None, top_logprobs=None, extra_query=None, extra_body=None, extra_headers=None, extra_args=None), input_guardrails=[], output_guardrails=[], output_type=None, hooks=None, tool_use_behavior='run_llm_again', reset_tool_choice=True)
  └─ raw_item: ResponseReasoningItem(id='rs_68cb60d854f08190a20f5645cb2533d803d11b9d3562e6b1', summary=[Summary(text="**Crafting family-friendly jokes**\n\nI’m thinking about how to create 8 clean, short, family-friendly jokes. The interface suggests avoiding heavy formatting, so I’ll consider using bullet points or numbers to keep it organized. I want to make sure there's a good variety, including puns and classic dad jokes, while keeping everything safe and avoiding any offense. Here are some candidate jokes I came up with to fit those guidelines. Let’s get creative!", type='summary_text'), Summary(text='**Finalizing family-friendly jokes**\n\nI’ve created the last two jokes to finish my list of 8. The seventh joke is, “Parallel lines have so much in common… it’s a shame they’ll never meet.” For the eighth, I decided to go with, “I ordered a chicken and an egg from Amazon. I’ll let you know which comes first.” Now I’m ready to present these jokes simply, using a numbered list without any heavy formatting. I just need to keep things friendly and straightforward!', type='summary_text')], type='reasoning', content=None, encrypted_content=None, status=None)
  └─ type: reasoning_item

📝 MESSAGE OUTPUT
  ─────────────────────────────────────
  1) I told my computer I needed a break—now it won’t stop sending me KitKat ads.
  2) Why did the scarecrow win an award? He was outstanding in his field.
  3) I used to play piano by ear, but now I use my hands.
  4) Why don’t skeletons fight each other? They don’t have the guts.
  5) I’m reading a book about anti-gravity—it's impossible to put down.
  6) Why did the math book look sad? It had too many problems.
  7) Parallel lines have so much in common… it’s a shame they’ll never meet.
  8) I ordered a chicken and an egg from Amazon. I’ll let you know which comes first.
  ─────────────────────────────────────

============================================================
✨ AGENT STREAM COMPLETE
============================================================
```
### Expected behavior

The `ReasoningItem` should be emitted as soon as its `RESPONSE.OUTPUT_ITEM.DONE` event completes. Instead, it is emitted sometime between `TOOL CALL INITIATED` and `TOOL OUTPUT`.

This causes many problems:

- any UI that depends on these events will display items in the wrong order
- if you replay the events through the API, you get errors, because reasoning items can't appear between a tool call and its output
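As a client-side workaround, one can buffer the run-item events per turn and move a late-arriving reasoning item back in front of the tool call it belongs to. This is a sketch, not SDK code: the strings here are simplified stand-ins for `run_item_stream_event` payloads.

```python
# Sketch of a client-side reordering buffer. The strings are simplified
# stand-ins for run_item_stream_event item types, not real SDK objects.

def reorder_turn(items: list[str]) -> list[str]:
    """Move any reasoning_item that arrived after a tool_call_item back in
    front of that tool call, restoring the expected ordering."""
    out: list[str] = []
    for item in items:
        if item == "reasoning_item" and "tool_call_item" in out:
            # Insert the late reasoning item before the most recent tool call.
            idx = len(out) - 1 - out[::-1].index("tool_call_item")
            out.insert(idx, item)
        else:
            out.append(item)
    return out

# The out-of-order sequence from the log above:
observed = ["tool_call_item", "reasoning_item", "tool_call_output_item"]
print(reorder_turn(observed))
# → ['reasoning_item', 'tool_call_item', 'tool_call_output_item']
```

This only papers over the symptom for rendering; replaying events through the API still requires the SDK to emit them in the correct order.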