Summary
When RunResultStreaming raises an error (API errors, connection drops, context window exceeded, etc.), result.context_wrapper.usage stays at 0 tokens even though tokens were consumed.
Root Cause
Usage is only accumulated when ResponseCompletedEvent arrives:
# src/agents/run.py, line 1245
async for event in model.stream_response(...):
    if isinstance(event, ResponseCompletedEvent):
        usage = Usage(...)
        context_wrapper.usage.add(usage)  # line 1263

If the model provider raises an exception before yielding ResponseCompletedEvent (which happens for context window errors, mid-stream connection failures, rate limits, etc.), the loop exits without ever updating usage.
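The behavior can be reproduced in isolation with a minimal sketch. Everything below is a simplified stand-in written for illustration, not the SDK's actual code: `Usage`, `TextDeltaEvent`, `ResponseCompletedEvent`, `failing_stream`, and `run_once` are hypothetical local definitions that only mimic the shape of the loop above.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Usage:
    """Hypothetical, simplified stand-in for the SDK's usage counter."""
    input_tokens: int = 0
    output_tokens: int = 0

    def add(self, other: "Usage") -> None:
        self.input_tokens += other.input_tokens
        self.output_tokens += other.output_tokens


@dataclass
class TextDeltaEvent:
    """Hypothetical partial-output event."""
    delta: str


@dataclass
class ResponseCompletedEvent:
    """Hypothetical final event; the only one that carries usage."""
    usage: Usage


async def failing_stream():
    # The provider streams some output, then fails mid-response.
    yield TextDeltaEvent("partial ")
    yield TextDeltaEvent("answer")
    raise ConnectionError("stream dropped mid-response")
    # Never reached, so the usage below is never reported.
    yield ResponseCompletedEvent(Usage(input_tokens=120, output_tokens=2))


async def run_once(total: Usage) -> None:
    # Same shape as the loop in the issue: usage is added only on completion.
    async for event in failing_stream():
        if isinstance(event, ResponseCompletedEvent):
            total.add(event.usage)


def demo() -> Usage:
    total = Usage()
    try:
        asyncio.run(run_once(total))
    except ConnectionError:
        pass  # the caller sees the error, but usage was never updated
    return total
```

Calling `demo()` returns a `Usage` whose counters are both still zero, even though the simulated provider had already processed the request and produced output.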
Fix Ideas
- Estimate input tokens on error: When streaming fails, estimate input tokens from the request we sent (we have filtered.input + filtered.instructions). Mark it as estimated via a flag. Output tokens are lost but at least we track what went in.
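A rough sketch of what the estimation step might look like, under stated assumptions. `EstimatedUsage`, `estimate_tokens`, and `estimate_input_usage` are hypothetical names, not part of the SDK, and the 4-characters-per-token ratio is a crude heuristic rather than a real tokenizer; a proper implementation would likely use the model's own tokenizer where one is available.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class EstimatedUsage:
    """Hypothetical usage record with a flag marking it as an estimate."""
    input_tokens: int = 0
    output_tokens: int = 0
    estimated: bool = False


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Crude heuristic (assumption): roughly 4 characters per token."""
    if not text:
        return 0
    return max(1, round(len(text) / chars_per_token))


def estimate_input_usage(
    instructions: Optional[str],
    input_items: Iterable[object],
) -> EstimatedUsage:
    """Estimate input tokens from the request that was sent.

    Output tokens stay at 0 because they cannot be recovered after a failure.
    """
    parts = [instructions or ""]
    parts.extend(str(item) for item in input_items)
    total = sum(estimate_tokens(part) for part in parts)
    return EstimatedUsage(input_tokens=total, estimated=True)
```

In the streaming loop, this could be called from an `except` clause around the `async for`, adding the estimate to the running usage before re-raising, so the original exception still reaches the caller.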
I'd love to hear whether this fix idea is valid and welcome before I go ahead and implement it. If you can think of any temporary workarounds in the meantime, that would also be great.