Realtime: enable a playback tracker #1242

Merged
merged 1 commit into from
Jul 29, 2025

Conversation

rm-openai
Collaborator

@rm-openai rm-openai commented Jul 25, 2025

So far, we've been assuming that audio is played:

  • immediately (i.e. with 0 delay/latency)
  • at realtime

This breaks our interrupt tracking, because the model needs to know how much audio the user has actually heard. In a phone-call agent, for example, these assumptions don't hold: there is a delay of a few hundred ms between the model sending audio and the user hearing it. This PR lets you pass a playback tracker.
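
To make the difference concrete, here is a minimal sketch of the idea, not the code added in this PR. The names `PlaybackTracker`, `on_play_ms`, `heard_ms`, and `heard_audio_ms` are hypothetical and chosen for illustration only. Without a tracker, the estimate assumes audio plays immediately and at realtime; with one, the playback layer reports what has actually reached the listener.

```python
from typing import Optional


class PlaybackTracker:
    """Hypothetical tracker fed by whatever actually plays the audio."""

    def __init__(self) -> None:
        self._item: Optional[tuple] = None  # (item_id, content_index)
        self._played_ms = 0.0

    def on_play_ms(self, item_id: str, content_index: int, ms: float) -> None:
        """Report that `ms` more milliseconds of an item reached the listener."""
        key = (item_id, content_index)
        if key != self._item:
            # A new audio item started playing, so restart the counter.
            self._item = key
            self._played_ms = 0.0
        self._played_ms += ms

    def heard_ms(self) -> float:
        return self._played_ms


def heard_audio_ms(
    sent_ms: float, elapsed_ms: float, tracker: Optional[PlaybackTracker]
) -> float:
    """Estimate how much audio the user heard at the moment of an interrupt."""
    if tracker is not None:
        # Trust the playback layer, capped by what the model actually sent.
        return min(sent_ms, tracker.heard_ms())
    # Old assumption: zero latency and realtime playback since the first delta.
    return min(sent_ms, elapsed_ms)
```

In a phone-call scenario where 500 ms have elapsed since the first delta but the line adds 300 ms of latency, the tracker would report 200 ms while the default estimate would be 500 ms.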


@rm-openai rm-openai changed the base branch from main to rm/pr1235 July 25, 2025 01:30
rm-openai added a commit that referenced this pull request Jul 25, 2025
Will need this for a followup.

---
[//]: # (BEGIN SAPLING FOOTER)
* #1243
* #1242
* __->__ #1235
Base automatically changed from rm/pr1235 to main July 25, 2025 01:30
@rm-openai rm-openai requested a review from seratch July 25, 2025 01:31
@seratch
Member

seratch commented Jul 25, 2025

It seems the file conflicts with the main branch need to be resolved.

Member

@seratch seratch left a comment

Minor suggestion on naming local variables; since they are local, using the same name is okay, though.

Comment on lines 26 to 35
def on_audio_delta(self, item_id: str, item_content_index: int, bytes: bytes) -> None:
    """Called when an audio delta is received from the model."""
    ms = calculate_audio_length_ms(self._format, bytes)
    new_key = (item_id, item_content_index)

    self._last_audio_item = new_key
    if new_key not in self._states:
        self._states[new_key] = ModelAudioState(datetime.now(), ms)
    else:
        self._states[new_key].audio_length_ms += ms
Member

nit: In general, using built-in/reserved names like bytes for variables should be avoided. If audio does not sound great, data, delta, audio_data etc. should be fine too.

Suggested change
- def on_audio_delta(self, item_id: str, item_content_index: int, bytes: bytes) -> None:
-     """Called when an audio delta is received from the model."""
-     ms = calculate_audio_length_ms(self._format, bytes)
-     new_key = (item_id, item_content_index)
-     self._last_audio_item = new_key
-     if new_key not in self._states:
-         self._states[new_key] = ModelAudioState(datetime.now(), ms)
-     else:
-         self._states[new_key].audio_length_ms += ms
+ def on_audio_delta(self, item_id: str, item_content_index: int, audio: bytes) -> None:
+     """Called when an audio delta is received from the model."""
+     ms = calculate_audio_length_ms(self._format, audio)
+     new_key = (item_id, item_content_index)
+     self._last_audio_item = new_key
+     if new_key not in self._states:
+         self._states[new_key] = ModelAudioState(datetime.now(), ms)
+     else:
+         self._states[new_key].audio_length_ms += ms
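
For readers who want to run the accumulation logic in isolation, the following is a self-contained sketch built around the suggested signature. It rests on several assumptions: the `ModelAudioState` field name `initial_received_time`, the wrapper class `AudioStateTracker`, and the `bytes_per_ms` parameter (a stand-in for `calculate_audio_length_ms`) are all hypothetical. The `shadowed` function only illustrates why the rename is useful: inside a function whose parameter is called `bytes`, the built-in type is no longer reachable under that name.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional, Tuple


@dataclass
class ModelAudioState:
    # Assumed field names; the thread only shows the constructor call
    # and the `audio_length_ms` attribute.
    initial_received_time: datetime
    audio_length_ms: float


class AudioStateTracker:
    """Hypothetical holder for per-item audio state."""

    def __init__(self, bytes_per_ms: float) -> None:
        self._bytes_per_ms = bytes_per_ms
        self._states: Dict[Tuple[str, int], ModelAudioState] = {}
        self._last_audio_item: Optional[Tuple[str, int]] = None

    def on_audio_delta(self, item_id: str, item_content_index: int, audio: bytes) -> None:
        """Accumulate the length of audio received for one content part."""
        ms = len(audio) / self._bytes_per_ms
        new_key = (item_id, item_content_index)
        self._last_audio_item = new_key
        if new_key not in self._states:
            # First delta for this item: remember when it arrived.
            self._states[new_key] = ModelAudioState(datetime.now(), ms)
        else:
            self._states[new_key].audio_length_ms += ms


def shadowed(bytes: bytes) -> bytes:
    # Here `bytes` is the argument, not the built-in type, so calling it
    # raises TypeError: 'bytes' object is not callable.
    return bytes(2)
```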

Comment on lines 6 to 9
def calculate_audio_length_ms(format: RealtimeAudioFormat | None, bytes: bytes) -> float:
    if format and format.startswith("g711"):
        return (len(bytes) / 8000) * 1000
    return (len(bytes) / 24 / 2) * 1000
Member

same as above

Suggested change
- def calculate_audio_length_ms(format: RealtimeAudioFormat | None, bytes: bytes) -> float:
-     if format and format.startswith("g711"):
-         return (len(bytes) / 8000) * 1000
-     return (len(bytes) / 24 / 2) * 1000
+ def calculate_audio_length_ms(format: RealtimeAudioFormat | None, audio: bytes) -> float:
+     if format and format.startswith("g711"):
+         return (len(audio) / 8000) * 1000
+     return (len(audio) / 24 / 2) * 1000
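
The byte-to-millisecond conversion can be checked with a small worked example. This is a sketch under stated assumptions, not the library's implementation: it assumes G.711 is 8 kHz with 1 byte per sample and that the non-G.711 format is 16-bit mono PCM at 24 kHz. The helper names `audio_length_ms` and `quoted_pcm_expression` are hypothetical; the second simply reproduces the arithmetic of the non-G.711 branch quoted above so the two can be compared.

```python
def audio_length_ms(audio: bytes, sample_rate_hz: int, bytes_per_sample: int) -> float:
    """Duration of raw mono audio in milliseconds."""
    bytes_per_second = sample_rate_hz * bytes_per_sample
    return len(audio) / bytes_per_second * 1000


def quoted_pcm_expression(audio: bytes) -> float:
    """Same arithmetic as the quoted non-G.711 branch, for comparison only."""
    return (len(audio) / 24 / 2) * 1000


# One second of audio under the assumed formats.
one_second_g711 = bytes(8000)    # 8,000 samples x 1 byte
one_second_pcm16 = bytes(48000)  # 24,000 samples x 2 bytes

g711_ms = audio_length_ms(one_second_g711, sample_rate_hz=8000, bytes_per_sample=1)
pcm16_ms = audio_length_ms(one_second_pcm16, sample_rate_hz=24000, bytes_per_sample=2)
quoted_ms = quoted_pcm_expression(one_second_pcm16)
```

Under these assumptions both `g711_ms` and `pcm16_ms` come out to 1000.0, while `quoted_ms` comes out to 1000000.0, so the units of the quoted PCM branch may be worth double-checking.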

@rm-openai rm-openai merged commit b459cc4 into main Jul 29, 2025
10 checks passed
@rm-openai rm-openai deleted the rm/pr1242 branch July 29, 2025 18:13
3 participants