Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adds twilio dtmf action #623

Merged
merged 9 commits into from
Jul 9, 2024
Merged

adds twilio dtmf action #623

merged 9 commits into from
Jul 9, 2024

Conversation

ajar98
Copy link
Contributor

@ajar98 ajar98 commented Jul 9, 2024

uses numpy to generate DTMF tones that can be used in conversations

  • creates DTMFToneGenerator singleton to cache DTMF tones
  • adds tests for the new action / singleton
  • on Twilio mark messages - on purpose doesn't send a mark message into the twilio playback queue, so that it doesn't interrupt the normal flow of real synthesis messages going through
    • so, in the case of a clear message, it will get interrupted as well

tested it out against a real IVR, and it was able to press buttons!

@ajar98 ajar98 changed the title [WIP] adds twilio dtmf adds twilio dtmf action Jul 9, 2024
@ajar98 ajar98 marked this pull request as ready for review July 9, 2024 02:03
@ajar98 ajar98 requested a review from adnaans July 9, 2024 02:04
output = audioop.lin2ulaw(pcm, 2)
else:
output = pcm
self.tone_cache[(keypad_entry, sampling_rate, audio_encoding)] = output
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the tone cache necessary? Like this cache is only useful if the agent decides to press multiple tones in a single turn and there are repeated numbers

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a singleton, so it would get cached for the course of a program (e.g. for the time for a FastAPI server to remain up)

"streamSid": self.stream_sid,
"media": {"payload": base64.b64encode(dtmf_tone).decode("utf-8")},
}
self._twilio_events_queue.put_nowait(json.dumps(dtmf_message))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can these get interrupted? And if so is a mark necessary as well, so using _send_audio_chunk_and_mark()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these cannot get interrupted (and thus mirrors the vonage implementation). if we send an event to _twilio_events_queue we can expect that it gets sent to the ws unless the output device gets torn down (when the conversation ends)

logger.warning(f"Invalid DTMF buttons: {buttons}")
return ActionOutput(
action_type=action_input.action_config.type,
response=DTMFResponse(success=False),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we tested these outputs are properly sent to the LLM? When I look at action_result_to_string() for DTMF it only has a successful message. We should update it to say "invalid dtmf buttons" in this case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated + added test

)

mock_twilio_output_device.start()
while not mock_twilio_output_device._twilio_events_queue.empty():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any situation where the queue could remain empty and this causes an infinite loop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, updated the while loop

keypad_entry: KeypadEntry,
sampling_rate: int,
audio_encoding: AudioEncoding,
duration_seconds: float = DEFAULT_DTMF_TONE_LENGTH_SECONDS,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any use of duration_seconds. Is it really necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not necessary to make this work on Twilio per se - i'd rather keep it in the case it ever becomes useful. theoretically the lower you go, the faster the action can run

@ajar98 ajar98 requested a review from adnaans July 9, 2024 22:11
@ajar98 ajar98 merged commit 48cba3e into main Jul 9, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants