@disolaterX disolaterX commented Oct 3, 2025

Description

This PR enhances the ElevenLabs TTS service to support additional audio output formats and provides users with explicit control over the output format.

Changes

Added Output Formats

Expanded ElevenLabsOutputFormat to support:

  • pcm_8000 - 8kHz PCM
  • ulaw_8000 - 8kHz μ-law encoding
  • alaw_8000 - 8kHz A-law encoding
  • opus_48000_32 - 48kHz Opus at 32kbps
  • opus_48000_64 - 48kHz Opus at 64kbps
  • opus_48000_96 - 48kHz Opus at 96kbps
  • opus_48000_128 - 48kHz Opus at 128kbps
  • opus_48000_192 - 48kHz Opus at 192kbps

Enhanced Format Selection

  • Added output_format parameter to both ElevenLabsTTSService and ElevenLabsHttpTTSService
  • When output_format is explicitly provided, it takes precedence over the automatically determined format from sample_rate
  • Updated output_format_from_sample_rate() to handle 48kHz sample rate (defaults to opus_48000_128)
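
The precedence rule in the bullets above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's actual code: `resolve_output_format` is a hypothetical helper name, the lookup table contains only the two sample-rate mappings named in this description (8000 and 48000), and raising on an unknown rate is a choice made for the sketch.

```python
from typing import Optional

# Only the two mappings stated in this PR description; the real table
# presumably covers more sample rates.
_FORMAT_BY_SAMPLE_RATE = {
    8000: "pcm_8000",
    48000: "opus_48000_128",
}


def resolve_output_format(sample_rate: int, output_format: Optional[str] = None) -> str:
    """Hypothetical helper: an explicit output_format wins over sample_rate."""
    if output_format is not None:
        # Explicit choice takes precedence; the sample-rate lookup is skipped.
        return output_format
    try:
        return _FORMAT_BY_SAMPLE_RATE[sample_rate]
    except KeyError:
        # Assumption for this sketch: unknown rates are treated as an error.
        raise ValueError(f"no default output format for {sample_rate} Hz") from None
```

With this shape, `resolve_output_format(8000, "alaw_8000")` yields the explicit format, while `resolve_output_format(8000)` falls back to the sample-rate default.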

Usage Example

# Explicitly specify output format (takes precedence)
service = ElevenLabsTTSService(
    api_key="...",
    voice_id="...",
    output_format="opus_48000_96"  # Will use this format
)

# Or let it auto-determine from sample_rate
service = ElevenLabsTTSService(
    api_key="...",
    voice_id="...",
    sample_rate=48000  # Will auto-select opus_48000_128
)

# Explicitly specify output format (takes precedence)
service = ElevenLabsTTSService(
    api_key="...",
    voice_id="...",
    output_format="alaw_8000"  # Will use this format
)

# Or let it auto-determine from sample_rate
service = ElevenLabsTTSService(
    api_key="...",
    voice_id="...",
    sample_rate=8000  # Will auto-select pcm_8000
)

Breaking Changes

None - this is a backward-compatible enhancement.

@markbackman
Contributor

TTS services need to be configurable via sample rate. The best practice is to specify the sample rate in the PipelineParams for input or output. This ensures that all corresponding services use the same sample rate. So, the output_format_from_sample_rate() method is required. Also, I don't see a need for MP3 audio. Pipecat handles PCM audio.

I'd suggest reworking this PR with that in mind.

@disolaterX
Author

TTS services need to be configurable via sample rate. The best practice is to specify the sample rate in the PipelineParams for input or output. This ensures that all corresponding services use the same sample rate. So, the output_format_from_sample_rate() method is required. Also, I don't see a need for MP3 audio. Pipecat handles PCM audio.

I'd suggest reworking this PR with that in mind.

Hey, the goal was honestly to support telephony codecs. For example, an 8000 Hz sample rate is available for PCM, A-law, and μ-law. In that case the current function cannot tell them apart, since a sample rate alone does not identify the codec. What do you suggest we do there?

As for MP3, I understand your point and will remove it.
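
To make the 8000 Hz ambiguity concrete, here is a small sketch using only the format names listed in the PR description. The table and the `formats_for_sample_rate` helper are hypothetical and exist only for illustration; they are not part of Pipecat or the ElevenLabs API.

```python
# Formats and their sample rates, taken from the list in the PR description.
SAMPLE_RATE_BY_FORMAT = {
    "pcm_8000": 8000,
    "ulaw_8000": 8000,
    "alaw_8000": 8000,
    "opus_48000_32": 48000,
    "opus_48000_64": 48000,
    "opus_48000_96": 48000,
    "opus_48000_128": 48000,
    "opus_48000_192": 48000,
}


def formats_for_sample_rate(sample_rate: int) -> list[str]:
    """Hypothetical helper: every listed format that uses the given sample rate."""
    return [fmt for fmt, rate in SAMPLE_RATE_BY_FORMAT.items() if rate == sample_rate]


# Three candidates share 8000 Hz, so a sample rate alone cannot pick one.
candidates = formats_for_sample_rate(8000)
```

Because the reverse lookup is one-to-many, a function keyed only on sample rate has to pick a single default per rate, which is where an explicit output_format would come in.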

@markbackman
Contributor

I've thought through this more and I'm not sure there is an actual benefit. If I may ask, what are you looking to accomplish in supporting the additional output formats?

Pipecat already handles conversions, which have a negligible performance cost. I think the added API complexity might be more of a net negative than the performance boost of adding support for the other formats.

@disolaterX
Author

#2784 (comment)

What are you looking to accomplish in supporting the additional output formats?
The goal is to support developers who use Pipecat to connect to the PSTN telephony layer (e.g., Asterisk or other ePBXs) at scale and don't want to pay the cost of transcoding.

I suggest we default to automatically selecting the codec. If output_format is explicitly specified, it takes over and we don't run output_format_from_sample_rate().

What do you say?

@disolaterX disolaterX changed the title from "feat(elevenlabs): add new output formats and deprecate sample_rate-based format selection" to "feat(elevenlabs): add new output formats and support output_format" on Oct 10, 2025

@markbackman
Contributor

markbackman commented Oct 10, 2025

What I'm saying is that I don't see the utility of using different encodings. What use case do you have in mind, and what benefit does it provide?
