feat: Add comprehensive STT usage metrics tracking #2843

Angad-2002 · 2025-10-13T16:22:47Z

Implement detailed usage metrics for Speech-to-Text (STT) services across all providers. This enables monitoring of performance, quality, and cost metrics for transcription operations.

Changes:

- Add STTUsage and STTUsageMetricsData classes to core metrics module
- Implement metrics collection in base STTService class
- Add metrics support to all STT service providers:
  - AssemblyAI, AWS, Azure, Cartesia
  - Deepgram (standard and Flux)
  - ElevenLabs, Fal, Gladia, Google
  - Riva, Soniox, Speechmatics, Whisper
- Extend FrameProcessor with metrics reporting capabilities
- Update RTVI framework to support STT metrics
- Add frame processor metrics module for centralized metric handling

Metrics tracked:

- Content: word count, character count
- Performance: processing time, real-time factor (RTF), words per second, time to first transcript (TTFT), time to final transcript
- Quality: average confidence, word error rate (WER), proper noun accuracy
- Audio metadata: sample rate, channels, encoding
- Cost: cost per word, estimated total cost
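
For reviewers less familiar with the derived metrics above, here is a minimal sketch of how RTF, words per second, WER, and estimated cost are conventionally defined. It is illustrative only and not taken from the diff: the function names are hypothetical, word counting is a naive whitespace split, and WER assumes a reference transcript is available, which this description does not say how the PR obtains.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration; below 1.0 means faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def words_per_second(transcript: str, processing_seconds: float) -> float:
    """Words produced per second of processing (naive whitespace tokenization)."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return len(transcript.split()) / processing_seconds


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Rolling-row Levenshtein distance over words
    # (substitutions, insertions, and deletions each cost 1).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1] / len(ref)


def estimated_total_cost(word_count: int, cost_per_word: float) -> float:
    """Estimated cost under a flat per-word price (an assumed pricing model)."""
    return word_count * cost_per_word
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions, so consumers of the metric should not assume it is bounded like a percentage.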

This implementation provides comprehensive observability for STT operations, enabling performance optimization and cost tracking across all supported providers.


Angad-2002 · 2025-10-14T06:11:14Z

This PR is in response to issue #1933

Angad-2002 · 2025-10-15T05:17:10Z

@markbackman could you take a look at this PR?

ophir11235813 · 2025-10-28T01:48:24Z

@markbackman Hi, do you have a sense of when this will be released? Many thanks!

Merge branch 'pipecat-ai:main' into feature/stt-usage-metrics

86e2dc6
