@Angad-2002

Implement detailed usage metrics for Speech-to-Text (STT) services across all providers. This enables monitoring of performance, quality, and cost metrics for transcription operations.

Changes:

  • Add STTUsage and STTUsageMetricsData classes to core metrics module
  • Implement metrics collection in base STTService class
  • Add metrics support to all STT service providers:
    • AssemblyAI, AWS, Azure, Cartesia
    • Deepgram (standard and Flux)
    • ElevenLabs, Fal, Gladia, Google
    • Riva, Soniox, Speechmatics, Whisper
  • Extend FrameProcessor with metrics reporting capabilities
  • Update RTVI framework to support STT metrics
  • Add frame processor metrics module for centralized metric handling
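
As a rough illustration of what the two new classes might hold, here is a standalone sketch using plain dataclasses. The field names, the `usage_from_transcript` helper, and the overall shape are assumptions inferred from the metrics listed in this description, not the actual definitions in this PR, and the sketch deliberately does not import pipecat.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class STTUsage:
    """Hypothetical usage record; field names are assumed, not taken from the PR."""

    word_count: int = 0
    character_count: int = 0
    audio_duration_seconds: float = 0.0
    processing_time_seconds: float = 0.0
    average_confidence: Optional[float] = None
    estimated_cost: Optional[float] = None


@dataclass
class STTUsageMetricsData:
    """Hypothetical wrapper tying a usage record to the processor that produced it."""

    processor: str
    value: STTUsage
    model: Optional[str] = None


def usage_from_transcript(
    text: str, audio_seconds: float, processing_seconds: float
) -> STTUsage:
    """Build a usage record from a final transcript (illustrative helper)."""
    # Words are counted by whitespace splitting; this is a simplification that
    # will undercount for languages written without spaces.
    return STTUsage(
        word_count=len(text.split()),
        character_count=len(text),
        audio_duration_seconds=audio_seconds,
        processing_time_seconds=processing_seconds,
    )
```

Keeping the usage record separate from the wrapper means the same record could be attached to different processors or serialized with `asdict` for transport, though whether the PR does this is not stated here.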

Metrics tracked:

  • Content: word count, character count
  • Performance: processing time, real-time factor (RTF), words per second, time to first transcript (TTFT), time to final transcript
  • Quality: average confidence, word error rate (WER), proper noun accuracy
  • Audio metadata: sample rate, channels, encoding
  • Cost: cost per word, estimated total cost
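
For readers unfamiliar with these metrics, the sketch below shows how a few of them are conventionally computed. It is a minimal, self-contained illustration, not the code in this PR: the function names are hypothetical, and the cost model (a flat price per word) is an assumption based on the "cost per word" item above.

```python
from typing import Optional


def real_time_factor(processing_seconds: float, audio_seconds: float) -> Optional[float]:
    """RTF = processing time / audio duration; below 1.0 means faster than real time."""
    if audio_seconds <= 0:
        return None  # undefined without audio
    return processing_seconds / audio_seconds


def words_per_second(word_count: int, processing_seconds: float) -> Optional[float]:
    """Transcribed words divided by the time spent producing them."""
    if processing_seconds <= 0:
        return None
    return word_count / processing_seconds


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("WER is undefined for an empty reference")
    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(
                min(
                    prev[j] + 1,  # deletion
                    cur[j - 1] + 1,  # insertion
                    prev[j - 1] + (r != h),  # substitution or match
                )
            )
        prev = cur
    return prev[-1] / len(ref)


def estimated_cost(word_count: int, cost_per_word: float) -> float:
    """Assumed flat per-word pricing; real provider pricing is often per audio minute."""
    return word_count * cost_per_word
```

Note that WER needs a ground-truth reference transcript, so it can only be computed where one is available (for example in offline evaluation), not for arbitrary live audio.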

This implementation provides comprehensive observability for STT operations, enabling performance optimization and cost tracking across all supported providers.

@Angad-2002
Author

This PR is in response to issue #1933

@Angad-2002
Author

@markbackman could you take a look at this PR?

@ophir11235813

@markbackman Hi, do you have a sense of when this will be released? Many thanks!
