feat(openai): Add OpenAI Transcriptions support with comprehensive testing #8361
base: main
…sting

Add complete OpenAI Transcriptions functionality including:

- **Core transcription class** (`OpenAITranscriptions`):
  - Support for Whisper and GPT transcription models
  - Multiple audio format detection (MP3, WAV, FLAC, OGG, AAC, MP4)
  - Automatic filename inference from audio signatures
  - ID3 tag stripping for MP3 files
  - Model-specific response format constraints
  - Request-level option overrides
- **Audio format detection**:
  - Automatic MIME type detection from byte signatures
  - Support for Buffer, File, Uint8Array, and Blob inputs
  - Robust error handling for unknown formats
- **Type safety**:
  - Smart TypeScript response types based on model and format
  - Model-specific configuration constraints
  - Comprehensive type definitions for all supported formats
- **Comprehensive test suite**:
  - Integration tests for format detection and error handling
  - Input validation and options testing
  - Model-specific behavior verification
- **Package integration**:
  - Export transcription functionality from main package
  - Updated build configuration for new module structure

The implementation follows LangChain patterns with proper serialization, secret management, and async calling support. Includes detailed JSDoc documentation with practical usage examples.
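For readers unfamiliar with signature-based detection, the idea behind "MIME type detection from byte signatures" and "ID3 tag stripping" can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code from this PR: the names `AudioFormat`, `hasAscii`, `id3v2Length`, and `detectAudioFormat` are hypothetical, and only the most common container signatures are checked.

```typescript
// Hypothetical helper names; this is an illustrative sketch, not the PR's implementation.
type AudioFormat = "mp3" | "wav" | "flac" | "ogg" | "aac" | "mp4";

/** Returns true when `bytes` contains the ASCII string `text` at `offset`. */
function hasAscii(bytes: Uint8Array, offset: number, text: string): boolean {
  if (bytes.length < offset + text.length) return false;
  for (let i = 0; i < text.length; i++) {
    if (bytes[offset + i] !== text.charCodeAt(i)) return false;
  }
  return true;
}

/** Length in bytes of a leading ID3v2 tag, or 0 when the buffer has none. */
function id3v2Length(bytes: Uint8Array): number {
  if (bytes.length < 10 || !hasAscii(bytes, 0, "ID3")) return 0;
  // The tag size is a "synchsafe" integer: four bytes, 7 significant bits each.
  const size =
    ((bytes[6] & 0x7f) << 21) |
    ((bytes[7] & 0x7f) << 14) |
    ((bytes[8] & 0x7f) << 7) |
    (bytes[9] & 0x7f);
  // Bit 0x10 of the flags byte signals an optional 10-byte footer.
  const hasFooter = (bytes[5] & 0x10) !== 0;
  return 10 + size + (hasFooter ? 10 : 0);
}

/** Guesses the container format from the first bytes; throws when unknown. */
function detectAudioFormat(bytes: Uint8Array): AudioFormat {
  if (hasAscii(bytes, 0, "ID3")) return "mp3"; // MP3 with a leading ID3v2 tag
  if (hasAscii(bytes, 0, "RIFF") && hasAscii(bytes, 8, "WAVE")) return "wav";
  if (hasAscii(bytes, 0, "fLaC")) return "flac";
  if (hasAscii(bytes, 0, "OggS")) return "ogg";
  if (hasAscii(bytes, 4, "ftyp")) return "mp4"; // ISO base media file
  if (bytes.length >= 2 && bytes[0] === 0xff) {
    // ADTS AAC: 12 sync bits followed by layer bits 00.
    if ((bytes[1] & 0xf6) === 0xf0) return "aac";
    // MPEG audio frame without a tag: 11 sync bits.
    if ((bytes[1] & 0xe0) === 0xe0) return "mp3";
  }
  throw new Error("Unknown audio format");
}
```

Stripping the tag would then amount to `bytes.subarray(id3v2Length(bytes))` before handing the audio on; whether the PR does exactly this is not shown in the conversation.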
Thanks for this @christian-bromann!
Supporting transcription in this way is an evergreen commitment, I'm thinking, and it might need a more involved evaluation to see if it's worth it to us. Part of the value sell with langchain is optionality between models, so where we don't have a common abstraction for speech-to-text (like we do with chat models) we'd need to see if this is something we want to support long term. It would be one thing if we could surface traces from this class, but we typically do that in the lower abstraction layers (like where ChatOpenAI extends a base class). The "langchain specific" parameters that you have here (`lc_secrets`, `lc_aliases`) are specifically for tracing, but those values get utilized further in the inheritance chain, so here they are kind of acting as filler.
I wonder if an alternate form factor for this would be if we surfaced whisper as structured tools instead of its own model class, but even then it may make more sense to offshore that to a community package rather than bringing it under the umbrella of this repo (up for debate I imagine).
As for vitest -- this is something we want to use in langchain (we already use it in langgraph!)
This PR adds complete OpenAI Transcriptions functionality to LangChain.js, providing full support for OpenAI's Whisper and GPT transcription models with advanced audio processing capabilities.
🎤 Key Features

- **Core transcription class** (`OpenAITranscriptions`):
  - Support for Whisper (`whisper-1`) and GPT transcription models (`gpt-4o-mini-transcribe`, `gpt-4o-transcribe`)
- **Audio format detection**
- **Type safety**:
  - … `assertType` to ensure compile-time type safety
- **Comprehensive test suite**
- **Package integration**

🔧 Testing Infrastructure Changes

Migrated from Jest to Vitest to properly mock the OpenAI library and support advanced type testing:

- … `.test-d.ts` files using Vitest's `assertType` utility for compile-time type checking

📚 Implementation Details
The implementation follows LangChain patterns with proper serialization, secret management, and async calling support. Includes detailed JSDoc documentation with practical usage examples.
Example Usage:
The type system ensures that incompatible combinations (like using `verbose_json` with GPT models) are caught at compile time, preventing runtime errors and improving developer experience.
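As a rough illustration of how such a compile-time constraint can be expressed, the sketch below ties the allowed response formats to the model name with a conditional type. Everything here is an assumption made for illustration: the names `FormatFor`, `TranscribeOptions`, and `describeRequest` are hypothetical, and the exact set of formats each model accepts should be checked against OpenAI's documentation rather than taken from this table.

```typescript
// Hypothetical types; the PR's actual type definitions are not shown in this conversation.
type WhisperModel = "whisper-1";
type GptTranscribeModel = "gpt-4o-transcribe" | "gpt-4o-mini-transcribe";
type TranscriptionModel = WhisperModel | GptTranscribeModel;

// Assumed format table: Whisper accepts the richer formats, GPT models a subset.
type FormatFor<M extends TranscriptionModel> = M extends WhisperModel
  ? "json" | "text" | "srt" | "verbose_json" | "vtt"
  : "json" | "text";

interface TranscribeOptions<M extends TranscriptionModel> {
  model: M;
  responseFormat: FormatFor<M>;
}

/** Accepts only model/format pairs that the type table above allows. */
function describeRequest<M extends TranscriptionModel>(
  opts: TranscribeOptions<M>
): string {
  return `${opts.model}:${opts.responseFormat}`;
}

// Compiles: Whisper may request verbose_json.
const whisperRequest = describeRequest({
  model: "whisper-1",
  responseFormat: "verbose_json",
});

// Compiles: GPT transcription models may request json.
const gptRequest = describeRequest({
  model: "gpt-4o-transcribe",
  responseFormat: "json",
});

// Intended to be rejected by the compiler, because "verbose_json" is not in
// FormatFor<"gpt-4o-transcribe">:
// describeRequest({ model: "gpt-4o-transcribe", responseFormat: "verbose_json" });
```

The design choice is that `M` is inferred from the `model` property alone, so the permitted `responseFormat` values narrow automatically without the caller writing any type arguments.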