feat(openai): Add OpenAI Transcriptions support with comprehensive testing #8361

Open · wants to merge 5 commits into base: main

Conversation

christian-bromann (Contributor)

This PR adds complete OpenAI Transcriptions functionality to LangChain.js, with support for OpenAI's Whisper and GPT transcription models and advanced audio processing capabilities.

🎤 Key Features

Core transcription class (OpenAITranscriptions):

  • Support for Whisper (whisper-1) and GPT transcription models (gpt-4o-mini-transcribe, gpt-4o-transcribe)
  • Multiple audio format detection (MP3, WAV, FLAC, OGG, AAC, MP4, WEBM)
  • Automatic filename inference from audio signatures
  • ID3 tag stripping for MP3 files to ensure proper processing
  • Model-specific response format constraints with TypeScript enforcement
  • Request-level option overrides for fine-grained control
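As an illustration of the ID3-stripping step above, here is a minimal sketch (not the PR's actual implementation) that skips an ID3v2 tag by reading its synchsafe size field:

```typescript
// Minimal sketch, assuming the input is a raw MP3 byte array.
// An ID3v2 tag starts with the ASCII bytes "ID3"; bytes 6-9 encode the
// tag body size as a 28-bit "synchsafe" integer (7 data bits per byte).
function stripId3v2(bytes: Uint8Array): Uint8Array {
  const hasTag =
    bytes.length >= 10 &&
    bytes[0] === 0x49 && bytes[1] === 0x44 && bytes[2] === 0x33; // "ID3"
  if (!hasTag) return bytes;
  const tagSize =
    ((bytes[6] & 0x7f) << 21) |
    ((bytes[7] & 0x7f) << 14) |
    ((bytes[8] & 0x7f) << 7) |
    (bytes[9] & 0x7f);
  // Skip the 10-byte header plus the tag body.
  return bytes.subarray(10 + tagSize);
}
```

Stripping the tag up front matters because leading metadata can confuse decoders that expect the stream to begin at the first MP3 frame sync.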

Audio format detection:

  • Automatic MIME type detection from byte signatures
  • Support for Buffer, File, Uint8Array, and Blob inputs
  • Robust error handling for unknown formats
  • Smart filename inference when not provided
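Byte-signature detection can be sketched as follows (signatures shown are standard magic numbers, but the function name and error behavior are illustrative assumptions, not the PR's actual code):

```typescript
// Minimal sketch: map well-known magic bytes to MIME types.
function detectAudioMime(bytes: Uint8Array): string {
  const at = (offset: number, sig: number[]) =>
    sig.every((b, i) => bytes[offset + i] === b);
  if (at(0, [0x49, 0x44, 0x33])) return "audio/mpeg"; // "ID3" (tagged MP3)
  if (bytes.length >= 2 && bytes[0] === 0xff && (bytes[1] & 0xe0) === 0xe0)
    return "audio/mpeg"; // bare MP3 frame sync
  if (at(0, [0x52, 0x49, 0x46, 0x46])) return "audio/wav"; // "RIFF"
  if (at(0, [0x66, 0x4c, 0x61, 0x43])) return "audio/flac"; // "fLaC"
  if (at(0, [0x4f, 0x67, 0x67, 0x53])) return "audio/ogg"; // "OggS"
  if (at(4, [0x66, 0x74, 0x79, 0x70])) return "audio/mp4"; // "ftyp" box (MP4/M4A)
  throw new Error("Unrecognized audio format");
}
```

Throwing on unknown input is one reasonable choice for the "robust error handling" bullet; the detected MIME type can also drive the filename inference (e.g. `audio.mp3` for `audio/mpeg`).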

Type safety:

  • Smart TypeScript response types based on model and format selection
  • Model-specific configuration constraints (e.g., GPT models only support "text" and "json" formats)
  • Comprehensive type definitions for all supported formats
  • Critical type tests using Vitest's assertType to ensure compile-time type safety
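The "smart response types" idea can be sketched with a conditional type keyed on the response format (all names here are illustrative, not the package's actual exports):

```typescript
// Illustrative sketch: the return type narrows based on response_format.
type VerboseJson = { text: string; words: unknown[]; segments: unknown[] };
type ResponseFor<F extends "json" | "text" | "verbose_json"> =
  F extends "verbose_json" ? VerboseJson
  : F extends "json" ? { text: string }
  : string;

// Dummy implementation so the typing can be exercised; a real client
// would call the OpenAI API here. The double cast is needed because TS
// cannot resolve the conditional type inside the generic function body.
function fakeTranscribe<F extends "json" | "text" | "verbose_json">(
  format: F
): ResponseFor<F> {
  const byFormat = {
    verbose_json: { text: "", words: [], segments: [] },
    json: { text: "" },
    text: "",
  };
  return byFormat[format] as unknown as ResponseFor<F>;
}

// TypeScript knows `.words` and `.segments` exist only for "verbose_json".
const verbose = fakeTranscribe("verbose_json");
const plainText = fakeTranscribe("text");
```

Vitest's `assertType` can then verify at compile time that, say, `fakeTranscribe("text")` is a `string` and not an object.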

Comprehensive test suite:

  • Integration tests for format detection and error handling
  • Input validation and options testing
  • Model-specific behavior verification
  • Type-level tests to validate TypeScript constraints

Package integration:

  • Export transcription functionality from main package
  • Updated build configuration for new module structure
  • Complete documentation with practical usage examples

🔧 Testing Infrastructure Changes

Migrated from Jest to Vitest to properly mock the OpenAI library and support advanced type testing:

  • Enhanced mocking capabilities: Vitest provides superior mocking for the OpenAI SDK, allowing proper isolation of external dependencies during testing
  • Type testing support: Added .test-d.ts files using Vitest's assertType utility for compile-time type checking
  • Critical for this feature: the transcription functionality relies heavily on TypeScript's type system to enforce model-specific constraints (e.g., response format limitations), making type tests essential for ensuring correctness
  • Better async handling: Vitest's modern async/await support provides more reliable testing for the audio processing pipeline

📚 Implementation Details

The implementation follows LangChain patterns with proper serialization, secret management, and async calling support. Includes detailed JSDoc documentation with practical usage examples.

Example Usage:

import { OpenAITranscriptions } from "@langchain/openai";

// Basic transcription
const transcriber = new OpenAITranscriptions({
  model: "whisper-1",
  response_format: "verbose_json"
});

const result = await transcriber.transcribe({
  audio: audioBuffer,
  options: {
    language: "en",
    temperature: 0.2,
    timestamp_granularities: ["word", "segment"]
  }
});

// TypeScript knows the exact response type
console.log(result.text, result.words, result.segments);

The type system ensures that incompatible combinations (like using verbose_json with GPT models) are caught at compile time, preventing runtime errors and improving developer experience.
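One way to get that compile-time behavior, sketched here with illustrative names (the PR's real types may be structured differently), is a discriminated union of valid model/format pairs:

```typescript
// Illustrative sketch: invalid model/format pairs fail to type-check.
type WhisperFormat = "json" | "text" | "srt" | "verbose_json" | "vtt";
type GptFormat = "json" | "text";

type TranscriptionConfig =
  | { model: "whisper-1"; response_format?: WhisperFormat }
  | {
      model: "gpt-4o-transcribe" | "gpt-4o-mini-transcribe";
      response_format?: GptFormat;
    };

const ok: TranscriptionConfig = {
  model: "whisper-1",
  response_format: "verbose_json",
};

// @ts-expect-error -- verbose_json is not valid for the GPT transcription models
const bad: TranscriptionConfig = {
  model: "gpt-4o-transcribe",
  response_format: "verbose_json",
};
```

Whether the PR encodes the constraint this way or via lookup types is an implementation detail; the observable effect is the compile-time error on the invalid combination.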

…sting

Add complete OpenAI Transcriptions functionality including:

- **Core transcription class** (`OpenAITranscriptions`):
  - Support for Whisper and GPT transcription models
  - Multiple audio format detection (MP3, WAV, FLAC, OGG, AAC, MP4)
  - Automatic filename inference from audio signatures
  - ID3 tag stripping for MP3 files
  - Model-specific response format constraints
  - Request-level option overrides

- **Audio format detection**:
  - Automatic MIME type detection from byte signatures
  - Support for Buffer, File, Uint8Array, and Blob inputs
  - Robust error handling for unknown formats

- **Type safety**:
  - Smart TypeScript response types based on model and format
  - Model-specific configuration constraints
  - Comprehensive type definitions for all supported formats

- **Comprehensive test suite**:
  - Integration tests for format detection and error handling
  - Input validation and options testing
  - Model-specific behavior verification

- **Package integration**:
  - Export transcription functionality from main package
  - Updated build configuration for new module structure

The implementation follows LangChain patterns with proper serialization,
secret management, and async calling support. Includes detailed JSDoc
documentation with practical usage examples.

vercel bot commented Jun 13, 2025

The latest updates on your projects:

| Name | Status | Updated (UTC) |
| --- | --- | --- |
| langchainjs-docs | ❌ Failed | Jun 13, 2025 6:55pm |
| langchainjs-api-refs | ⬜️ Ignored (deployment skipped) | Jun 13, 2025 6:55pm |

dosubot added labels on Jun 13, 2025: auto:enhancement (a large net-new component, integration, or chain; use sparingly) and size:XL (this PR changes 500-999 lines, ignoring generated files), replacing the initial size:L label.
@hntrl (Contributor) left a comment

Thanks for this @christian-bromann!

Supporting transcription in this way is an evergreen question, I'm thinking, and it might need a more involved evaluation to see if it's worth it to us. Part of the value sell with LangChain is optionality between models, so where we don't have a common abstraction for speech-to-text (like we do with chat models) we'd need to see if this is something we want to support long term. It would be one thing if we could surface traces from this class, but we typically do that in the lower abstraction layers (like where ChatOpenAI extends a base class). The "langchain specific" parameters that you have here (lc_secrets, lc_aliases) are specifically for tracing, but those values get utilized further up the inheritance chain, so here they are kind of acting as filler.

I wonder if an alternate form factor for this would be to surface Whisper as structured tools instead of its own model class, but even then it may make more sense to offload that to a community package rather than bringing it under the umbrella of this repo (up for debate, I imagine).

As for vitest -- this is something we want to use in langchain (we already use it in langgraph!)
