feat(audio): replace audio-server with Vercel Blobs and OpenAI Whisper #368
Conversation
…r
- Implement audio chunking for files over 20 minutes
- Add Vercel Blob storage
- Update plugin transcription endpoint
- Remove audio-server dependency
- Add parallel processing
- Update to Clerk auth
- Add progress indicators
@onyedikachi-david is attempting to deploy a commit to the Prologe Team on Vercel. A member of the Team first needs to authorize it.
PR Summary
Implemented a new audio transcription system using Vercel Blobs for storage and OpenAI Whisper for processing, replacing the dedicated audio-server package with a more streamlined solution.
- Added splitAudioIntoChunks function in /packages/web/app/api/(new-ai)/transcribe/route.ts to handle files over 24 minutes using ffmpeg (a rough sketch of the overall flow follows after this list)
- Implemented parallel processing with rate limiting (1s delay between chunks) to prevent OpenAI API throttling
- Added Vercel Blob storage with a 1-hour cache for audio files using the put function
- Added streaming response handling to return transcription results in real-time
- Updated authentication from Unkey to Clerk with proper session validation
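For reviewers who want the shape of that flow without opening the route file, here is a minimal sketch of the chunk-and-transcribe approach described above. It is not the PR's actual code: the helper names (cutChunk, transcribeChunks, storeAudio), the 20-minute chunk length, and the audio/ path prefix are assumptions for illustration; put from @vercel/blob, openai.audio.transcriptions.create, and the ffmpeg flags are the real APIs and tools the PR relies on.
transcribe-sketch.ts
// transcribe-sketch.ts — illustrative sketch only; helper names are hypothetical
import { spawn } from 'node:child_process';
import { createReadStream } from 'node:fs';
import { put } from '@vercel/blob';
import OpenAI from 'openai';
const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const CHUNK_SECONDS = 20 * 60; // 20-minute chunks (assumed, matching the test page below)
// Cut chunk `index` out of the source file with ffmpeg (stream copy, no re-encode).
function cutChunk(src: string, index: number, out: string): Promise<void> {
  return new Promise((resolve, reject) => {
    const proc = spawn('ffmpeg', [
      '-ss', String(index * CHUNK_SECONDS),
      '-i', src,
      '-t', String(CHUNK_SECONDS),
      '-c', 'copy',
      '-y', out,
    ]);
    proc.on('error', reject);
    proc.on('close', (code) =>
      code === 0 ? resolve() : reject(new Error(`ffmpeg exited with code ${code}`)),
    );
  });
}
// Transcribe all chunks in parallel, staggering starts by 1s to avoid API throttling.
async function transcribeChunks(chunkPaths: string[]): Promise<string> {
  const jobs = chunkPaths.map(async (path, i) => {
    await new Promise((r) => setTimeout(r, i * 1000)); // 1s delay between chunk starts
    const result = await openai.audio.transcriptions.create({
      file: createReadStream(path),
      model: 'whisper-1',
    });
    return result.text;
  });
  return (await Promise.all(jobs)).join(' ');
}
// Store the uploaded audio in Vercel Blob with a 1-hour cache.
async function storeAudio(name: string, data: Blob): Promise<string> {
  const blob = await put(`audio/${name}`, data, {
    access: 'public',
    cacheControlMaxAge: 3600, // 1 hour
  });
  return blob.url;
}
export { cutChunk, transcribeChunks, storeAudio };
The stagger-then-Promise.all pattern is what "parallel processing with rate limiting" amounts to here: requests overlap, but no two chunks start within the same second.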
3 file(s) reviewed, 3 comment(s)
…improve audio chunk handling Signed-off-by: David Anyatonwu <[email protected]>
export const maxDuration = 7200; // 120 minutes for long transcriptions
Doesn't work. Please test on Vercel and show proof that it works with video. I keep getting "Request Entity Too Large". This flow only works if users do direct uploads to Vercel Blobs.
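For context on the "direct uploads" point: with @vercel/blob, the browser can push the file straight to Blob storage via upload from @vercel/blob/client, so the audio bytes never pass through the serverless function body, which is what triggers Request Entity Too Large. A minimal sketch, assuming a hypothetical token-issuing route at /api/blob/upload (not this PR's actual route):
direct-upload-sketch.ts
// direct-upload-sketch.ts — illustrative only; the route path below is made up.
// Meant to run in the browser (e.g. from a client component).
import { upload } from '@vercel/blob/client';
// Send the selected file straight from the browser to Vercel Blob. The server
// route only issues a client token (via handleUpload from '@vercel/blob/client'
// on the server side); the file itself never hits the serverless function.
export async function uploadDirectly(file: File): Promise<string> {
  const blob = await upload(file.name, file, {
    access: 'public',
    handleUploadUrl: '/api/blob/upload', // hypothetical token-issuing route
  });
  return blob.url; // hand this URL to the transcription endpoint
}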
Okay
…d URL and implementing direct uploads with FormData Signed-off-by: David Anyatonwu <[email protected]>
note-companion.mp4
Hello, @benjaminshafii Here is a demo. Demoing on Obsidian was problematic (do I need a license key for that?), as I explained in the demo. I had to create a page to test it locally, and it works; I assume it will also work in the built plugin. Below is the page used to test. I also couldn't test with 120 minutes of audio (I tested with 88+ minutes) due to the OpenAI API cost 🥲🥲
page.tsx
'use client';
import { useState, useRef } from 'react';
const MAX_FILE_SIZE_MB = 50;
const UPLOAD_TIMEOUT_MS = 5 * 60 * 1000; // 5 minutes
const CHUNK_SIZE = 20 * 60; // 20 minutes in seconds (matching server config)
interface ChunkStatus {
index: number;
status: 'pending' | 'processing' | 'completed' | 'error';
text?: string;
error?: string;
}
export default function TestTranscription() {
const [file, setFile] = useState<File | null>(null);
const [transcription, setTranscription] = useState<string>('');
const [isTranscribing, setIsTranscribing] = useState<boolean>(false);
const [error, setError] = useState<string | null>(null);
const [uploadedUrl, setUploadedUrl] = useState<string | null>(null);
const [audioDuration, setAudioDuration] = useState<number | null>(null);
const [uploadProgress, setUploadProgress] = useState<number>(0);
const [currentStep, setCurrentStep] = useState<string>('');
const [chunks, setChunks] = useState<ChunkStatus[]>([]);
const fileInputRef = useRef<HTMLInputElement>(null);
const abortControllerRef = useRef<AbortController | null>(null);
const getAudioDuration = (file: File): Promise<number> => {
return new Promise((resolve, reject) => {
const audio = new Audio();
const reader = new FileReader();
reader.onload = (e) => {
if (e.target?.result) {
audio.src = e.target.result as string;
audio.onloadedmetadata = () => {
resolve(audio.duration);
};
audio.onerror = () => reject(new Error('Failed to load audio file'));
}
};
reader.onerror = () => reject(new Error('Failed to read file'));
reader.readAsDataURL(file);
});
};
const handleFileChange = async (e: React.ChangeEvent<HTMLInputElement>) => {
if (e.target.files && e.target.files.length > 0) {
const selectedFile = e.target.files[0];
const fileSizeMB = selectedFile.size / (1024 * 1024);
if (fileSizeMB > MAX_FILE_SIZE_MB) {
setError(`File size (${fileSizeMB.toFixed(2)}MB) exceeds the maximum limit of ${MAX_FILE_SIZE_MB}MB`);
return;
}
try {
const duration = await getAudioDuration(selectedFile);
setAudioDuration(duration);
// Calculate expected chunks
const numChunks = Math.ceil(duration / CHUNK_SIZE);
const initialChunks: ChunkStatus[] = Array.from({ length: numChunks }, (_, i) => ({
index: i,
status: 'pending',
}));
setChunks(initialChunks);
setFile(selectedFile);
setError(null);
setCurrentStep('File selected and validated');
} catch (err) {
setError('Failed to read audio file duration. Please ensure it\'s a valid audio file.');
setFile(null);
setAudioDuration(null);
setChunks([]);
}
}
};
const handleTranscribe = async () => {
if (!file) {
setError('Please select an audio file');
return;
}
setIsTranscribing(true);
setError(null);
setTranscription('');
setUploadProgress(0);
// Create new AbortController for this operation
abortControllerRef.current = new AbortController();
const { signal } = abortControllerRef.current;
try {
// Step 1: Upload the file
setCurrentStep('Uploading file...');
const formData = new FormData();
formData.append('file', file);
const uploadTimeout = setTimeout(() => {
if (abortControllerRef.current) {
abortControllerRef.current.abort();
}
}, UPLOAD_TIMEOUT_MS);
const uploadResponse = await fetch('/api/transcribe/upload', {
method: 'POST',
body: formData,
signal,
});
clearTimeout(uploadTimeout);
if (!uploadResponse.ok) {
const errorData = await uploadResponse.json();
throw new Error(`Failed to upload file: ${errorData.error || uploadResponse.statusText}`);
}
const { url } = await uploadResponse.json();
setUploadedUrl(url);
const extension = file.name.split('.').pop()?.toLowerCase() || '';
// Update progress information
setCurrentStep('File uploaded successfully. Starting transcription...');
setTranscription('Processing audio file...\n');
setTranscription(prev => prev + `File format: ${extension}\n`);
setTranscription(prev => prev + `File size: ${(file.size / (1024 * 1024)).toFixed(2)} MB\n`);
if (audioDuration) {
setTranscription(prev => prev + `Duration: ${Math.round(audioDuration)} seconds (${(audioDuration / 60).toFixed(2)} minutes)\n`);
setTranscription(prev => prev + `Number of chunks: ${chunks.length}\n`);
}
setTranscription(prev => prev + '\nTranscribing...\n\n');
// Step 2: Transcribe
setCurrentStep('Transcribing audio...');
const transcribeResponse = await fetch('/api/transcribe', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify({
blobUrl: url,
extension,
}),
signal,
});
if (!transcribeResponse.ok) {
const errorText = await transcribeResponse.text();
try {
const errorJson = JSON.parse(errorText);
throw new Error(`Failed to transcribe audio: ${errorJson.error || transcribeResponse.statusText}`);
} catch (e) {
throw new Error(`Failed to transcribe audio: ${errorText || transcribeResponse.statusText}`);
}
}
// Step 3: Stream response
setCurrentStep('Receiving transcription...');
const reader = transcribeResponse.body?.getReader();
if (!reader) {
throw new Error('No response body');
}
let result = '';
let reading = true;
// Initialize first chunk as processing
if (chunks.length > 0) {
setChunks(prev => prev.map((chunk, i) => ({
...chunk,
status: i === 0 ? 'processing' : 'pending'
})));
}
try {
while (reading) {
const { done, value } = await reader.read();
if (done) {
// When done, mark the chunk as completed with the full text
if (chunks.length > 0) {
setChunks(prev => prev.map((chunk, index) => ({
...chunk,
status: 'completed',
text: result.trim()
})));
}
reading = false;
continue;
}
const chunkText = new TextDecoder().decode(value);
result += chunkText;
// Update transcription immediately
setTranscription(prev => {
const lines = prev.split('\n');
return [...lines.slice(0, 6), result].join('\n');
});
}
} catch (streamError) {
console.error('Error processing stream:', streamError);
// Mark all chunks as error
setChunks(prev => prev.map(chunk => ({
...chunk,
status: 'error',
error: streamError instanceof Error ? streamError.message : 'Stream processing failed'
})));
throw streamError;
}
setCurrentStep('Transcription completed');
} catch (err) {
if (err instanceof Error && err.name === 'AbortError') {
setError('Operation timed out. Please try with a smaller file or check your connection.');
} else {
console.error('Error in transcription:', err);
setError(err instanceof Error ? err.message : 'An unknown error occurred');
}
// Mark remaining chunks as error
setChunks(prev => prev.map(chunk =>
chunk.status === 'pending' || chunk.status === 'processing'
? { ...chunk, status: 'error', error: 'Operation failed or timed out' }
: chunk
));
} finally {
setIsTranscribing(false);
abortControllerRef.current = null;
}
};
const cancelOperation = () => {
if (abortControllerRef.current) {
abortControllerRef.current.abort();
}
};
return (
<div className="container mx-auto p-8 max-w-4xl">
<h1 className="text-3xl font-bold mb-6">Test Audio Transcription</h1>
<div className="bg-white p-6 rounded-lg shadow-md mb-8">
<div className="mb-6">
<label className="block text-gray-700 mb-2 font-semibold">Upload Audio File</label>
<div className="text-sm text-gray-600 mb-2">
Maximum file size: {MAX_FILE_SIZE_MB}MB
</div>
<input
type="file"
ref={fileInputRef}
onChange={handleFileChange}
accept="audio/*"
className="block w-full text-gray-700 border border-gray-300 rounded py-2 px-3"
disabled={isTranscribing}
/>
{file && (
<div className="mt-2 space-y-1 text-sm text-gray-600">
<p>Selected file: {file.name}</p>
<p>Size: {(file.size / (1024 * 1024)).toFixed(2)} MB</p>
{audioDuration && (
<p>Duration: {Math.round(audioDuration)} seconds ({(audioDuration / 60).toFixed(2)} minutes)</p>
)}
</div>
)}
{uploadedUrl && (
<div className="mt-2 p-2 bg-gray-50 rounded text-sm">
<p className="font-semibold text-gray-700">Uploaded File URL:</p>
<a href={uploadedUrl} target="_blank" rel="noopener noreferrer"
className="text-blue-600 break-all hover:underline">
{uploadedUrl}
</a>
</div>
)}
</div>
<div className="flex gap-4">
<button
onClick={handleTranscribe}
disabled={!file || isTranscribing}
className={`px-4 py-2 rounded font-semibold ${
!file || isTranscribing
? 'bg-gray-300 text-gray-500 cursor-not-allowed'
: 'bg-blue-600 text-white hover:bg-blue-700'
}`}
>
{isTranscribing ? 'Transcribing...' : 'Transcribe Audio'}
</button>
{isTranscribing && (
<button
onClick={cancelOperation}
className="px-4 py-2 rounded font-semibold bg-red-600 text-white hover:bg-red-700"
>
Cancel
</button>
)}
</div>
</div>
{chunks.length > 0 && (
<div className="bg-white p-6 rounded-lg shadow-md mb-6">
<h2 className="text-xl font-bold mb-4">Chunks Status</h2>
<div className="grid grid-cols-1 md:grid-cols-2 gap-4">
{chunks.map((chunk) => (
<div
key={chunk.index}
className={`p-4 rounded-lg border ${
chunk.status === 'pending' ? 'bg-gray-50 border-gray-200' :
chunk.status === 'processing' ? 'bg-blue-50 border-blue-200' :
chunk.status === 'completed' ? 'bg-green-50 border-green-200' :
'bg-red-50 border-red-200'
}`}
>
<div className="flex justify-between items-center mb-2">
<span className="font-semibold">Chunk {chunk.index + 1}</span>
<span className={`px-2 py-1 rounded text-sm ${
chunk.status === 'pending' ? 'bg-gray-200 text-gray-700' :
chunk.status === 'processing' ? 'bg-blue-200 text-blue-700' :
chunk.status === 'completed' ? 'bg-green-200 text-green-700' :
'bg-red-200 text-red-700'
}`}>
{chunk.status.charAt(0).toUpperCase() + chunk.status.slice(1)}
</span>
</div>
{chunk.text && (
<div className="text-sm text-gray-600 mt-2">
<div className="font-semibold">Preview:</div>
<div className="italic">{chunk.text.slice(0, 100)}...</div>
</div>
)}
{chunk.error && (
<div className="text-sm text-red-600 mt-2">
Error: {chunk.error}
</div>
)}
</div>
))}
</div>
</div>
)}
{currentStep && (
<div className="bg-blue-50 border-l-4 border-blue-500 text-blue-700 p-4 mb-6">
<p className="font-bold">Current Status:</p>
<p>{currentStep}</p>
</div>
)}
{error && (
<div className="bg-red-100 border-l-4 border-red-500 text-red-700 p-4 mb-6">
<p className="font-bold">Error:</p>
<p className="whitespace-pre-wrap">{error}</p>
</div>
)}
{(transcription || isTranscribing) && (
<div className="bg-white p-6 rounded-lg shadow-md">
<h2 className="text-xl font-bold mb-3">Transcription {isTranscribing && '(Processing...)'}</h2>
<div className="bg-gray-50 p-4 rounded border border-gray-200 min-h-[200px]">
{transcription ? (
<p className="whitespace-pre-wrap">{transcription}</p>
) : (
<div className="flex justify-center items-center h-full">
<div className="animate-spin rounded-full h-8 w-8 border-t-2 border-b-2 border-blue-500"></div>
</div>
)}
</div>
</div>
)}
</div>
);
}
@aexshafii could you test and review this?
@benjaminshafii testing now
75404fe to 38cf0b1
@onyedikachi-david recorded this short video to show you the process: https://www.loom.com/share/7f4b7a4f1bd048399e8ca1b9887cda75. Can you share your email address so I can send you a license key for you to test easily?
Thanks for the review. I'll look into it shortly; here is my email: [email protected]
@aexshafii Hi, I sent an email yesterday; the license key says it's invalid.
/claim #365
Fixes: #365