
feat(audio): replace audio-server with Vercel Blobs and OpenAI Whisper #368


Open · wants to merge 6 commits into base: master

Conversation

onyedikachi-david (Contributor)

/claim #365
Fixes: #365

…r
  • Implement audio chunking for files over 20 minutes
  • Add Vercel Blob storage
  • Update plugin transcription endpoint
  • Remove audio-server dependency
  • Add parallel processing
  • Update to Clerk auth
  • Add progress indicators

vercel bot commented Mar 15, 2025

@onyedikachi-david is attempting to deploy a commit to the Prologe Team on Vercel.

A member of the Team first needs to authorize it.

@greptile-apps (bot) left a comment

PR Summary

Implemented a new audio transcription system using Vercel Blobs for storage and OpenAI Whisper for processing, replacing the dedicated audio-server package with a more streamlined solution.

  • Added splitAudioIntoChunks function in /packages/web/app/api/(new-ai)/transcribe/route.ts to handle files over 20 minutes using ffmpeg (a sketch of this flow follows the list)
  • Implemented parallel processing with rate limiting (a 1s delay between chunks) to prevent OpenAI API throttling
  • Added Vercel Blob storage with a 1-hour cache for audio files using the put function (sketched below, after the review summary)
  • Added streaming response handling to return transcription results in real time (see the route sketch further down)
  • Updated authentication from Unkey to Clerk with proper session validation
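
A minimal sketch of that chunk-and-throttle flow, assuming the PR's splitAudioIntoChunks helper (the path-in/paths-out signature here is an assumption) and the standard OpenAI Node SDK; the chunks are submitted sequentially with the 1s delay for clarity:

import fs from 'node:fs';
import OpenAI from 'openai';
// splitAudioIntoChunks: the PR's ffmpeg-based helper; this import path
// and signature (source path in, chunk paths out) are assumptions.
import { splitAudioIntoChunks } from './chunking';

const openai = new OpenAI();
const CHUNK_DELAY_MS = 1000; // 1s between requests to avoid OpenAI throttling

async function transcribeFile(path: string): Promise<string> {
  const chunkPaths = await splitAudioIntoChunks(path);
  const texts: string[] = [];
  for (const [i, chunkPath] of chunkPaths.entries()) {
    if (i > 0) await new Promise((resolve) => setTimeout(resolve, CHUNK_DELAY_MS));
    const { text } = await openai.audio.transcriptions.create({
      file: fs.createReadStream(chunkPath),
      model: 'whisper-1',
    });
    texts.push(text);
  }
  return texts.join(' ');
}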


3 file(s) reviewed, 3 comment(s)
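
The Blob storage step from the summary above might look like the following sketch; put and cacheControlMaxAge are the real @vercel/blob API, while the path prefix is illustrative:

import { put } from '@vercel/blob';

export async function uploadAudio(file: File): Promise<string> {
  // cacheControlMaxAge is in seconds; 3600 gives the 1-hour cache noted above
  const blob = await put(`audio/${file.name}`, file, {
    access: 'public',
    cacheControlMaxAge: 3600,
  });
  return blob.url; // handed to the transcription route as blobUrl
}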

…improve audio chunk handling

Signed-off-by: David Anyatonwu <[email protected]>

vercel bot commented Mar 15, 2025

The latest updates on your projects. Learn more about Vercel for Git.

file-organizer-2000: ✅ Ready (updated Mar 25, 2025 2:44pm UTC)

export const maxDuration = 7200; // 120 minutes for long transcriptions
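
Alongside that route config, the streaming handler and Clerk session check described in the review summary could be sketched as follows; splitAudioIntoChunks and transcribeChunk are stand-ins for the PR's helpers, and the awaited auth() assumes a recent @clerk/nextjs:

import { auth } from '@clerk/nextjs/server';
// stand-ins for the PR's helpers; this module path is hypothetical
import { splitAudioIntoChunks, transcribeChunk } from './helpers';

export async function POST(req: Request) {
  const { userId } = await auth();
  if (!userId) return new Response('Unauthorized', { status: 401 });

  const { blobUrl } = await req.json();
  const encoder = new TextEncoder();
  const stream = new ReadableStream({
    async start(controller) {
      // enqueue each chunk's transcript as soon as it is ready
      for (const chunk of await splitAudioIntoChunks(blobUrl)) {
        const text = await transcribeChunk(chunk);
        controller.enqueue(encoder.encode(text + '\n'));
      }
      controller.close();
    },
  });
  return new Response(stream, {
    headers: { 'Content-Type': 'text/plain; charset=utf-8' },
  });
}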
@benjaminshafii (Member) commented Mar 15, 2025

Doesn't work.

Please test on Vercel and show proof that it works with video.

I keep getting "Request Entity Too Large". This flow only works if users do direct uploads to Vercel Blob.
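
A minimal sketch of the direct-upload flow described here, using the real upload helper from @vercel/blob/client; the handleUploadUrl route is illustrative:

import { upload } from '@vercel/blob/client';

// Uploads straight from the browser to Vercel Blob, so the file never
// passes through the serverless function body (the source of the 413).
async function directUpload(file: File): Promise<string> {
  const blob = await upload(file.name, file, {
    access: 'public',
    handleUploadUrl: '/api/upload-token', // illustrative token-issuing route
  });
  return blob.url;
}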

@onyedikachi-david (Contributor, Author)

> Doesn't work.
>
> Please test on Vercel and show proof that it works with video.
>
> I keep getting "Request Entity Too Large". This flow only works if users do direct uploads to Vercel Blob.

Okay

@onyedikachi-david (Contributor, Author)

note-companion.mp4

Hello @benjaminshafii, here is a demo. Demoing in Obsidian was problematic (do I need a license key for that?), as I explained in the demo. I had to create a page to test it locally, and it works; I assume it will also work in the built plugin. Below is the page used to test. I also couldn't test with 120 minutes of audio due to the OpenAI cost (I tested with 88+ minutes) 🥲🥲

page.tsx

'use client';

import { useState, useRef } from 'react';

const MAX_FILE_SIZE_MB = 50;
const UPLOAD_TIMEOUT_MS = 5 * 60 * 1000; // 5 minutes
const CHUNK_SIZE = 20 * 60; // 20 minutes in seconds (matching server config)

interface ChunkStatus {
  index: number;
  status: 'pending' | 'processing' | 'completed' | 'error';
  text?: string;
  error?: string;
}

export default function TestTranscription() {
  const [file, setFile] = useState<File | null>(null);
  const [transcription, setTranscription] = useState<string>('');
  const [isTranscribing, setIsTranscribing] = useState<boolean>(false);
  const [error, setError] = useState<string | null>(null);
  const [uploadedUrl, setUploadedUrl] = useState<string | null>(null);
  const [audioDuration, setAudioDuration] = useState<number | null>(null);
  const [uploadProgress, setUploadProgress] = useState<number>(0);
  const [currentStep, setCurrentStep] = useState<string>('');
  const [chunks, setChunks] = useState<ChunkStatus[]>([]);
  const fileInputRef = useRef<HTMLInputElement>(null);
  const abortControllerRef = useRef<AbortController | null>(null);

  const getAudioDuration = (file: File): Promise<number> => {
    return new Promise((resolve, reject) => {
      const audio = new Audio();
      const reader = new FileReader();

      reader.onload = (e) => {
        if (e.target?.result) {
          audio.src = e.target.result as string;
          audio.onloadedmetadata = () => {
            resolve(audio.duration);
          };
          audio.onerror = () => reject(new Error('Failed to load audio file'));
        }
      };
      reader.onerror = () => reject(new Error('Failed to read file'));
      reader.readAsDataURL(file);
    });
  };

  const handleFileChange = async (e: React.ChangeEvent<HTMLInputElement>) => {
    if (e.target.files && e.target.files.length > 0) {
      const selectedFile = e.target.files[0];
      const fileSizeMB = selectedFile.size / (1024 * 1024);
      
      if (fileSizeMB > MAX_FILE_SIZE_MB) {
        setError(`File size (${fileSizeMB.toFixed(2)}MB) exceeds the maximum limit of ${MAX_FILE_SIZE_MB}MB`);
        return;
      }

      try {
        const duration = await getAudioDuration(selectedFile);
        setAudioDuration(duration);
        
        // Calculate expected chunks
        const numChunks = Math.ceil(duration / CHUNK_SIZE);
        const initialChunks: ChunkStatus[] = Array.from({ length: numChunks }, (_, i) => ({
          index: i,
          status: 'pending',
        }));
        setChunks(initialChunks);
        
        setFile(selectedFile);
        setError(null);
        setCurrentStep('File selected and validated');
      } catch (err) {
        setError('Failed to read audio file duration. Please ensure it\'s a valid audio file.');
        setFile(null);
        setAudioDuration(null);
        setChunks([]);
      }
    }
  };

  const handleTranscribe = async () => {
    if (!file) {
      setError('Please select an audio file');
      return;
    }

    setIsTranscribing(true);
    setError(null);
    setTranscription('');
    setUploadProgress(0);

    // Create new AbortController for this operation
    abortControllerRef.current = new AbortController();
    const { signal } = abortControllerRef.current;

    try {
      // Step 1: Upload the file
      setCurrentStep('Uploading file...');
      const formData = new FormData();
      formData.append('file', file);
      
      const uploadTimeout = setTimeout(() => {
        if (abortControllerRef.current) {
          abortControllerRef.current.abort();
        }
      }, UPLOAD_TIMEOUT_MS);

      const uploadResponse = await fetch('/api/transcribe/upload', {
        method: 'POST',
        body: formData,
        signal,
      });

      clearTimeout(uploadTimeout);

      if (!uploadResponse.ok) {
        const errorData = await uploadResponse.json();
        throw new Error(`Failed to upload file: ${errorData.error || uploadResponse.statusText}`);
      }

      const { url } = await uploadResponse.json();
      setUploadedUrl(url);
      const extension = file.name.split('.').pop()?.toLowerCase() || '';
      
      // Update progress information
      setCurrentStep('File uploaded successfully. Starting transcription...');
      setTranscription('Processing audio file...\n');
      setTranscription(prev => prev + `File format: ${extension}\n`);
      setTranscription(prev => prev + `File size: ${(file.size / (1024 * 1024)).toFixed(2)} MB\n`);
      if (audioDuration) {
        setTranscription(prev => prev + `Duration: ${Math.round(audioDuration)} seconds (${(audioDuration / 60).toFixed(2)} minutes)\n`);
        setTranscription(prev => prev + `Number of chunks: ${chunks.length}\n`);
      }
      setTranscription(prev => prev + '\nTranscribing...\n\n');

      // Step 2: Transcribe
      setCurrentStep('Transcribing audio...');
      const transcribeResponse = await fetch('/api/transcribe', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          blobUrl: url,
          extension,
        }),
        signal,
      });

      if (!transcribeResponse.ok) {
        const errorText = await transcribeResponse.text();
        try {
          const errorJson = JSON.parse(errorText);
          throw new Error(`Failed to transcribe audio: ${errorJson.error || transcribeResponse.statusText}`);
        } catch (e) {
          throw new Error(`Failed to transcribe audio: ${errorText || transcribeResponse.statusText}`);
        }
      }

      // Step 3: Stream response
      setCurrentStep('Receiving transcription...');
      const reader = transcribeResponse.body?.getReader();
      if (!reader) {
        throw new Error('No response body');
      }

      let result = '';
      let reading = true;
      // Hoist the decoder so multi-byte characters split across reads decode correctly
      const decoder = new TextDecoder();

      // Initialize first chunk as processing
      if (chunks.length > 0) {
        setChunks(prev => prev.map((chunk, i) => ({
          ...chunk,
          status: i === 0 ? 'processing' : 'pending'
        })));
      }

      try {
        while (reading) {
          const { done, value } = await reader.read();
          
          if (done) {
            // When done, mark the chunk as completed with the full text
            if (chunks.length > 0) {
              setChunks(prev => prev.map((chunk, index) => ({
                ...chunk,
                status: 'completed',
                text: result.trim()
              })));
            }
            reading = false;
            continue;
          }

          const chunkText = decoder.decode(value, { stream: true });
          result += chunkText;

          // Update transcription immediately
          setTranscription(prev => {
            const lines = prev.split('\n');
            return [...lines.slice(0, 6), result].join('\n');
          });
        }
      } catch (streamError) {
        console.error('Error processing stream:', streamError);
        // Mark all chunks as error
        setChunks(prev => prev.map(chunk => ({
          ...chunk,
          status: 'error',
          error: streamError instanceof Error ? streamError.message : 'Stream processing failed'
        })));
        throw streamError;
      }

      setCurrentStep('Transcription completed');
    } catch (err) {
      if (err instanceof Error && err.name === 'AbortError') {
        setError('Operation timed out. Please try with a smaller file or check your connection.');
      } else {
        console.error('Error in transcription:', err);
        setError(err instanceof Error ? err.message : 'An unknown error occurred');
      }
      
      // Mark remaining chunks as error
      setChunks(prev => prev.map(chunk => 
        chunk.status === 'pending' || chunk.status === 'processing'
          ? { ...chunk, status: 'error', error: 'Operation failed or timed out' }
          : chunk
      ));
    } finally {
      setIsTranscribing(false);
      abortControllerRef.current = null;
    }
  };

  const cancelOperation = () => {
    if (abortControllerRef.current) {
      abortControllerRef.current.abort();
    }
  };

  return (
    <div className="container mx-auto p-8 max-w-4xl">
      <h1 className="text-3xl font-bold mb-6">Test Audio Transcription</h1>
      
      <div className="bg-white p-6 rounded-lg shadow-md mb-8">
        <div className="mb-6">
          <label className="block text-gray-700 mb-2 font-semibold">Upload Audio File</label>
          <div className="text-sm text-gray-600 mb-2">
            Maximum file size: {MAX_FILE_SIZE_MB}MB
          </div>
          <input
            type="file"
            ref={fileInputRef}
            onChange={handleFileChange}
            accept="audio/*"
            className="block w-full text-gray-700 border border-gray-300 rounded py-2 px-3"
            disabled={isTranscribing}
          />
          {file && (
            <div className="mt-2 space-y-1 text-sm text-gray-600">
              <p>Selected file: {file.name}</p>
              <p>Size: {(file.size / (1024 * 1024)).toFixed(2)} MB</p>
              {audioDuration && (
                <p>Duration: {Math.round(audioDuration)} seconds ({(audioDuration / 60).toFixed(2)} minutes)</p>
              )}
            </div>
          )}
          {uploadedUrl && (
            <div className="mt-2 p-2 bg-gray-50 rounded text-sm">
              <p className="font-semibold text-gray-700">Uploaded File URL:</p>
              <a href={uploadedUrl} target="_blank" rel="noopener noreferrer" 
                 className="text-blue-600 break-all hover:underline">
                {uploadedUrl}
              </a>
            </div>
          )}
        </div>
        
        <div className="flex gap-4">
          <button
            onClick={handleTranscribe}
            disabled={!file || isTranscribing}
            className={`px-4 py-2 rounded font-semibold ${
              !file || isTranscribing
                ? 'bg-gray-300 text-gray-500 cursor-not-allowed'
                : 'bg-blue-600 text-white hover:bg-blue-700'
            }`}
          >
            {isTranscribing ? 'Transcribing...' : 'Transcribe Audio'}
          </button>

          {isTranscribing && (
            <button
              onClick={cancelOperation}
              className="px-4 py-2 rounded font-semibold bg-red-600 text-white hover:bg-red-700"
            >
              Cancel
            </button>
          )}
        </div>
      </div>
      
      {chunks.length > 0 && (
        <div className="bg-white p-6 rounded-lg shadow-md mb-6">
          <h2 className="text-xl font-bold mb-4">Chunks Status</h2>
          <div className="grid grid-cols-1 md:grid-cols-2 gap-4">
            {chunks.map((chunk) => (
              <div 
                key={chunk.index}
                className={`p-4 rounded-lg border ${
                  chunk.status === 'pending' ? 'bg-gray-50 border-gray-200' :
                  chunk.status === 'processing' ? 'bg-blue-50 border-blue-200' :
                  chunk.status === 'completed' ? 'bg-green-50 border-green-200' :
                  'bg-red-50 border-red-200'
                }`}
              >
                <div className="flex justify-between items-center mb-2">
                  <span className="font-semibold">Chunk {chunk.index + 1}</span>
                  <span className={`px-2 py-1 rounded text-sm ${
                    chunk.status === 'pending' ? 'bg-gray-200 text-gray-700' :
                    chunk.status === 'processing' ? 'bg-blue-200 text-blue-700' :
                    chunk.status === 'completed' ? 'bg-green-200 text-green-700' :
                    'bg-red-200 text-red-700'
                  }`}>
                    {chunk.status.charAt(0).toUpperCase() + chunk.status.slice(1)}
                  </span>
                </div>
                {chunk.text && (
                  <div className="text-sm text-gray-600 mt-2">
                    <div className="font-semibold">Preview:</div>
                    <div className="italic">{chunk.text.slice(0, 100)}...</div>
                  </div>
                )}
                {chunk.error && (
                  <div className="text-sm text-red-600 mt-2">
                    Error: {chunk.error}
                  </div>
                )}
              </div>
            ))}
          </div>
        </div>
      )}
      
      {currentStep && (
        <div className="bg-blue-50 border-l-4 border-blue-500 text-blue-700 p-4 mb-6">
          <p className="font-bold">Current Status:</p>
          <p>{currentStep}</p>
        </div>
      )}
      
      {error && (
        <div className="bg-red-100 border-l-4 border-red-500 text-red-700 p-4 mb-6">
          <p className="font-bold">Error:</p>
          <p className="whitespace-pre-wrap">{error}</p>
        </div>
      )}
      
      {(transcription || isTranscribing) && (
        <div className="bg-white p-6 rounded-lg shadow-md">
          <h2 className="text-xl font-bold mb-3">Transcription {isTranscribing && '(Processing...)'}</h2>
          <div className="bg-gray-50 p-4 rounded border border-gray-200 min-h-[200px]">
            {transcription ? (
              <p className="whitespace-pre-wrap">{transcription}</p>
            ) : (
              <div className="flex justify-center items-center h-full">
                <div className="animate-spin rounded-full h-8 w-8 border-t-2 border-b-2 border-blue-500"></div>
              </div>
            )}
          </div>
        </div>
      )}
    </div>
  );
} 

@benjaminshafii (Member)

@aexshafii could you test and review this?

@aexshafii (Collaborator)

@benjaminshafii testing now

@aexshafii (Collaborator) commented Mar 25, 2025

@onyedikachi-david
I tested the feature, but it's not working inside Obsidian.

I recorded this short video to show you the process: https://www.loom.com/share/7f4b7a4f1bd048399e8ca1b9887cda75

Can you share your email address so I can send you a license key for you to test easily?

@onyedikachi-david (Contributor, Author)

> @onyedikachi-david I tested the feature, but it's not working inside Obsidian.
>
> I recorded this short video to show you the process: https://www.loom.com/share/7f4b7a4f1bd048399e8ca1b9887cda75
>
> Can you share your email address so I can send you a license key for you to test easily?

Thanks for the review, I'll look into it shortly. Here is my email: [email protected]

@onyedikachi-david (Contributor, Author)

@aexshafii Hi, I sent an email yesterday; the license key says it's invalid.

Successfully merging this pull request may close these issues.

Migrate Transcription to Vercel Blobs + OpenAI Whisper