
feat(audio): replace audio-server with Vercel Blobs and OpenAI Whisper #368


Open · wants to merge 6 commits into base: master

Conversation

onyedikachi-david (Contributor)

/claim #365
Fixes: #365

…r
  • Implement audio chunking for files over 20 minutes
  • Add Vercel Blob storage
  • Update plugin transcription endpoint
  • Remove audio-server dependency
  • Add parallel processing
  • Update to Clerk auth
  • Add progress indicators

vercel bot commented Mar 15, 2025

@onyedikachi-david is attempting to deploy a commit to the Prologe Team on Vercel.

A member of the Team first needs to authorize it.

@greptile-apps (bot) left a comment

PR Summary

Implemented a new audio transcription system using Vercel Blobs for storage and OpenAI Whisper for processing, replacing the dedicated audio-server package with a more streamlined solution.

  • Added splitAudioIntoChunks function in /packages/web/app/api/(new-ai)/transcribe/route.ts to handle files over 20 minutes using ffmpeg (a sketch of this flow follows the list)
  • Implemented parallel processing with rate limiting (a 1s delay between chunks) to prevent OpenAI API throttling
  • Added Vercel Blob storage with a 1-hour cache for audio files using the put function (sketched below, after the review summary)
  • Added streaming response handling to return transcription results in real time (see the route sketch further down)
  • Updated authentication from Unkey to Clerk with proper session validation
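
A minimal sketch of that chunk-and-throttle flow, assuming the PR's splitAudioIntoChunks helper (the path-in/paths-out signature here is an assumption) and the standard OpenAI Node SDK; the chunks are submitted sequentially with the 1s delay for clarity:

import fs from 'node:fs';
import OpenAI from 'openai';
// splitAudioIntoChunks: the PR's ffmpeg-based helper; this import path
// and signature (source path in, chunk paths out) are assumptions.
import { splitAudioIntoChunks } from './chunking';

const openai = new OpenAI();
const CHUNK_DELAY_MS = 1000; // 1s between requests to avoid OpenAI throttling

async function transcribeFile(path: string): Promise<string> {
  const chunkPaths = await splitAudioIntoChunks(path);
  const texts: string[] = [];
  for (const [i, chunkPath] of chunkPaths.entries()) {
    if (i > 0) await new Promise((resolve) => setTimeout(resolve, CHUNK_DELAY_MS));
    const { text } = await openai.audio.transcriptions.create({
      file: fs.createReadStream(chunkPath),
      model: 'whisper-1',
    });
    texts.push(text);
  }
  return texts.join(' ');
}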


3 file(s) reviewed, 3 comment(s)
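
The Blob storage step from the summary above might look like the following sketch; put and cacheControlMaxAge are the real @vercel/blob API, while the path prefix is illustrative:

import { put } from '@vercel/blob';

export async function uploadAudio(file: File): Promise<string> {
  // cacheControlMaxAge is in seconds; 3600 gives the 1-hour cache noted above
  const blob = await put(`audio/${file.name}`, file, {
    access: 'public',
    cacheControlMaxAge: 3600,
  });
  return blob.url; // handed to the transcription route as blobUrl
}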

…improve audio chunk handling

Signed-off-by: David Anyatonwu <[email protected]>

vercel bot commented Mar 15, 2025

The latest updates on your projects. Learn more about Vercel for Git.

file-organizer-2000: ✅ Ready (updated Mar 25, 2025 2:44pm UTC)

export const maxDuration = 7200; // 120 minutes for long transcriptions
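
Alongside that route config, the streaming handler and Clerk session check described in the review summary could be sketched as follows; splitAudioIntoChunks and transcribeChunk are stand-ins for the PR's helpers, and the awaited auth() assumes a recent @clerk/nextjs:

import { auth } from '@clerk/nextjs/server';
// stand-ins for the PR's helpers; this module path is hypothetical
import { splitAudioIntoChunks, transcribeChunk } from './helpers';

export async function POST(req: Request) {
  const { userId } = await auth();
  if (!userId) return new Response('Unauthorized', { status: 401 });

  const { blobUrl } = await req.json();
  const encoder = new TextEncoder();
  const stream = new ReadableStream({
    async start(controller) {
      // enqueue each chunk's transcript as soon as it is ready
      for (const chunk of await splitAudioIntoChunks(blobUrl)) {
        const text = await transcribeChunk(chunk);
        controller.enqueue(encoder.encode(text + '\n'));
      }
      controller.close();
    },
  });
  return new Response(stream, {
    headers: { 'Content-Type': 'text/plain; charset=utf-8' },
  });
}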
@benjaminshafii (Member) commented Mar 15, 2025

Doesn't work.

Please test on Vercel and show proof that it works with video.

I keep getting "Request Entity Too Large". This flow only works if users do direct uploads to Vercel Blob.
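
A minimal sketch of the direct-upload flow described here, using the real upload helper from @vercel/blob/client; the handleUploadUrl route is illustrative:

import { upload } from '@vercel/blob/client';

// Uploads straight from the browser to Vercel Blob, so the file never
// passes through the serverless function body (the source of the 413).
async function directUpload(file: File): Promise<string> {
  const blob = await upload(file.name, file, {
    access: 'public',
    handleUploadUrl: '/api/upload-token', // illustrative token-issuing route
  });
  return blob.url;
}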

@onyedikachi-david (Contributor, Author)

> Doesn't work.
>
> Please test on Vercel and show proof that it works with video.
>
> I keep getting "Request Entity Too Large". This flow only works if users do direct uploads to Vercel Blob.

Okay

@onyedikachi-david (Contributor, Author)

note-companion.mp4

Hello @benjaminshafii, here is a demo. Demoing in Obsidian was problematic (do I need a license key for that?), as I explained in the demo. I had to create a page to test it locally, and it works; I assume it will also work in the built plugin. Below is the page used to test. I also couldn't test with 120 minutes of audio due to the OpenAI cost (I tested with 88+ minutes) 🥲🥲

page.tsx

'use client';

import { useState, useRef } from 'react';

const MAX_FILE_SIZE_MB = 50;
const UPLOAD_TIMEOUT_MS = 5 * 60 * 1000; // 5 minutes
const CHUNK_SIZE = 20 * 60; // 20 minutes in seconds (matching server config)

interface ChunkStatus {
  index: number;
  status: 'pending' | 'processing' | 'completed' | 'error';
  text?: string;
  error?: string;
}

export default function TestTranscription() {
  const [file, setFile] = useState<File | null>(null);
  const [transcription, setTranscription] = useState<string>('');
  const [isTranscribing, setIsTranscribing] = useState<boolean>(false);
  const [error, setError] = useState<string | null>(null);
  const [uploadedUrl, setUploadedUrl] = useState<string | null>(null);
  const [audioDuration, setAudioDuration] = useState<number | null>(null);
  const [uploadProgress, setUploadProgress] = useState<number>(0);
  const [currentStep, setCurrentStep] = useState<string>('');
  const [chunks, setChunks] = useState<ChunkStatus[]>([]);
  const fileInputRef = useRef<HTMLInputElement>(null);
  const abortControllerRef = useRef<AbortController | null>(null);

  const getAudioDuration = (file: File): Promise<number> => {
    return new Promise((resolve, reject) => {
      const audio = new Audio();
      const reader = new FileReader();

      reader.onload = (e) => {
        if (e.target?.result) {
          audio.src = e.target.result as string;
          audio.onloadedmetadata = () => {
            resolve(audio.duration);
          };
          audio.onerror = () => reject(new Error('Failed to load audio file'));
        }
      };
      reader.onerror = () => reject(new Error('Failed to read file'));
      reader.readAsDataURL(file);
    });
  };

  const handleFileChange = async (e: React.ChangeEvent<HTMLInputElement>) => {
    if (e.target.files && e.target.files.length > 0) {
      const selectedFile = e.target.files[0];
      const fileSizeMB = selectedFile.size / (1024 * 1024);
      
      if (fileSizeMB > MAX_FILE_SIZE_MB) {
        setError(`File size (${fileSizeMB.toFixed(2)}MB) exceeds the maximum limit of ${MAX_FILE_SIZE_MB}MB`);
        return;
      }

      try {
        const duration = await getAudioDuration(selectedFile);
        setAudioDuration(duration);
        
        // Calculate expected chunks
        const numChunks = Math.ceil(duration / CHUNK_SIZE);
        const initialChunks: ChunkStatus[] = Array.from({ length: numChunks }, (_, i) => ({
          index: i,
          status: 'pending',
        }));
        setChunks(initialChunks);
        
        setFile(selectedFile);
        setError(null);
        setCurrentStep('File selected and validated');
      } catch (err) {
        setError('Failed to read audio file duration. Please ensure it\'s a valid audio file.');
        setFile(null);
        setAudioDuration(null);
        setChunks([]);
      }
    }
  };

  const handleTranscribe = async () => {
    if (!file) {
      setError('Please select an audio file');
      return;
    }

    setIsTranscribing(true);
    setError(null);
    setTranscription('');
    setUploadProgress(0);

    // Create new AbortController for this operation
    abortControllerRef.current = new AbortController();
    const { signal } = abortControllerRef.current;

    try {
      // Step 1: Upload the file
      setCurrentStep('Uploading file...');
      const formData = new FormData();
      formData.append('file', file);
      
      const uploadTimeout = setTimeout(() => {
        if (abortControllerRef.current) {
          abortControllerRef.current.abort();
        }
      }, UPLOAD_TIMEOUT_MS);

      const uploadResponse = await fetch('/api/transcribe/upload', {
        method: 'POST',
        body: formData,
        signal,
      });

      clearTimeout(uploadTimeout);

      if (!uploadResponse.ok) {
        const errorData = await uploadResponse.json();
        throw new Error(`Failed to upload file: ${errorData.error || uploadResponse.statusText}`);
      }

      const { url } = await uploadResponse.json();
      setUploadedUrl(url);
      const extension = file.name.split('.').pop()?.toLowerCase() || '';
      
      // Update progress information
      setCurrentStep('File uploaded successfully. Starting transcription...');
      setTranscription('Processing audio file...\n');
      setTranscription(prev => prev + `File format: ${extension}\n`);
      setTranscription(prev => prev + `File size: ${(file.size / (1024 * 1024)).toFixed(2)} MB\n`);
      if (audioDuration) {
        setTranscription(prev => prev + `Duration: ${Math.round(audioDuration)} seconds (${(audioDuration / 60).toFixed(2)} minutes)\n`);
        setTranscription(prev => prev + `Number of chunks: ${chunks.length}\n`);
      }
      setTranscription(prev => prev + '\nTranscribing...\n\n');

      // Step 2: Transcribe
      setCurrentStep('Transcribing audio...');
      const transcribeResponse = await fetch('/api/transcribe', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          blobUrl: url,
          extension,
        }),
        signal,
      });

      if (!transcribeResponse.ok) {
        const errorText = await transcribeResponse.text();
        try {
          const errorJson = JSON.parse(errorText);
          throw new Error(`Failed to transcribe audio: ${errorJson.error || transcribeResponse.statusText}`);
        } catch (e) {
          throw new Error(`Failed to transcribe audio: ${errorText || transcribeResponse.statusText}`);
        }
      }

      // Step 3: Stream response
      setCurrentStep('Receiving transcription...');
      const reader = transcribeResponse.body?.getReader();
      if (!reader) {
        throw new Error('No response body');
      }

      let result = '';
      let reading = true;
      // Hoist the decoder so multi-byte characters split across reads decode correctly
      const decoder = new TextDecoder();

      // Initialize first chunk as processing
      if (chunks.length > 0) {
        setChunks(prev => prev.map((chunk, i) => ({
          ...chunk,
          status: i === 0 ? 'processing' : 'pending'
        })));
      }

      try {
        while (reading) {
          const { done, value } = await reader.read();
          
          if (done) {
            // When done, mark the chunk as completed with the full text
            if (chunks.length > 0) {
              setChunks(prev => prev.map((chunk, index) => ({
                ...chunk,
                status: 'completed',
                text: result.trim()
              })));
            }
            reading = false;
            continue;
          }

          const chunkText = decoder.decode(value, { stream: true });
          result += chunkText;

          // Update transcription immediately
          setTranscription(prev => {
            const lines = prev.split('\n');
            return [...lines.slice(0, 6), result].join('\n');
          });
        }
      } catch (streamError) {
        console.error('Error processing stream:', streamError);
        // Mark all chunks as error
        setChunks(prev => prev.map(chunk => ({
          ...chunk,
          status: 'error',
          error: streamError instanceof Error ? streamError.message : 'Stream processing failed'
        })));
        throw streamError;
      }

      setCurrentStep('Transcription completed');
    } catch (err) {
      if (err instanceof Error && err.name === 'AbortError') {
        setError('Operation timed out. Please try with a smaller file or check your connection.');
      } else {
        console.error('Error in transcription:', err);
        setError(err instanceof Error ? err.message : 'An unknown error occurred');
      }
      
      // Mark remaining chunks as error
      setChunks(prev => prev.map(chunk => 
        chunk.status === 'pending' || chunk.status === 'processing'
          ? { ...chunk, status: 'error', error: 'Operation failed or timed out' }
          : chunk
      ));
    } finally {
      setIsTranscribing(false);
      abortControllerRef.current = null;
    }
  };

  const cancelOperation = () => {
    if (abortControllerRef.current) {
      abortControllerRef.current.abort();
    }
  };

  return (
    <div className="container mx-auto p-8 max-w-4xl">
      <h1 className="text-3xl font-bold mb-6">Test Audio Transcription</h1>
      
      <div className="bg-white p-6 rounded-lg shadow-md mb-8">
        <div className="mb-6">
          <label className="block text-gray-700 mb-2 font-semibold">Upload Audio File</label>
          <div className="text-sm text-gray-600 mb-2">
            Maximum file size: {MAX_FILE_SIZE_MB}MB
          </div>
          <input
            type="file"
            ref={fileInputRef}
            onChange={handleFileChange}
            accept="audio/*"
            className="block w-full text-gray-700 border border-gray-300 rounded py-2 px-3"
            disabled={isTranscribing}
          />
          {file && (
            <div className="mt-2 space-y-1 text-sm text-gray-600">
              <p>Selected file: {file.name}</p>
              <p>Size: {(file.size / (1024 * 1024)).toFixed(2)} MB</p>
              {audioDuration && (
                <p>Duration: {Math.round(audioDuration)} seconds ({(audioDuration / 60).toFixed(2)} minutes)</p>
              )}
            </div>
          )}
          {uploadedUrl && (
            <div className="mt-2 p-2 bg-gray-50 rounded text-sm">
              <p className="font-semibold text-gray-700">Uploaded File URL:</p>
              <a href={uploadedUrl} target="_blank" rel="noopener noreferrer" 
                 className="text-blue-600 break-all hover:underline">
                {uploadedUrl}
              </a>
            </div>
          )}
        </div>
        
        <div className="flex gap-4">
          <button
            onClick={handleTranscribe}
            disabled={!file || isTranscribing}
            className={`px-4 py-2 rounded font-semibold ${
              !file || isTranscribing
                ? 'bg-gray-300 text-gray-500 cursor-not-allowed'
                : 'bg-blue-600 text-white hover:bg-blue-700'
            }`}
          >
            {isTranscribing ? 'Transcribing...' : 'Transcribe Audio'}
          </button>

          {isTranscribing && (
            <button
              onClick={cancelOperation}
              className="px-4 py-2 rounded font-semibold bg-red-600 text-white hover:bg-red-700"
            >
              Cancel
            </button>
          )}
        </div>
      </div>
      
      {chunks.length > 0 && (
        <div className="bg-white p-6 rounded-lg shadow-md mb-6">
          <h2 className="text-xl font-bold mb-4">Chunks Status</h2>
          <div className="grid grid-cols-1 md:grid-cols-2 gap-4">
            {chunks.map((chunk) => (
              <div 
                key={chunk.index}
                className={`p-4 rounded-lg border ${
                  chunk.status === 'pending' ? 'bg-gray-50 border-gray-200' :
                  chunk.status === 'processing' ? 'bg-blue-50 border-blue-200' :
                  chunk.status === 'completed' ? 'bg-green-50 border-green-200' :
                  'bg-red-50 border-red-200'
                }`}
              >
                <div className="flex justify-between items-center mb-2">
                  <span className="font-semibold">Chunk {chunk.index + 1}</span>
                  <span className={`px-2 py-1 rounded text-sm ${
                    chunk.status === 'pending' ? 'bg-gray-200 text-gray-700' :
                    chunk.status === 'processing' ? 'bg-blue-200 text-blue-700' :
                    chunk.status === 'completed' ? 'bg-green-200 text-green-700' :
                    'bg-red-200 text-red-700'
                  }`}>
                    {chunk.status.charAt(0).toUpperCase() + chunk.status.slice(1)}
                  </span>
                </div>
                {chunk.text && (
                  <div className="text-sm text-gray-600 mt-2">
                    <div className="font-semibold">Preview:</div>
                    <div className="italic">{chunk.text.slice(0, 100)}...</div>
                  </div>
                )}
                {chunk.error && (
                  <div className="text-sm text-red-600 mt-2">
                    Error: {chunk.error}
                  </div>
                )}
              </div>
            ))}
          </div>
        </div>
      )}
      
      {currentStep && (
        <div className="bg-blue-50 border-l-4 border-blue-500 text-blue-700 p-4 mb-6">
          <p className="font-bold">Current Status:</p>
          <p>{currentStep}</p>
        </div>
      )}
      
      {error && (
        <div className="bg-red-100 border-l-4 border-red-500 text-red-700 p-4 mb-6">
          <p className="font-bold">Error:</p>
          <p className="whitespace-pre-wrap">{error}</p>
        </div>
      )}
      
      {(transcription || isTranscribing) && (
        <div className="bg-white p-6 rounded-lg shadow-md">
          <h2 className="text-xl font-bold mb-3">Transcription {isTranscribing && '(Processing...)'}</h2>
          <div className="bg-gray-50 p-4 rounded border border-gray-200 min-h-[200px]">
            {transcription ? (
              <p className="whitespace-pre-wrap">{transcription}</p>
            ) : (
              <div className="flex justify-center items-center h-full">
                <div className="animate-spin rounded-full h-8 w-8 border-t-2 border-b-2 border-blue-500"></div>
              </div>
            )}
          </div>
        </div>
      )}
    </div>
  );
} 

@benjaminshafii (Member)

@aexshafii could you test and review this?

@aexshafii (Collaborator)

@benjaminshafii testing now

@aexshafii (Collaborator) commented Mar 25, 2025

@onyedikachi-david
I tested the feature, but it's not working inside Obsidian.

I recorded this short video to show you the process: https://www.loom.com/share/7f4b7a4f1bd048399e8ca1b9887cda75

Can you share your email address so I can send you a license key for you to test easily?

@onyedikachi-david (Contributor, Author)

> @onyedikachi-david I tested the feature, but it's not working inside Obsidian.
>
> I recorded this short video to show you the process: https://www.loom.com/share/7f4b7a4f1bd048399e8ca1b9887cda75
>
> Can you share your email address so I can send you a license key for you to test easily?

Thanks for the review, I'll look into it shortly. Here is my email: [email protected]

@onyedikachi-david (Contributor, Author)

@aexshafii Hi, I sent an email yesterday; the license key says it's invalid.

Successfully merging this pull request may close these issues.

Migrate Transcription to Vercel Blobs + OpenAI Whisper