This repository provides a FastAPI-based API for audio transcription, alignment, and speaker diarization using WhisperX.
👉 Try the live demo on Hugging Face Spaces
Project structure:

```
fastapi/
├── app/
│   └── main.py          # FastAPI app with /transcribe endpoint
├── test_audios/         # Example audio files for testing
│   ├── BernhardtCrescent.wav
│   ├── BlackStone_en_in.mp4
│   ├── BlackStone_en_in.wav
│   ├── fillicafe.wav
│   └── harvard.wav
├── requirements.txt     # Python dependencies
└── dockerfile           # Docker setup
```
Clone the repository:

```shell
git clone <your-repo-url>
cd fastapi
```

Create and activate a virtual environment:

```shell
python3 -m venv .venv
source .venv/bin/activate
```

Or use conda instead:

```shell
conda create --name whisperx_api python=3.10
conda activate whisperx_api
```

Then install the Python dependencies:

```shell
pip install --upgrade pip
pip install -r requirements.txt
```
- ffmpeg is required for audio processing. On Ubuntu/Debian:

```shell
sudo apt-get update && sudo apt-get install -y ffmpeg git
```
Run the development server:

```shell
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
```
Alternatively, build and run the container with Docker:

```shell
docker build -t curify_fastapi .
docker run -p 8000:8000 curify_fastapi
```
Upload an audio file from `test_audios/` for transcription:

```shell
curl -X POST "http://localhost:8000/transcribe" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@test_audios/harvard.wav"
```
Visit http://localhost:8000/docs for interactive API documentation.
- `POST /transcribe`: Upload an audio file and receive a transcript with speaker labels and timestamps.
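The exact response schema is defined in `main.py`, but a diarized WhisperX-style result is typically a list of segments, each carrying text, start/end timestamps, and a speaker label. As a hedged sketch, here is how such segments could be grouped per speaker; the field names (`start`, `end`, `text`, `speaker`) are assumptions, not taken from this repository:

```python
# Group transcript segments by speaker label.
# The segment fields ("start", "end", "text", "speaker") mirror typical
# diarized WhisperX output, but are assumptions for illustration.
from collections import defaultdict

def lines_by_speaker(segments):
    grouped = defaultdict(list)
    for seg in segments:
        grouped[seg.get("speaker", "UNKNOWN")].append(seg["text"].strip())
    return {speaker: " ".join(texts) for speaker, texts in grouped.items()}

sample = [
    {"start": 0.0, "end": 2.1, "text": "Hello there.", "speaker": "SPEAKER_00"},
    {"start": 2.3, "end": 4.0, "text": "Hi, how are you?", "speaker": "SPEAKER_01"},
    {"start": 4.2, "end": 5.5, "text": "Doing well.", "speaker": "SPEAKER_00"},
]
print(lines_by_speaker(sample))
# {'SPEAKER_00': 'Hello there. Doing well.', 'SPEAKER_01': 'Hi, how are you?'}
```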
Note that ffmpeg is a system binary, not a pip package; install it with your package manager (see above). To convert an MP4 from `test_audios/` to a 16 kHz mono WAV:

```shell
ffmpeg -i BlackStone_en_in.mp4 -ar 16000 -ac 1 BlackStone_en_in.wav
```
- Temporary files are automatically deleted after each request.
- The WhisperX model is loaded once at startup for efficiency.
- Diarization uses a Hugging Face token (edit it in `main.py` if needed).
- For best results, use clear audio files (see `test_audios/` for examples).
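The load-once behavior can be sketched with a cached loader; `get_model` below is a stand-in for the actual WhisperX loading call in `main.py`, not the repository's code:

```python
# Sketch of the load-once pattern: the first call performs the (expensive)
# model load, every later call reuses the cached instance.
# The body of get_model stands in for a whisperx.load_model(...) call.
from functools import lru_cache

calls = {"loads": 0}

@lru_cache(maxsize=1)
def get_model():
    calls["loads"] += 1              # would be the expensive model load
    return {"model": "whisperx-stub"}

first = get_model()
second = get_model()                 # served from the cache
assert first is second               # same object reused across requests
assert calls["loads"] == 1           # loaded exactly once
```

Each request handler then calls `get_model()` instead of reloading the model, which keeps per-request latency down.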