whisper-asr-gpu

Whisper ASR Webservice

Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is a multitask model that can perform multilingual speech recognition, speech translation, and language identification. For more details, see github.com/openai/whisper.

Use Cases

  • Transcription services for podcasts, videos, or audio files
  • Voice command systems for applications
  • Accessibility features in software products
  • Research projects in natural language processing
  • Real-time speech-to-text for live events or streaming

Configuration Options

The template comes pre-configured, but you can adjust the following parameters:

  • ASR_MODEL: Change the Whisper model size (e.g., "tiny", "base", "small", "medium", "large")
  • ASR_ENGINE: Set to "openai_whisper" by default
  • Compute resources: Adjust CPU, memory, and storage as needed
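A mistyped model name is easy to catch before deploying. The sketch below is a hypothetical pre-flight helper (`check_asr_config` is not part of the template or the service). It assumes the model sizes listed above are the values you intend to use, although the service may also accept versioned names such as `large-v3`, and it falls back to the documented `openai_whisper` engine when `ASR_ENGINE` is unset or empty.

```shell
#!/usr/bin/env bash
# Hypothetical pre-deploy check: NOT part of the template or the service.
# Validates ASR_MODEL against the sizes listed in this README and fills in
# the default engine when ASR_ENGINE is unset or empty.

check_asr_config() {
  local model="${ASR_MODEL:-}"
  local engine="${ASR_ENGINE:-openai_whisper}"   # documented default

  case "$model" in
    tiny|base|small|medium|large) ;;             # sizes listed above
    *)
      echo "unsupported ASR_MODEL: '${model}'" >&2
      return 1
      ;;
  esac

  echo "model=${model} engine=${engine}"
}

# Example: check a 'small' model with whatever engine is configured.
ASR_MODEL=small check_asr_config
```

Extend the `case` pattern if your deployment uses other model names the service accepts.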

Using the Deployed Service

Once deployed, you can interact with the ASR service using HTTP requests. Here's a basic example using curl:

curl -X POST "http://{AKASH_URI}/asr" \
     -F "audio_file=@path/to/your/audio/file.mp3"
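For repeated requests it can help to wrap the call in a small script. The sketch below assumes the `/asr` endpoint accepts `task`, `language` and `output` query parameters, as the upstream whisper-asr-webservice API does; verify them against the documentation for the version you deploy. The function names `build_asr_url` and `transcribe_file` are hypothetical. Note that curl's `-F` option already sends a `multipart/form-data` body with the correct boundary, so no explicit `Content-Type` header is needed.

```shell
#!/usr/bin/env bash
# Sketch: build an /asr request URL with optional query parameters.
# ASSUMPTION: the service accepts `task`, `language` and `output` query
# parameters (as upstream whisper-asr-webservice does); verify against the
# deployed version's documentation. Function names are hypothetical.

build_asr_url() {
  local host="$1"                 # e.g. the value of AKASH_URI
  local task="${2:-transcribe}"   # transcribe | translate
  local language="${3:-}"         # empty: let Whisper detect the language
  local output="${4:-txt}"        # e.g. txt | json | vtt | srt | tsv

  local url="http://${host}/asr?task=${task}&output=${output}"
  if [ -n "$language" ]; then
    url="${url}&language=${language}"
  fi
  printf '%s\n' "$url"
}

# POST an audio file; curl's -F builds the multipart/form-data body itself.
transcribe_file() {
  local host="$1" file="$2"
  curl -sS -X POST "$(build_asr_url "$host" transcribe "" json)" \
       -F "audio_file=@${file}"
}
```

Usage: `transcribe_file "$AKASH_URI" path/to/your/audio/file.mp3` requests a JSON transcript and lets the model detect the spoken language.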

Features

The current release (v1.5.0) supports the following Whisper models:

For more information:

Documentation

See the project documentation for more details.

Credits

  • This software uses libraries from the FFmpeg project under the LGPLv2.1