Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitask model that can perform multilingual speech recognition, speech translation, and language identification. For more details, see github.com/openai/whisper.
Common use cases include:
- Transcription services for podcasts, videos, or audio files
- Voice command systems for applications
- Accessibility features in software products
- Research projects in natural language processing
- Real-time speech-to-text for live events or streaming
The template comes pre-configured, but you can adjust the following parameters:
- `ASR_MODEL`: Change the Whisper model size (e.g., "tiny", "base", "small", "medium", "large")
- `ASR_ENGINE`: Set to "openai_whisper" by default
- Compute resources: Adjust CPU, memory, and storage as needed
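As a rough illustration, these parameters would typically appear as environment variables in the deployment's SDL file. The sketch below is an assumption, not the template itself: the image name, tag, and port are placeholders to show where `ASR_MODEL` and `ASR_ENGINE` fit.

```yaml
services:
  whisper:
    # Placeholder image reference; substitute the image your template actually uses.
    image: whisper-asr-webservice:latest
    env:
      - ASR_MODEL=base            # any of: tiny, base, small, medium, large
      - ASR_ENGINE=openai_whisper # the default engine
    expose:
      - port: 9000                # assumed internal service port
        as: 80
        to:
          - global: true
```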
Once deployed, you can interact with the ASR service using HTTP requests. Here's a basic example using curl:
```shell
curl -X POST "http://{AKASH_URI}/asr" \
  -H "Content-Type: multipart/form-data" \
  -F "audio_file=@path/to/your/audio/file.mp3"
```
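The same request can be made from code. Below is a minimal Python sketch using only the standard library; the base URL and file path are placeholders, and the endpoint shape (`POST /asr` with an `audio_file` form field) is taken from the curl example above.

```python
import io
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def build_multipart(field_name, filename, payload):
    """Build a multipart/form-data body and its matching Content-Type header."""
    boundary = uuid.uuid4().hex
    ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    body = io.BytesIO()
    body.write(f"--{boundary}\r\n".encode())
    body.write(
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'.encode()
    )
    body.write(f"Content-Type: {ctype}\r\n\r\n".encode())
    body.write(payload)
    body.write(f"\r\n--{boundary}--\r\n".encode())
    return body.getvalue(), f"multipart/form-data; boundary={boundary}"


def transcribe(base_url, audio_path):
    """POST an audio file to the /asr endpoint and return the raw response text."""
    payload = Path(audio_path).read_bytes()
    body, content_type = build_multipart("audio_file", Path(audio_path).name, payload)
    req = urllib.request.Request(
        f"{base_url}/asr",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()


# Example (placeholder URL and path):
# text = transcribe("http://your-akash-uri", "path/to/your/audio/file.mp3")
```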
The current release (v1.5.0) supports the following Whisper models:
For more information, explore the documentation.