This project implements a speech-to-text pipeline built on OpenAI's Whisper architecture, specifically the "whisper-large-v3" model, for high-quality transcription. The pipeline runs on a GPU when one is available and falls back to the CPU otherwise.
- Automatic Speech Recognition (ASR) using Whisper.
- Supports GPU acceleration if available.
- Configured for English language transcription.
- Removes unnecessary metadata from the dataset for optimized performance.
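The overview above maps onto a short `transformers` ASR pipeline. The following is a minimal sketch under the stated assumptions, not the project's actual code: the model ID matches the Hugging Face link below, and the audio file name is a placeholder.

```python
import torch
from transformers import pipeline

# Use the GPU if one is available, otherwise fall back to the CPU
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if device != "cpu" else torch.float32

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",
    torch_dtype=torch_dtype,
    device=device,
)

# Transcribe an audio file, forcing English output ("sample.wav" is a placeholder)
result = asr("sample.wav", generate_kwargs={"language": "english"})
print(result["text"])
```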
OpenAI Whisper (official website): https://openai.com/index/whisper
Model Paper: https://cdn.openai.com/papers/whisper.pdf
Model Repo: https://github.com/openai/whisper
Hugging Face: https://huggingface.co/openai/whisper-large-v3
Evaluation Dataset: https://huggingface.co/datasets/mozilla-foundation/common_voice_17_0
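Since the feature list mentions removing unneeded metadata, here is a minimal sketch of how the Common Voice evaluation data might be loaded and trimmed with `datasets` and scored with `evaluate`. The "en" config, split, and column names are assumptions, and the dataset is gated, so you must accept its terms on the Hugging Face Hub first.

```python
from datasets import Audio, load_dataset
import evaluate

# Assumed config/split; Common Voice 17.0 is gated, so authenticate with huggingface_hub first
dataset = load_dataset("mozilla-foundation/common_voice_17_0", "en", split="test")

# Drop metadata columns that are not needed for transcription (column names are assumptions)
dataset = dataset.remove_columns(
    ["client_id", "up_votes", "down_votes", "age", "gender", "accent", "locale", "segment"]
)

# Whisper expects 16 kHz audio
dataset = dataset.cast_column("audio", Audio(sampling_rate=16_000))

# Word error rate via the evaluate/jiwer backend
wer = evaluate.load("wer")
# score = wer.compute(predictions=model_outputs, references=dataset["sentence"])
```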
First, clone the repo:
git clone https://github.com/YousefMTaha/GP-STT
Second, create a Python virtual environment and activate it:
python3 -m venv venv
.\venv\Scripts\activate (on Windows) or source venv/bin/activate (on Linux/macOS)
Last, install the required dependencies, either in a single command or one package at a time:
pip install jiwer flask torch librosa datasets evaluate torchaudio transformers huggingface_hub "accelerate>=0.26.0"
Or
pip install jiwer
pip install flask
pip install torch
pip install librosa
pip install datasets
pip install evaluate
pip install torchaudio
pip install transformers
pip install huggingface_hub
pip install "accelerate>=0.26.0"
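After installing, an optional sanity check (not part of the repo) confirms the core packages import and reports whether PyTorch can see a GPU:

```python
# Optional sanity check: verify the key packages import and whether a GPU is visible
import torch
import transformers

print("PyTorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("Transformers:", transformers.__version__)
```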
First, run the AppRouter.py file:
python AppRouter.py
Then, verify the server started successfully by opening http://127.0.0.1:5001 in your browser.
Send your audio to the model with a POST request to http://127.0.0.1:5001/stt (see the client sketch below).
If you are calling the API from an Android or iOS emulator, use http://10.0.2.2:5001/stt instead.
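A minimal client sketch for the /stt endpoint, using the `requests` package (not in the dependency list above). The multipart field name "audio" and the response format are assumptions; check AppRouter.py for the exact contract.

```python
import requests

# Hypothetical client: the form field name "audio" is an assumption;
# AppRouter.py defines the field the /stt endpoint actually expects.
url = "http://127.0.0.1:5001/stt"
with open("sample.wav", "rb") as f:  # "sample.wav" is a placeholder
    response = requests.post(url, files={"audio": f})

print(response.status_code)
print(response.text)  # expected to contain the transcription
```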