Transcription Pipeline

This project provides a pipeline for voice transcription. It defines the Transcription Pipeline class, which transcribes audio files into text.

Getting Started

Prerequisites

The prerequisites are listed in the requirements.txt file. You can install them with:

pip install -r requirements.txt

You can also run the pipeline in a container built from the Dockerfile. To build the image, run:

docker build -t transcription-pipeline .

To run the container with your audio directory mounted as a volume at /audio, run:

docker run -it -v /path/to/audio:/audio transcription-pipeline

To run the container with the audio volume and access to all GPUs, run:

docker run -it --gpus all -v /path/to/audio:/audio transcription-pipeline

Usage

To use the Transcription Pipeline, you can run the following command:

python transcription_pipeline.py --audio_path /path/to/audio --engine whisper

The output is saved in the same directory as the audio file, with the same base name and a .txt extension.
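The command-line interface and output naming rule described above can be sketched with Python's standard argparse and pathlib modules. This is a minimal illustration, not the project's actual transcription_pipeline.py: the parse_args and output_path_for helpers are hypothetical, and the engine identifiers other than whisper ("google", "openai") are assumed names for the engines listed under TODO.

```python
import argparse
from pathlib import Path

# Assumed engine identifiers: only "whisper" appears in the README's usage example.
ENGINES = ("whisper", "google", "openai")


def parse_args(argv=None):
    """Parse the two flags shown in the usage example (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Transcribe an audio file into text.")
    parser.add_argument("--audio_path", required=True,
                        help="Path to the input audio file.")
    parser.add_argument("--engine", required=True, choices=ENGINES,
                        help="Transcription engine to use.")
    return parser.parse_args(argv)


def output_path_for(audio_path):
    """Same directory and base name as the audio file, but with a .txt extension.

    with_suffix replaces only the last suffix, so "clip.mp3" becomes "clip.txt".
    """
    return Path(audio_path).with_suffix(".txt")
```

A script would typically call parse_args() inside an if __name__ == "__main__": block and pass args.audio_path to output_path_for. Using choices lets argparse reject an unknown engine name before any audio is loaded.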

Output format

The output file will be formatted as follows:

{
    "audio_path": "/path/to/audio",
    "engine": "whisper",
    "language": "en",
    "transcription": "This is the transcription of the audio file"
}
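Writing a result in this format takes only the standard json module. The sketch below is an assumption about how the file could be produced, not the project's code: write_transcription is a hypothetical function name, and the 4-space indentation simply matches the example output.

```python
import json
from pathlib import Path


def write_transcription(audio_path, engine, language, transcription):
    """Hypothetical helper: write the result next to the audio file as <name>.txt.

    The keys mirror the README's output format.
    """
    record = {
        "audio_path": str(audio_path),
        "engine": engine,
        "language": language,
        "transcription": transcription,
    }
    out_path = Path(audio_path).with_suffix(".txt")
    # ensure_ascii=False keeps non-English transcriptions human-readable in the file.
    out_path.write_text(json.dumps(record, indent=4, ensure_ascii=False),
                        encoding="utf-8")
    return out_path
```

Because the file holds a JSON object despite its .txt extension, downstream tools can read it back with json.loads rather than treating it as raw text.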

TODO

Define Transcription Pipeline class
Implement with Whisper
Implement with Google Cloud Speech-to-Text API
Implement with OpenAI API
Add tests
Add support for other languages
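Since the TODO list calls for one Transcription Pipeline class backed by several engines, one possible shape is a small registry that dispatches to interchangeable backends by name. This is a hypothetical design sketch: TranscriptionEngine, register, run, and the EchoEngine stand-in are illustrative names not taken from the project, and the "en" default language is assumed from the example output.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Transcription:
    """Mirrors the fields of the README's output format."""
    audio_path: str
    engine: str
    language: str
    transcription: str


class TranscriptionEngine(ABC):
    """Hypothetical interface each backend (Whisper, Google Cloud, OpenAI) would implement."""

    name: str

    @abstractmethod
    def transcribe(self, audio_path: str, language: str) -> str:
        """Return the transcribed text for one audio file."""


class TranscriptionPipeline:
    """Dispatches a transcription request to a registered engine by name."""

    def __init__(self):
        self._engines: dict[str, TranscriptionEngine] = {}

    def register(self, engine: TranscriptionEngine) -> None:
        self._engines[engine.name] = engine

    def run(self, audio_path: str, engine: str, language: str = "en") -> Transcription:
        if engine not in self._engines:
            raise ValueError(
                f"unknown engine {engine!r}; registered: {sorted(self._engines)}")
        text = self._engines[engine].transcribe(audio_path, language)
        return Transcription(audio_path, engine, language, text)


class EchoEngine(TranscriptionEngine):
    """Stand-in backend for testing; a real one would call Whisper or a cloud API."""

    name = "echo"

    def transcribe(self, audio_path: str, language: str) -> str:
        return f"transcript of {audio_path} ({language})"
```

A Whisper backend could wrap the openai-whisper package, whose whisper.load_model(...) returns a model with a transcribe(audio_path) method; its result dictionary includes "text" and "language" keys. Keeping each backend behind the same interface means adding the Google Cloud or OpenAI engine does not change the pipeline class itself.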

Authors

Sebastián Ignacio Bórquez González

License

This project is licensed under the MIT License. See the LICENSE.md file for details.

Built With

OpenAI
Whisper
Google Cloud Speech-to-Text
