Transcription Pipeline

This project handles voice transcription. It defines the Transcription Pipeline class, which transcribes an audio file into text.

Getting Started

Prerequisites

The prerequisites are listed in the requirements.txt file. Install them with:

pip install -r requirements.txt

You can also run the project in a container using the Dockerfile. To build the image, run:

docker build -t transcription-pipeline .

To run the container with a volume, run:

docker run -it -v /path/to/audio:/audio transcription-pipeline

To run the container with a volume and GPU, run:

docker run -it --gpus all -v /path/to/audio:/audio transcription-pipeline

Usage

To transcribe an audio file with the Transcription Pipeline, run:

python transcription_pipeline.py --audio_path /path/to/audio --engine whisper

The output is saved in the same directory as the audio file, under the same base name but with a .txt extension.
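
The naming rule above can be sketched with pathlib. This is only an illustration of the rule, not the project's actual code; the helper name output_path_for and the example file name are hypothetical.

```python
from pathlib import Path


def output_path_for(audio_path: str) -> Path:
    """Return where the transcription for an audio file would be written.

    Hypothetical helper: same directory and base name as the audio file,
    with the final extension replaced by .txt.
    """
    return Path(audio_path).with_suffix(".txt")


# Example with a made-up file name.
print(output_path_for("/path/to/audio/interview.wav"))
```

Only the last extension is replaced, so a file such as talk.part1.mp3 would map to talk.part1.txt.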

Output format

The output file is formatted as follows:

{
    "audio_path": "/path/to/audio",
    "engine": "whisper",
    "language": "en",
    "transcription": "This is the transcription of the audio file"
}
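
Assuming the .txt file contains exactly the JSON object shown above, it could be read back as in the following sketch. The function name load_transcription is hypothetical, and the field check reflects only the four keys in the sample.

```python
import json
from pathlib import Path

# Keys taken from the sample output; assumed to always be present.
EXPECTED_FIELDS = {"audio_path", "engine", "language", "transcription"}


def load_transcription(output_path: str) -> dict:
    """Load a transcription result, assuming the file holds a JSON object."""
    with Path(output_path).open(encoding="utf-8") as f:
        result = json.load(f)
    # Fail early if any field from the sample output is absent.
    missing = EXPECTED_FIELDS - result.keys()
    if missing:
        raise KeyError(f"missing fields: {sorted(missing)}")
    return result
```

Calling load_transcription on an output file and reading result["transcription"] then gives the transcribed text.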

TODO

  • Define Transcription Pipeline class
  • Implement with Whisper
  • Implement with Google Cloud Speech-to-Text API
  • Implement with OpenAI API
  • Add tests
  • Add support for other languages

Authors

Sebastián Ignacio Bórquez González

License

This project is licensed under the MIT License. See the LICENSE.md file for details.

Built With