This is a PyTorch implementation of the Listen, Attend and Spell (LAS) paper:
author = {William Chan and
Navdeep Jaitly and
Quoc V. Le and
Oriol Vinyals},
title = {Listen, Attend and Spell},
journal = {CoRR},
volume = {abs/1508.01211},
year = {2015},
url = {},
eprinttype = {arXiv},
eprint = {1508.01211},
timestamp = {Mon, 13 Aug 2018 16:46:45 +0200},
biburl = {},
bibsource = {dblp computer science bibliography,}
To train the model on your own data, follow the steps below:
- prepare your data and make sure it is formatted as a CSV file, one line per audio file, as below
file/to/file.wav,the text in that file,3.2
- make sure the audio files are mono; convert any file that is not
- create the environment
python -m venv env
- activate the environment
source env/bin/activate
- install the required dependencies
pip install -r requirements.txt
- update the config file if needed
- train the model
- from scratch
- from checkpoint
python checkpoint=path/to/checkpoint tokenizer.tokenizer_file=path/to/tokenizer.json
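
The data preparation steps above (CSV layout and mono audio) can be sanity-checked before training with a short script. The following is a minimal sketch and not part of this repository: `check_manifest` is a hypothetical helper name, the third CSV field is only assumed to be numeric (for example a duration in seconds), and the audio files are assumed to be PCM WAV files that Python's standard `wave` module can read.

```python
import csv
import wave
from pathlib import Path


def check_manifest(csv_path):
    """Return a list of (line_number, problem) tuples for a CSV data file.

    Hypothetical helper: assumes rows of the form
    path/to/file.wav,transcript,number
    """
    problems = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for line_no, row in enumerate(csv.reader(f), start=1):
            # Each row is expected to have exactly three fields.
            if len(row) != 3:
                problems.append((line_no, f"expected 3 fields, got {len(row)}"))
                continue
            audio_path, text, number = row

            if not text.strip():
                problems.append((line_no, "empty transcript"))

            # Assumption: the third field is numeric (e.g. 3.2).
            try:
                float(number)
            except ValueError:
                problems.append((line_no, f"third field is not a number: {number!r}"))

            if not Path(audio_path).is_file():
                problems.append((line_no, f"audio file not found: {audio_path}"))
                continue

            # Read the WAV header and verify the file has a single channel.
            try:
                with wave.open(audio_path, "rb") as wav:
                    channels = wav.getnchannels()
            except (wave.Error, EOFError) as exc:
                problems.append((line_no, f"could not read WAV header: {exc}"))
                continue
            if channels != 1:
                problems.append((line_no, f"audio has {channels} channels, expected mono"))
    return problems
```

Calling `check_manifest("data.csv")` (the file name is a placeholder) returns an empty list when every row passes these checks. Otherwise each entry names the offending line and the reason, so stereo files can be converted before training starts.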
- Completing the inference module
- Adding a demo