End-to-End Automatic Speech Recognition

This project implements a small-scale speech recognition system that combines a Residual Convolutional Neural Network (CNN) and BiGRU acoustic model, a Connectionist Temporal Classification (CTC) decoder, and a KenLM language model for improved accuracy.
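To illustrate what a CTC decoder does with the acoustic model's per-frame output, here is a minimal sketch of greedy (best-path) decoding: merge repeated labels, then drop blanks. This is the simplest CTC decoding rule and is not necessarily the decoder this project uses with KenLM; the blank index, the vocabulary, and the function name are assumptions made for illustration.

```python
from itertools import groupby

BLANK = 0  # assumption: index 0 is the CTC blank symbol


def ctc_greedy_decode(frame_ids, id_to_char, blank=BLANK):
    """Collapse runs of repeated frame labels, then remove blank labels."""
    collapsed = [label for label, _ in groupby(frame_ids)]
    return "".join(id_to_char[i] for i in collapsed if i != blank)


# Hypothetical vocabulary, for illustration only.
vocab = {1: "c", 2: "a", 3: "t"}

# Per-frame argmax labels, as an acoustic model might emit them.
frames = [0, 1, 1, 0, 2, 2, 2, 0, 0, 3, 3]
print(ctc_greedy_decode(frames, vocab))  # cat
```

A blank between two identical labels keeps them separate, so `[1, 0, 1]` decodes to `"cc"` rather than `"c"`.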

Model Architecture

Installation

Clone the repository:

git clone --recursive https://github.com/LuluW8071/Automatic-Speech-Recognition-with-PyTorch.git

Install PyTorch and the required dependencies inside a virtual environment:
```
pip install -r requirements.txt
```
Ensure you have PyTorch and Lightning AI installed.

Train Model

Important

Before training, make sure you have placed your Comet ML API key and project name in the environment variable file `.env`.
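A quick pre-flight check can catch a missing or incomplete `.env` before a training run starts. The sketch below uses only the standard library and assumes simple `KEY=VALUE` lines; the key names `API_KEY` and `PROJECT_NAME` are hypothetical placeholders, so substitute whichever names `train.py` actually reads.

```python
from pathlib import Path

# Assumption: hypothetical key names; use the ones train.py expects.
REQUIRED_KEYS = ("API_KEY", "PROJECT_NAME")


def read_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(path=".env", required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in the file."""
    env_file = Path(path)
    values = read_env_file(env_file) if env_file.exists() else {}
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing in .env:", ", ".join(missing))
    else:
        print(".env looks complete")
```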

py train.py

Customize the PyTorch training parameters by passing arguments to train.py.

Refer to the table below to change hyperparameters and training configurations.

| Args | Description | Default Value |
|---|---|---|
| `-g, --gpus` | Number of GPUs per node | 1 |
| `-w, --num_workers` | Number of CPU workers | 8 |
| `-db, --dist_backend` | Distributed backend to use for training | ddp_find_unused_parameters_true |
| `--epochs` | Number of total epochs to run | 50 |
| `--batch_size` | Size of the batch | 32 |
| `-lr, --learning_rate` | Learning rate | 1e-5 (0.00001) |
| `--checkpoint_path` | Checkpoint path to resume training from | None |
| `--precision` | Precision of the training | 16-mixed |

Example (enter the options on a single command line; they are listed one per line here so that each can be annotated):

py train.py
-g 4                   # Number of GPUs per node for parallel GPU training
-w 8                   # Number of CPU workers for parallel data loading
--epochs 10            # Number of total epochs to run
--batch_size 64        # Size of the batch
-lr 2e-5               # Learning rate
--precision 16-mixed   # Precision of the training
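For readers who want to see how these flags and defaults fit together, the following is a hedged sketch of an `argparse` parser that mirrors the documented options. It is not the repository's actual `train.py`; the function name `build_parser` is hypothetical, and `-w` is assumed to be the short flag for `--num_workers`, as the example command uses it.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented flags; not the real train.py."""
    parser = argparse.ArgumentParser(description="ASR training options (sketch)")
    parser.add_argument("-g", "--gpus", type=int, default=1,
                        help="Number of GPUs per node")
    parser.add_argument("-w", "--num_workers", type=int, default=8,
                        help="Number of CPU workers")
    parser.add_argument("-db", "--dist_backend", type=str,
                        default="ddp_find_unused_parameters_true",
                        help="Distributed backend to use for training")
    parser.add_argument("--epochs", type=int, default=50,
                        help="Number of total epochs to run")
    parser.add_argument("--batch_size", type=int, default=32,
                        help="Size of the batch")
    parser.add_argument("-lr", "--learning_rate", type=float, default=1e-5,
                        help="Learning rate")
    parser.add_argument("--checkpoint_path", type=str, default=None,
                        help="Checkpoint path to resume training from")
    parser.add_argument("--precision", type=str, default="16-mixed",
                        help="Precision of the training")
    return parser


# Parse the same options as the example command, passed as a list of tokens.
args = build_parser().parse_args(
    ["-g", "4", "-w", "8", "--epochs", "10",
     "--batch_size", "64", "-lr", "2e-5", "--precision", "16-mixed"]
)
print(args.gpus, args.num_workers, args.epochs, args.batch_size, args.precision)
```

Any option left off the command line falls back to its default, so `parse_args([])` yields the values in the Default Value column.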

Note

To resume training from a saved checkpoint, use:

py train.py --checkpoint_path path_to_checkpoint.ckpt

Additional Resources

For pre-trained models and other resources, refer to the provided links. Click here to download the pre-trained model.

This comprehensive guide should help you navigate through setting up and using the Speech Recognition system effectively. If you encounter any issues or have questions, feel free to reach out!
