A collection of speech language models with a focus on acoustic codes

ogunlao/speech_language_models

Speech language models

This library tracks and implements speech language models, with a particular focus on recent advances in speech language modelling using acoustic codes.

The implementations are for research purposes and are intended to follow the papers that proposed them closely.

Models currently implemented

  1. vq-vae (2017): Proposed vector quantization for acoustic code discovery, using gumbel softmax for straight-through estimation of the latent acoustic codes. Also showed that acoustic codes are closely related to phoneme categories.
  2. wav2vec: Proposed contrastive prediction of future time steps, where the model must distinguish the actual future token from negative samples.
  3. vq-wav2vec: Applied vector quantization to wav2vec and showed promising results for speech language modelling (using masked language modelling) and ASR. They also showed a higher quality-to-compression ratio than conventional audio compression algorithms.
  4. wav2vec Discrete and Continuous training: Compared quantized speech features against wav2vec features, filterbanks, and MFCCs, showing that quantized features perform better than continuous features for ASR.
  5. wav2vec2: Similar to vq-wav2vec, but with performance improvements and changes in training strategy. A Transformer is used as the context network to learn long-term dependencies. I substituted Conformers for the Transformers in my implementation.
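
Two ideas recur across the models listed above: mapping a continuous frame to a discrete acoustic code, and scoring the true future (or masked) frame against negative samples. The sketch below illustrates both in plain Python and is not taken from this repository. `nearest_code` uses a simple nearest-neighbour codebook lookup as a stand-in for the gumbel softmax quantizer mentioned above, and `contrastive_loss` is an InfoNCE-style loss with cosine similarity, which is an assumption about the general shape of the objective rather than the exact loss of any one paper. All function names, the codebook values, and the `temperature` default are hypothetical.

```python
import math


def nearest_code(vector, codebook):
    """Return (index, entry) of the codebook entry closest to `vector`.

    Closeness is squared Euclidean distance. This is a plain lookup,
    not the gumbel softmax variant, and it carries no gradient logic.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    index = min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))
    return index, codebook[index]


def cosine(a, b):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def contrastive_loss(context, target, negatives, temperature=0.1):
    """InfoNCE-style loss: negative log-probability of the true target.

    The true target competes with the negatives in a softmax over
    temperature-scaled cosine similarities to the context vector.
    """
    logits = [cosine(context, c) / temperature for c in [target] + negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_denominator - logits[0]


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]  # toy values
    print(nearest_code([0.9, 1.2], codebook))
    print(contrastive_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]]))
```

The loss is small when the context vector is more similar to the true target than to any negative, and grows when a negative is the closer match.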

Special acknowledgements

  1. PyTorch Lightning: used for multi-GPU training and inference.
  2. Tabisha: His implementation of the WaveNet vocoder is adapted as the VQ-VAE decoder.
