MiniASR

A mini, simple, and fast end-to-end automatic speech recognition toolkit.



Intro

Why Mini?

  • Minimal Training
    Self-supervised pre-trained models + minimal fine-tuning.
  • Simple and Flexible ⚙️
    Easy to understand and customize.
  • Colab Compatible 🧪
    Train your model directly on Google Colab.

ASR Pipeline

  • Preprocessing (run_preprocess.py)
    • Find all audio files and transcriptions.
    • Generate vocabularies (character/word/subword/code-switched).
  • Training (run_asr.py)
    • Dataset (miniasr/data/dataset.py)
      • Tokenizer for text data (miniasr/data/text.py)
    • DataLoader (miniasr/data/dataloader.py)
    • Model (miniasr/model/base_asr.py)
      • Feature extractor
      • Data augmentation
      • End-to-end CTC ASR
  • Testing (run_asr.py)
    • CTC greedy/beam decoding
    • Performance measures: error rates, RTF, latency
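
The sketch below illustrates what CTC greedy decoding does: take the most probable token at every frame, merge consecutive repeats, and drop blanks. It is a self-contained illustration rather than MiniASR's actual decoder; the blank index and the (time, vocab) log-probability shape are assumptions.

import torch

def ctc_greedy_decode(log_probs: torch.Tensor, blank: int = 0) -> list:
    """Greedy CTC decoding of a (time, vocab) log-probability matrix."""
    best = log_probs.argmax(dim=-1).tolist()  # most likely token per frame
    tokens, prev = [], None
    for t in best:
        # Collapse consecutive repeats, then remove blank symbols.
        if t != prev and t != blank:
            tokens.append(t)
        prev = t
    return tokens

# Example: 6 frames over a 5-symbol vocabulary (index 0 assumed to be the blank).
hyp = ctc_greedy_decode(torch.randn(6, 5).log_softmax(dim=-1))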

Instructions

Requirements

  • Python 3.6+
  • Install sox on your OS
  • Install the latest s3prl (v0.4 or later):
git clone https://github.com/s3prl/s3prl.git
cd s3prl
pip install -e ./
cd ..
  • Install MiniASR via pip (from the root of this repository):
pip install -e ./
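
After installation, a quick sanity check is that both packages import cleanly:

# Quick sanity check: both editable installs should import without errors.
import s3prl
import miniasr
print("s3prl and miniasr imported successfully")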


Pre-trained ASR

You can directly use pre-trained ASR models in your own applications. (under construction 🚧)

import torch

from miniasr.utils import load_from_checkpoint
from miniasr.data.audio import load_waveform

# Option 1: Loading from a checkpoint
model, args, tokenizer = load_from_checkpoint('path/to/ckpt', 'cuda')
# Option 2: Loading from torch.hub (TODO)
model = torch.hub.load('vectominist/MiniASR', 'ctc_eng').to('cuda')

# Load waveforms and recognize!
waves = [load_waveform('path/to/waveform').to('cuda')]
hyps = model.recognize(waves)

Preprocessing

  • For already implemented corpora, please see egs/.
  • To prepare your own custom dataset, please see miniasr/preprocess.
miniasr-preprocess

Options:

  --corpus Corpus name.
  --path Path to dataset.
  --set Which subsets to process.
  --out Output directory.
  --gen-vocab Specify whether to generate vocabulary files.
  --char-vocab-size Character vocabulary size.
  --word-vocab-size Word vocabulary size.
  --subword-vocab-size Subword vocabulary size.
  --gen-subword Specify whether to generate subword vocabulary.
  --subword-mode {unigram,bpe} Subword training mode.
  --char-coverage Character coverage.
  --seed Set random seed.
  --njobs Number of workers.
  --log-file Logging file.
  --log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL} Logging level.
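
As a rough, conceptual illustration of the vocabulary-generation step (not the toolkit's actual implementation), a character vocabulary can be built by counting characters in the transcriptions and keeping the most frequent ones; the special symbols and their order below are assumptions.

from collections import Counter

def build_char_vocab(transcriptions, vocab_size=100):
    """Count characters over all transcriptions and keep the most frequent."""
    counter = Counter()
    for text in transcriptions:
        counter.update(text)
    vocab = ["<pad>", "<unk>"]  # illustrative special symbols
    vocab += [ch for ch, _ in counter.most_common(vocab_size - len(vocab))]
    return {ch: idx for idx, ch in enumerate(vocab)}

# Toy example with two transcriptions.
print(build_char_vocab(["hello world", "speech recognition"], vocab_size=20))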

Training & Testing

See examples in egs/.

miniasr-asr

Options:

  --config Training configuration file (.yaml).
  --test Specify testing mode.
  --ckpt Checkpoint for testing.
  --test-name Name for the testing results.
  --cpu Use CPU only.
  --seed Set random seed.
  --njobs Number of workers.
  --log-file Logging file.
  --log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL} Logging level.
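
Testing reports error rates among its performance measures; the snippet below is the standard word error rate computation via edit distance, shown as a generic reference rather than MiniASR's own evaluation code.

def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance between the two word sequences.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution
    return dist[len(ref)][len(hyp)] / max(len(ref), 1)

print(word_error_rate("the cat sat", "the cat sat down"))  # one insertion over three words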

TODO List

  • torch.hub support
  • Releasing pre-trained ASR models

Citation

@misc{chang2021miniasr,
  title={{MiniASR}},
  author={Chang, Heng-Jui},
  year={2021},
  url={https://github.com/vectominist/MiniASR}
}