Losses and decoders for end-to-end Speech Recognition and Optical Character Recognition with PyTorch

The module focuses on experiments with the CTC (Connectionist Temporal Classification) loss and its modifications.
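CTC defines the probability of a label sequence as the sum over all frame-level paths that collapse to it: merge repeated symbols, then remove blanks. The brute-force sketch below spells out that definition on toy inputs. It is exponential in the number of frames, the function names are hypothetical, and it is not how this package computes the loss.

```python
import itertools


def collapse(path, blank=0):
    """Map a frame-level CTC path to a label sequence: merge repeats, then drop blanks."""
    out = []
    prev = None
    for token in path:
        if token != prev and token != blank:
            out.append(token)
        prev = token
    return out


def ctc_probability_bruteforce(probs, target, blank=0):
    """Sum the probabilities of every path that collapses to `target`.

    probs: T rows, each a list of per-symbol probabilities (already softmaxed).
    Only usable for tiny T, because it enumerates len(alphabet) ** T paths.
    """
    num_frames = len(probs)
    num_symbols = len(probs[0])
    total = 0.0
    for path in itertools.product(range(num_symbols), repeat=num_frames):
        if collapse(path, blank) == list(target):
            p = 1.0
            for t, symbol in enumerate(path):
                p *= probs[t][symbol]
            total += p
    return total


# Toy example: 3 frames, alphabet {blank=0, 'a'=1}, uniform probabilities.
probs = [[0.5, 0.5]] * 3
p = ctc_probability_bruteforce(probs, [1])
print(p)  # 0.75; the CTC loss is -log(p), about 0.288
```

Six of the eight paths (`001`, `010`, `100`, `011`, `110`, `111`) collapse to `a`; `101` collapses to `aa` because the blank separates the two labels.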

Under active development.

Documentation

artbataev.github.io/end2end/

Losses

CTC (C++, CPU)
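The CTC loss is the negative log of the total probability of all alignments that collapse to the target. The standard way to compute it is the forward (alpha) recursion over the target extended with blanks (`_ y1 _ y2 ... yL _`). The following is a pure-Python sketch of that textbook recursion in log space, not this package's C++ implementation. `ctc_loss_forward` is a hypothetical name, and inputs are plain lists of log-probabilities.

```python
import math

NEG_INF = float("-inf")


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) that tolerates -inf entries."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_loss_forward(log_probs, target, blank=0):
    """Negative log-likelihood of `target` under CTC via the alpha recursion.

    log_probs: T rows of per-symbol log-probabilities (after log-softmax).
    target: label indices, without blanks.
    """
    # Extended target with blanks: [b, y1, b, y2, ..., yL, b], length 2L + 1.
    ext = [blank]
    for label in target:
        ext += [label, blank]
    num_states = len(ext)

    # Paths may start with the leading blank or with the first label.
    alpha = [NEG_INF] * num_states
    alpha[0] = log_probs[0][ext[0]]
    if num_states > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, len(log_probs)):
        new_alpha = [NEG_INF] * num_states
        for s in range(num_states):
            candidates = [alpha[s]]  # stay on the same state
            if s >= 1:
                candidates.append(alpha[s - 1])  # advance by one state
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append(alpha[s - 2])
            new_alpha[s] = logsumexp(*candidates) + log_probs[t][ext[s]]
        alpha = new_alpha

    # Valid paths end on the last label or on the trailing blank.
    tail = [alpha[-1]] + ([alpha[-2]] if num_states > 1 else [])
    return -logsumexp(*tail)


uniform = [[math.log(0.5)] * 2] * 3
print(ctc_loss_forward(uniform, [1]))  # equals -log(0.75)
```

The recursion costs O(T * (2L + 1)) instead of enumerating every path, which is why CTC implementations use it.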

Decoders

CTC Greedy Decoder (C++, CPU)
CTC Beam Search Decoder (C++, CPU)
CTC Beam Search Decoder with language model (C++, CPU)
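Greedy (best-path) CTC decoding takes the most likely symbol at every frame, merges repeated symbols, and removes blanks. A minimal pure-Python sketch of that procedure, using the same six-label alphabet as the `CTCDecoder` example in How to use; `ctc_greedy_decode` is a hypothetical helper, not part of `pytorch_end2end`.

```python
def ctc_greedy_decode(probs, labels, blank=0):
    """Best-path decoding: argmax symbol per frame, merge repeats, drop blanks."""
    best_path = [max(range(len(row)), key=row.__getitem__) for row in probs]
    decoded = []
    prev = None
    for idx in best_path:
        if idx != prev and idx != blank:
            decoded.append(idx)
        prev = idx
    return decoded, "".join(labels[i] for i in decoded)


labels = ["_", "a", "b", "c", "d", " "]
probs = [
    [0.1, 0.7, 0.1, 0.05, 0.03, 0.02],   # 'a'
    [0.1, 0.6, 0.2, 0.05, 0.03, 0.02],   # 'a' again: merged with the previous frame
    [0.8, 0.1, 0.05, 0.03, 0.01, 0.01],  # blank
    [0.1, 0.6, 0.2, 0.05, 0.03, 0.02],   # 'a' after a blank: a new token
    [0.1, 0.1, 0.7, 0.05, 0.03, 0.02],   # 'b'
]
print(ctc_greedy_decode(probs, labels))  # ([1, 1, 2], 'aab')
```

Greedy decoding is fast but scores only a single path; beam search sums over many paths per prefix, so the two can disagree.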

How to install

Requirements:

Python 3.6+
Tested with PyTorch 1.6.0+ (may work with other versions)

Install PyTorch from pytorch.org, for example:
```
pip install torch
```

Install the build tools (the commands below are for Ubuntu):

```
sudo add-apt-repository ppa:ubuntu-toolchain-r/test -y && \
sudo apt-get update && \
sudo apt-get install libboost-all-dev g++-7 -y
```

Install the module

```
pip install -v git+https://github.com/artbataev/end2end.git
```

or clone the repository with its submodules, build it, and run the tests:

```
git clone --recursive https://github.com/artbataev/end2end.git
cd end2end
python setup.py install
python -m tests.test_ctc
python -m tests.test_ctc_decoder
```

How to use

CTC Loss

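```python
import torch
from pytorch_end2end import CTCLoss

# time_major=False: inputs are laid out as (batch, time, alphabet).
# after_logsoftmax=False: `logits` are raw scores (log-softmax not yet applied).
ctc_loss = CTCLoss(blank_idx=0, time_major=False,
                   reduce=True, size_average=True, after_logsoftmax=False)

batch_size = 4
alphabet_size = 28  # blank + 26 English letters + space

# Random network outputs with 50 frames per utterance.
logits = torch.randn(batch_size, 50, alphabet_size, requires_grad=True)
# Target labels are drawn from 1..alphabet_size-1, never the blank index 0.
targets = torch.randint(1, alphabet_size, (batch_size, 30), dtype=torch.long)
logits_lengths = torch.full((batch_size,), 50, dtype=torch.long)
# targets_lengths[i] is the number of valid labels in row i of `targets`.
targets_lengths = torch.randint(10, 30, (batch_size,), dtype=torch.long)

loss = ctc_loss(logits, targets, logits_lengths, targets_lengths)
loss.backward()
```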
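The example above uses random targets. With real transcriptions, each string has to be mapped to label indices and padded into a rectangular batch with per-row lengths. The sketch below assumes one possible layout for the 28-symbol alphabet (index 0 = blank, 1..26 = `a`..`z`, 27 = space); that mapping and the `encode_batch` helper are hypothetical, not defined by the library. It also assumes that padding past `targets_lengths` is ignored by the loss.

```python
import string

# Assumed label layout for alphabet_size = 28 (illustrative only):
# index 0 = blank, 1..26 = 'a'..'z', 27 = space.
LABELS = ["_"] + list(string.ascii_lowercase) + [" "]
CHAR_TO_IDX = {ch: i for i, ch in enumerate(LABELS)}


def encode_batch(sentences, pad_value=0):
    """Turn strings into a padded (batch, max_len) list of indices plus lengths."""
    encoded = [[CHAR_TO_IDX[ch] for ch in s.lower()] for s in sentences]
    lengths = [len(seq) for seq in encoded]
    max_len = max(lengths, default=0)
    padded = [seq + [pad_value] * (max_len - len(seq)) for seq in encoded]
    return padded, lengths


targets, targets_lengths = encode_batch(["hello world", "ctc"])
print(targets_lengths)  # [11, 3]
```

Wrapping the results with `torch.tensor(..., dtype=torch.long)` gives tensors shaped like `targets` and `targets_lengths` in the CTC Loss example.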

CTC Decoder

```python
import torch
from pytorch_end2end import CTCDecoder

batch_size = 4
alphabet_size = 6
# labels[i] is the character for index i; index 0 ("_") is the blank.
decoder = CTCDecoder(blank_idx=0, beam_width=100,
                     time_major=False, after_logsoftmax=False,
                     labels=["_", "a", "b", "c", "d", " "])

logits = torch.randn(batch_size, 50, alphabet_size)
logits_lengths = torch.full((batch_size,), 50, dtype=torch.long)

decoded_targets, decoded_targets_lengths, decoded_sentences = decoder.decode(logits, logits_lengths)
for sentence in decoded_sentences:
    print(sentence)
```
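`CTCDecoder` above runs beam search with `beam_width=100`. Its C++ internals are not described here; a common algorithm for this task is CTC prefix beam search, which keeps, for every candidate prefix, the probability of ending in a blank and of ending in a non-blank symbol. Below is a minimal pure-Python sketch of that algorithm without a language model. It is an illustration, not this package's implementation, and `ctc_prefix_beam_search` is a hypothetical name.

```python
import math
from collections import defaultdict

NEG_INF = float("-inf")


def log_add(a, b):
    """log(exp(a) + exp(b)), safe for -inf."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def ctc_prefix_beam_search(log_probs, beam_width=10, blank=0):
    """Return (best label prefix, its log-probability) for T rows of log-probs.

    Each beam maps a prefix (tuple of labels) to
    (log P(prefix, last frame blank), log P(prefix, last frame non-blank)).
    """
    beams = {(): (0.0, NEG_INF)}
    for row in log_probs:
        next_beams = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_b, p_nb) in beams.items():
            p_total = log_add(p_b, p_nb)
            for s, lp in enumerate(row):
                if s == blank:
                    # A blank keeps the prefix unchanged; it now ends in blank.
                    nb_b, nb_nb = next_beams[prefix]
                    next_beams[prefix] = (log_add(nb_b, p_total + lp), nb_nb)
                    continue
                new_prefix = prefix + (s,)
                nb_b, nb_nb = next_beams[new_prefix]
                if prefix and s == prefix[-1]:
                    # A repeated label is a new token only after a blank...
                    next_beams[new_prefix] = (nb_b, log_add(nb_nb, p_b + lp))
                    # ...otherwise it merges into the same prefix.
                    same_b, same_nb = next_beams[prefix]
                    next_beams[prefix] = (same_b, log_add(same_nb, p_nb + lp))
                else:
                    next_beams[new_prefix] = (nb_b, log_add(nb_nb, p_total + lp))
        # Prune to the beam_width most probable prefixes.
        ranked = sorted(next_beams.items(), key=lambda kv: log_add(*kv[1]), reverse=True)
        beams = dict(ranked[:beam_width])
    best_prefix, (p_b, p_nb) = max(beams.items(), key=lambda kv: log_add(*kv[1]))
    return list(best_prefix), log_add(p_b, p_nb)


# Two frames over {blank=0, 'a'=1}: blank is the argmax at every frame,
# yet "a" (0.4*0.6 + 0.6*0.4 + 0.4*0.4 = 0.64) beats "" (0.6*0.6 = 0.36).
log_probs = [[math.log(0.6), math.log(0.4)]] * 2
best, logp = ctc_prefix_beam_search(log_probs, beam_width=4)
print(best, round(math.exp(logp), 4))  # [1] 0.64
```

On this input, greedy decoding would return an empty string, which is the usual motivation for beam search. A decoder with a language model would additionally rescore prefixes when a label is appended.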

Future Plans

Losses

Decoders

Restrict Beam Search with vocabulary
Allow custom transcriptions
Gram-CTC Beam Search Decoder
