[recipe] LibriSpeech zipformer_ctc (#941)
* merge upstream

* initial commit for zipformer_ctc

* remove unwanted changes

* remove changes to other recipe

* fix zipformer softlink

* fix for JIT export

* add missing file

* fix symbolic links

* update results

* Update RESULTS.md

Address comments from @csukuangfj

---------

Co-authored-by: zr_jin <[email protected]>
desh2608 and JinZr authored Oct 27, 2023
1 parent 5cebecf commit 7d56685
Showing 17 changed files with 2,777 additions and 1 deletion.
1 change: 1 addition & 0 deletions egs/librispeech/ASR/README.md
@@ -47,6 +47,7 @@ We place an additional Conv1d layer right after the input embedding layer.
| `conformer-ctc` | Conformer | Use auxiliary attention head |
| `conformer-ctc2` | Reworked Conformer | Use auxiliary attention head |
| `conformer-ctc3` | Reworked Conformer | Streaming version + delay penalty |
| `zipformer-ctc` | Zipformer | Use auxiliary attention head |
| `zipformer` | Upgraded Zipformer | Use auxiliary transducer head | The latest recipe |

# MMI
51 changes: 50 additions & 1 deletion egs/librispeech/ASR/RESULTS.md
@@ -375,6 +375,55 @@ for m in greedy_search modified_beam_search fast_beam_search; do
done
```

### Zipformer CTC

#### [zipformer_ctc](./zipformer_ctc)

See <https://github.com/k2-fsa/icefall/pull/941> for more details.

You can find a pretrained model, training logs, decoding logs, and decoding
results at:
<https://huggingface.co/desh2608/icefall-asr-librispeech-zipformer-ctc>
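
You can download the model with `git lfs` (a sketch of the usual Hugging Face workflow; it assumes git-lfs is installed):

```bash
# Fetch the pretrained model, logs, and decoding results from Hugging Face.
git lfs install
git clone https://huggingface.co/desh2608/icefall-asr-librispeech-zipformer-ctc
```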

Number of model parameters: 86083707, i.e., 86.08 M

| decoding method | test-clean | test-other | comment |
|-------------------------|------------|------------|---------------------|
| ctc-decoding | 2.50 | 5.86 | --epoch 30 --avg 9 |
| whole-lattice-rescoring | 2.44 | 5.38 | --epoch 30 --avg 9 |
| attention-rescoring | 2.35 | 5.16 | --epoch 30 --avg 9 |
| 1best | 2.01 | 4.61 | --epoch 30 --avg 9 |

The training command is:

```bash
export CUDA_VISIBLE_DEVICES="0,1,2,3"

./zipformer_ctc/train.py \
--world-size 4 \
--num-epochs 30 \
--start-epoch 1 \
--use-fp16 1 \
--exp-dir zipformer_ctc/exp \
--full-libri 1 \
--max-duration 1000 \
--master-port 12345
```
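
Since this PR also fixes JIT export (see the commit message above), the trained model can be exported for deployment. A hypothetical invocation following the usual icefall export conventions; the flag names are assumptions, so check `./zipformer_ctc/export.py --help` for the authoritative set:

```bash
# Sketch only: flag names follow common icefall conventions and may
# differ in this recipe; verify with ./zipformer_ctc/export.py --help.
./zipformer_ctc/export.py \
  --epoch 30 --avg 9 \
  --exp-dir zipformer_ctc/exp \
  --lang-dir data/lang_bpe_500 \
  --jit 1
```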

The tensorboard log can be found at:
<https://tensorboard.dev/experiment/IjPSJjHOQFKPYA5Z0Vf8wg>

The decoding command is:

```bash
./zipformer_ctc/decode.py \
--epoch 30 --avg 9 --use-averaged-model True \
--exp-dir zipformer_ctc/exp \
--lang-dir data/lang_bpe_500 \
--lm-dir data/lm \
--method ctc-decoding
```
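
To reproduce all four rows of the WER table above, you can loop over the decoding methods, mirroring the loops used elsewhere in this file (a sketch; it assumes each method name in the table is a valid `--method` value):

```bash
# Sketch: run decoding once per method listed in the WER table.
for m in ctc-decoding 1best whole-lattice-rescoring attention-rescoring; do
  ./zipformer_ctc/decode.py \
    --epoch 30 --avg 9 --use-averaged-model True \
    --exp-dir zipformer_ctc/exp \
    --lang-dir data/lang_bpe_500 \
    --lm-dir data/lm \
    --method $m
done
```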

### pruned_transducer_stateless7 (Fine-tune with mux)

See <https://github.com/k2-fsa/icefall/pull/1059> for more details.
@@ -616,7 +665,6 @@ for m in greedy_search modified_beam_search fast_beam_search; do
done
```


#### Smaller model

We also provide a very small version (only 6.1M parameters) of this setup. The training command for the small model is:
@@ -663,6 +711,7 @@ This small model achieves the following WERs on GigaSpeech test and dev sets:

You can find the tensorboard logs at <https://tensorboard.dev/experiment/tAc5iXxTQrCQxky5O5OLyw/#scalars>.


### Streaming Zipformer-Transducer (Pruned Stateless Transducer + Streaming Zipformer)

#### [pruned_transducer_stateless7_streaming](./pruned_transducer_stateless7_streaming)
1 change: 1 addition & 0 deletions egs/librispeech/ASR/zipformer_ctc/asr_datamodule.py
