Add non-streaming Zipformer recipe for KsponSpeech (#1664)
whsqkaak authored Jun 24, 2024
1 parent 3059eb4 commit 6f102d3
Showing 36 changed files with 4,212 additions and 4 deletions.
3 changes: 2 additions & 1 deletion egs/ksponspeech/ASR/README.md
@@ -27,6 +27,7 @@ There are various folders containing the name `transducer` in this folder. The f

| | Encoder | Decoder | Comment |
| ---------------------------------------- | -------------------- | ------------------ | ------------------------------------------------- |
| `pruned_transducer_stateless7_streaming` | Streaming Zipformer | Embedding + Conv1d | streaming version of pruned_transducer_stateless7 |
| `zipformer` | Upgraded Zipformer | Embedding + Conv1d | The latest recipe |

The decoder in `transducer_stateless` is modified from the paper [RNN-Transducer with Stateless Prediction Network](https://ieeexplore.ieee.org/document/9054419/). We place an additional Conv1d layer right after the input embedding layer.
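To make "stateless" concrete, here is a pure-Python toy sketch of an "Embedding + Conv1d" prediction network. It is not icefall's `Decoder` (which is written in PyTorch); the embedding table, kernel weights, and `context_size = 2` are hypothetical values for illustration. The point it shows is that the output depends only on the last `context_size` tokens, so no recurrent state is carried between steps. A depthwise convolution is used purely to keep the sketch short; the real layer's channel grouping may differ.

```python
from typing import List, Sequence

Vector = List[float]


def stateless_decoder(
    history: Sequence[int],
    table: List[Vector],
    kernel: List[Vector],
    context_size: int = 2,
    blank_id: int = 0,
) -> Vector:
    """Toy stateless prediction network with hypothetical weights.

    Only the last `context_size` tokens are used; shorter histories are
    left-padded with `blank_id`, so no recurrent state is carried over.
    `kernel[k][d]` weights dimension d of the k-th context embedding
    (a depthwise 1-D convolution evaluated at a single output position).
    """
    padded = [blank_id] * context_size + list(history)
    context = padded[-context_size:]
    embeddings = [table[token] for token in context]  # embedding lookup
    dim = len(table[0])
    return [
        sum(kernel[k][d] * embeddings[k][d] for k in range(context_size))
        for d in range(dim)
    ]


# Usage: 3-token vocabulary (id 0 = blank), embedding dim 2.
table = [[0.0, 0.0], [1.0, 0.5], [0.2, -1.0]]
kernel = [[0.5, 0.5], [1.0, 1.0]]
# Both histories end in [2, 1], so the decoder output is identical.
print(stateless_decoder([2, 1], table, kernel))
print(stateless_decoder([1, 1, 2, 1], table, kernel))
```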
52 changes: 49 additions & 3 deletions egs/ksponspeech/ASR/RESULTS.md
@@ -19,13 +19,13 @@ The CERs are:
| fast beam search | 320ms | 10.21 | 11.04 | --epoch 30 --avg 9 | simulated streaming |
| fast beam search | 320ms | 10.25 | 11.08 | --epoch 30 --avg 9 | chunk-wise |
| modified beam search | 320ms | 10.13 | 10.88 | --epoch 30 --avg 9 | simulated streaming |
| modified beam search | 320ms | 10.1 | 10.93 | --epoch 30 --avg 9 | chunk-wise |
| greedy search | 640ms | 9.94 | 10.82 | --epoch 30 --avg 9 | simulated streaming |
| greedy search | 640ms | 10.04 | 10.85 | --epoch 30 --avg 9 | chunk-wise |
| fast beam search | 640ms | 10.01 | 10.81 | --epoch 30 --avg 9 | simulated streaming |
| fast beam search | 640ms | 10.04 | 10.7 | --epoch 30 --avg 9 | chunk-wise |
| modified beam search | 640ms | 9.91 | 10.72 | --epoch 30 --avg 9 | simulated streaming |
| modified beam search | 640ms | 9.92 | 10.72 | --epoch 30 --avg 9 | chunk-wise |

Note: `simulated streaming` indicates feeding the full utterance during decoding using `decode.py`,
while `chunk-wise` indicates feeding a fixed number of frames at a time using `streaming_decode.py`.
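The chunk-wise mode can be sketched as splitting the feature frames into fixed-size chunks that are fed one after another, whereas simulated streaming hands over every frame in a single call. This is a simplified illustration, not the logic of `streaming_decode.py`: it assumes a 10 ms feature frame shift, under which the `--decode-chunk-len 32` option that appears in the `streaming_decode.py` command corresponds to the 320ms rows in the table. The helper names are hypothetical.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

FRAME_SHIFT_MS = 10  # assumption: 10 ms feature frame shift


def chunk_len_to_ms(chunk_len: int, frame_shift_ms: int = FRAME_SHIFT_MS) -> int:
    """Audio duration covered by one chunk of `chunk_len` feature frames."""
    return chunk_len * frame_shift_ms


def iter_chunks(frames: Sequence[T], chunk_len: int) -> Iterator[Sequence[T]]:
    """Chunk-wise mode: yield consecutive chunks of at most `chunk_len` frames."""
    for start in range(0, len(frames), chunk_len):
        yield frames[start:start + chunk_len]


frames = list(range(100))  # 100 frames, i.e. 1 s of audio at 10 ms per frame
print(chunk_len_to_ms(32))  # 320
print([len(c) for c in iter_chunks(frames, 32)])  # [32, 32, 32, 4]
```

In the real decoder the model state is carried from one chunk to the next, which is what distinguishes chunk-wise decoding from decoding each chunk independently.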
@@ -67,4 +67,50 @@ for m in greedy_search modified_beam_search fast_beam_search; do
--decode-chunk-len 32 \
--num-decode-streams 2000
done
```
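One of the decoding methods iterated over in the loop, `greedy_search`, can be sketched for a transducer as follows. This toy version emits at most one symbol per encoder frame and relies on a hypothetical `joiner` callback returning scores over the vocabulary; it illustrates the idea and is not icefall's `greedy_search` implementation. Because the decoder is stateless, only the last `context_size` emitted tokens are passed back as context.

```python
from typing import Callable, List, Sequence

# Hypothetical joiner: (frame index, decoder context) -> scores over the vocabulary.
Joiner = Callable[[int, Sequence[int]], Sequence[float]]


def greedy_search(
    num_frames: int,
    joiner: Joiner,
    blank_id: int = 0,
    context_size: int = 2,
) -> List[int]:
    """Transducer greedy search emitting at most one symbol per encoder frame."""
    hyp = [blank_id] * context_size  # left-pad the decoder context with blanks
    for t in range(num_frames):
        scores = joiner(t, hyp[-context_size:])
        best = max(range(len(scores)), key=lambda k: scores[k])
        if best != blank_id:
            hyp.append(best)  # non-blank: emit it, which changes the context
        # blank: advance to the next frame without emitting anything
    return hyp[context_size:]


# Usage with a toy joiner whose best token per frame is fixed in advance.
best_per_frame = [3, 0, 0, 5, 0]


def toy_joiner(t: int, context: Sequence[int]) -> List[float]:
    scores = [0.0] * 6
    scores[best_per_frame[t]] = 1.0
    return scores


print(greedy_search(len(best_per_frame), toy_joiner))  # [3, 5]
```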

### zipformer (Zipformer + pruned stateless transducer)

#### [zipformer](./zipformer)

Number of model parameters: 74,778,511, i.e., 74.78 M

##### Training on KsponSpeech (with MUSAN)

Model: [johnBamma/icefall-asr-ksponspeech-zipformer-2024-06-24](https://huggingface.co/johnBamma/icefall-asr-ksponspeech-zipformer-2024-06-24)

The CERs are:

| decoding method | eval_clean | eval_other | comment |
|----------------------|------------|------------|---------------------|
| greedy search | 10.60 | 11.56 | --epoch 30 --avg 9 |
| fast beam search | 10.59 | 11.54 | --epoch 30 --avg 9 |
| modified beam search | 10.35 | 11.35 | --epoch 30 --avg 9 |
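CER is the character error rate: the Levenshtein (edit) distance between hypothesis and reference characters, summed over the test set and divided by the total number of reference characters, reported in percent. The sketch below implements that standard definition; it is not icefall's scoring code. It counts spaces as ordinary characters, and whether this recipe's scoring does the same is not stated here.

```python
from typing import Iterable, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # delete r from the reference
                cur[j - 1] + 1,          # insert h from the hypothesis
                prev[j - 1] + (r != h),  # substitute (or match if equal)
            )
        prev = cur
    return prev[-1]


def corpus_cer(pairs: Iterable[Tuple[str, str]]) -> float:
    """CER in percent: total character errors / total reference characters."""
    errors = 0
    ref_chars = 0
    for ref, hyp in pairs:
        errors += edit_distance(ref, hyp)
        ref_chars += len(ref)
    if ref_chars == 0:
        raise ValueError("references contain no characters")
    return 100.0 * errors / ref_chars


# One substituted syllable out of five reference characters.
print(corpus_cer([("안녕하세요", "안녕하세오")]))  # 20.0
```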

The training command is:

```bash
./zipformer/train.py \
--world-size 4 \
--num-epochs 30 \
--start-epoch 1 \
--use-fp16 1 \
--exp-dir zipformer/exp \
--max-duration 750 \
--enable-musan True \
--base-lr 0.035
```
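`--max-duration 750` limits each batch by total audio duration in seconds rather than by utterance count, and with `--world-size 4` each of the four GPUs draws its own batches. The greedy packer below is a simplified stand-in that shows the idea; it is not the lhotse sampler that icefall's data module relies on, and it ignores length bucketing and shuffling. The function name and the toy durations are hypothetical.

```python
from typing import List, Sequence


def pack_by_duration(durations: Sequence[float], max_duration: float) -> List[List[int]]:
    """Greedily group utterance indices so each batch's total duration <= max_duration.

    An utterance longer than max_duration on its own still gets a batch by itself.
    """
    batches: List[List[int]] = []
    current: List[int] = []
    total = 0.0
    for idx, dur in enumerate(durations):
        if current and total + dur > max_duration:
            batches.append(current)  # adding this utterance would exceed the cap
            current, total = [], 0.0
        current.append(idx)
        total += dur
    if current:
        batches.append(current)
    return batches


durations = [300.0, 300.0, 200.0, 500.0, 100.0]  # toy per-utterance durations (s)
print(pack_by_duration(durations, max_duration=750.0))  # [[0, 1], [2, 3], [4]]
```

If each rank applies the cap independently, one optimizer step sees up to roughly 4 × 750 = 3,000 seconds of audio across all GPUs.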

NOTICE: I decreased `base_lr` from 0.045 (the default) to 0.035 because training failed with `RuntimeError: grad_scale is too small`.
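With `--use-fp16 1`, training uses dynamic loss scaling: the loss is multiplied by a scale factor before the backward pass, the scale is halved whenever the scaled gradients overflow to inf/NaN (PyTorch `GradScaler`'s default backoff), and it is doubled again after a run of clean steps. If training is unstable and overflows keep recurring, the scale keeps shrinking until the training loop aborts with the `grad_scale is too small` error. The pure-Python simulation below illustrates that mechanism only; the abort threshold (`1e-5`) and the overflow patterns are assumptions for illustration, and lowering `base_lr` is simply the empirical fix reported above.

```python
from typing import Iterable


class ToyLossScaler:
    """Simplified dynamic loss scaler using GradScaler's default factors."""

    def __init__(self, init_scale: float = 2.0 ** 16, growth_factor: float = 2.0,
                 backoff_factor: float = 0.5, growth_interval: int = 2000) -> None:
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._clean_steps = 0

    def update(self, found_inf: bool) -> None:
        if found_inf:
            # Overflow: the optimizer step is skipped and the scale shrinks.
            self.scale *= self.backoff_factor
            self._clean_steps = 0
        else:
            self._clean_steps += 1
            if self._clean_steps == self.growth_interval:
                self.scale *= self.growth_factor
                self._clean_steps = 0


def train_steps(overflows: Iterable[bool], min_scale: float = 1e-5) -> float:
    """Run the scaler over overflow flags; abort if the scale collapses."""
    scaler = ToyLossScaler()
    for found_inf in overflows:
        scaler.update(found_inf)
        if scaler.scale < min_scale:  # hypothetical abort threshold
            raise RuntimeError(f"grad_scale is too small, exiting: {scaler.scale}")
    return scaler.scale


print(train_steps([False] * 2000))  # 131072.0: doubled after 2000 clean steps
print(train_steps([True] * 10))     # 64.0: halved ten times from 65536
try:
    train_steps([True] * 40)  # repeated overflows drive the scale below 1e-5
except RuntimeError as err:
    print(err)
```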

The decoding command is:

```bash
for m in greedy_search fast_beam_search modified_beam_search; do
./zipformer/decode.py \
--epoch 30 \
--avg 9 \
--exp-dir zipformer/exp \
--decoding-method $m
done
```
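`--epoch 30 --avg 9` decodes with a model averaged over the last 9 epochs ending at epoch 30 (epochs 22 to 30) instead of the single epoch-30 checkpoint. Conceptually this is an element-wise mean of parameters, sketched below with plain dicts of float lists standing in for PyTorch state dicts. The helper names are hypothetical, and the sketch does not reproduce `decode.py`'s `--use-averaged-model` option, which derives the average from running averages stored in the checkpoints.

```python
from typing import Dict, List, Sequence

StateDict = Dict[str, List[float]]


def epoch_range(epoch: int, avg: int) -> List[int]:
    """Epochs whose models are averaged for a given --epoch/--avg pair."""
    return list(range(epoch - avg + 1, epoch + 1))


def average_state_dicts(states: Sequence[StateDict]) -> StateDict:
    """Element-wise mean of parameters across checkpoints with identical keys/shapes."""
    if not states:
        raise ValueError("need at least one checkpoint")
    n = len(states)
    averaged: StateDict = {}
    for name in states[0]:
        columns = zip(*(s[name] for s in states))  # same position across checkpoints
        averaged[name] = [sum(col) / n for col in columns]
    return averaged


print(epoch_range(30, 9))  # [22, 23, 24, 25, 26, 27, 28, 29, 30]
print(average_state_dicts([{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]))  # {'w': [2.0, 3.0]}
```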
1 change: 1 addition & 0 deletions egs/ksponspeech/ASR/zipformer/README.md
@@ -0,0 +1 @@
This recipe implements the Zipformer model.
1 change: 1 addition & 0 deletions egs/ksponspeech/ASR/zipformer/asr_datamodule.py
1 change: 1 addition & 0 deletions egs/ksponspeech/ASR/zipformer/beam_search.py
