update_231

update_ms231
LiTingyu1997 committed Nov 6, 2024
1 parent 4c7b609 commit cc7db33
Showing 6 changed files with 70 additions and 68 deletions.
10 changes: 5 additions & 5 deletions README.md
@@ -24,11 +24,11 @@ MindAudio is a toolbox of audio models and algorithms based on [MindSpore](https

The following table lists the `mindaudio` versions and the `mindspore` versions each one supports.

| `mindspore` | `mindaudio` |
|--------------|-------------|
| `master` | `master` |
| `2.3.0` | `0.4` |
| `2.2.10` | `0.3` |
| `mindaudio` | `mindspore` |
|-------------|---------------------|
| `master` | `master` |
| `0.4` | `2.3.0`/`2.3.1` |
| `0.3` | `2.2.10` |
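
A minimal sketch of how the compatibility table above could be checked in code. The `SUPPORTED` mapping and the `is_supported` helper are hypothetical names introduced for illustration; they are not part of `mindaudio`, and the mapping only restates the rows of the updated table.

```python
# Hypothetical lookup that restates the updated compatibility table.
# Keys are mindaudio versions, values are the mindspore versions listed for them.
SUPPORTED = {
    "master": {"master"},
    "0.4": {"2.3.0", "2.3.1"},
    "0.3": {"2.2.10"},
}


def is_supported(mindaudio_version: str, mindspore_version: str) -> bool:
    """Return True if the pair appears in the table, False otherwise."""
    return mindspore_version in SUPPORTED.get(mindaudio_version, set())


if __name__ == "__main__":
    print(is_supported("0.4", "2.3.1"))
    print(is_supported("0.3", "2.3.1"))
```

An unknown `mindaudio` version falls back to an empty set, so the helper reports it as unsupported rather than raising.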

### data processing

10 changes: 5 additions & 5 deletions README_CN.md
@@ -22,11 +22,11 @@ MindAudio 是基于 [MindSpore](https://www.mindspore.cn/) 的音频模型和算

The following table shows the `mindaudio` versions and the `mindspore` versions each one supports.

| `mindspore` | `mindaudio` |
|--------------|-------------|
| `master` | `master` |
| `2.3.0` | `0.4` |
| `2.2.10` | `0.3` |
| `mindaudio` | `mindspore` |
|-------------|---------------------|
| `master` | `master` |
| `0.4` | `2.3.0`/`2.3.1` |
| `0.3` | `2.2.10` |

### Data processing

42 changes: 16 additions & 26 deletions examples/conformer/readme.md
@@ -16,6 +16,10 @@ The overall structure of Conformer includes SpecAug, ConvolutionSubsampling, Lin

![image-20230310165349460](https://raw.githubusercontent.com/mindspore-lab/mindaudio/main/tests/result/conformer.png)

## Requirements
| mindspore | ascend driver | firmware | cann toolkit/kernel |
|:-------------:|:----------------------:|:------------:|:-----------------------:|
| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.beta1 |


## Usage Steps
@@ -103,35 +107,21 @@ python predict.py --config_path ./conformer.yaml
# using ctc prefix beam search decoder
python predict.py --config_path ./conformer.yaml --decode_mode ctc_prefix_beam_search

# using attention decoder
python predict.py --config_path ./conformer.yaml --decode_mode attention

# using attention rescoring decoder
python predict.py --config_path ./conformer.yaml --decode_mode attention_rescoring
```
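
To compare decoding modes in one pass, the commands above could be generated from a list. This is a rough sketch: `build_command` and `DECODE_MODES` are hypothetical names, `None` stands for the invocation without `--decode_mode`, and the two flag values are copied from the commands shown above. The script only prints the commands; it does not run them.

```python
import shlex

CONFIG_PATH = "./conformer.yaml"

# None means "omit --decode_mode"; the other values come from the README commands.
DECODE_MODES = [None, "ctc_prefix_beam_search", "attention_rescoring"]


def build_command(decode_mode=None, config_path=CONFIG_PATH):
    """Build the argument list for one predict.py invocation."""
    cmd = ["python", "predict.py", "--config_path", config_path]
    if decode_mode is not None:
        cmd += ["--decode_mode", decode_mode]
    return cmd


if __name__ == "__main__":
    for mode in DECODE_MODES:
        # shlex.join quotes arguments so the line can be pasted into a shell.
        print(shlex.join(build_command(mode)))
```

Each list can be passed to `subprocess.run` unchanged if the commands should be executed rather than printed.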



## Model Performance
The training config can be found in the [conformer.yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/conformer/conformer.yaml).

Performance tested on ascend 910 (8p) with graph mode:

| model | decoding mode | CER |
|-----------|------------------------|--------------|
| conformer | ctc greedy search | 5.35 |
| conformer | ctc prefix beam search | 5.36 |
| conformer | attention decoder | coming soon |
| conformer | attention rescoring | 4.95 |
- [weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-548ee31b.ckpt) can be downloaded here.

---
Performance tested on ascend 910* (8p) with graph mode:

| model | decoding mode | CER |
|-----------|------------------------|--------------|
| conformer | ctc greedy search | 5.62 |
| conformer | ctc prefix beam search | 5.62 |
| conformer | attention decoder | coming soon |
| conformer | attention rescoring | 5.12 |
- [weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-692d57b3-910v2.ckpt) can be downloaded here.

Experiments are tested on ascend 910* with mindspore 2.3.1 graph mode:

| model name| cards | batch type | jit level | s/step | recipe | weight | decoding mode | cer |
|:---------:|:----:|:----------:|:---------:|:------:|:------:|:------:|:---------------------:|:----:|
| conformer | 8 | bucket | O0 | 0.72 |[conformer.yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/conformer/conformer.yaml) |[weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-692d57b3-910v2.ckpt) |ctc greedy search | 5.62 |
| conformer | 8 | bucket | O0 | 0.72 |[conformer.yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/conformer/conformer.yaml) |[weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-692d57b3-910v2.ckpt) |ctc prefix beam search | 5.62 |
| conformer | 8 | bucket | O0 | 0.72 |[conformer.yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/conformer/conformer.yaml) |[weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-692d57b3-910v2.ckpt) |attention rescoring | 5.12 |
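
The `cer` column reports character error rate. As a rough illustration, CER is commonly defined as the character-level edit distance between the hypothesis and the reference, divided by the reference length and expressed as a percentage. The sketch below assumes that standard definition; the function names are hypothetical, and the repository's own scoring code may differ, for example in text normalization.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, using two rolling rows."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(
                min(
                    prev[j] + 1,         # delete a reference character
                    curr[j - 1] + 1,     # insert a hypothesis character
                    prev[j - 1] + cost,  # substitute, or match at no cost
                )
            )
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate in percent, relative to the reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    print(cer("abcd", "abed"))
```

Because the denominator is the reference length, a hypothesis with many insertions can score above 100.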
41 changes: 17 additions & 24 deletions examples/conformer/readme_cn.md
@@ -16,6 +16,11 @@ Conformer整体结构包括:SpecAug、ConvolutionSubsampling、Linear、Dropou

![image-20230310165349460](https://raw.githubusercontent.com/mindspore-lab/mindaudio/main/tests/result/conformer.png)

## Requirements
| mindspore | ascend driver | firmware | cann toolkit/kernel |
|:-------------:|:----------------------:|:------------:|:-----------------------:|
| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.beta1 |


## Usage Steps

@@ -102,32 +107,20 @@ python predict.py --config_path ./conformer.yaml
# using ctc prefix beam search decoder
python predict.py --config_path ./conformer.yaml --decode_mode ctc_prefix_beam_search

# using attention decoder
python predict.py --config_path ./conformer.yaml --decode_mode attention

# using attention rescoring decoder
python predict.py --config_path ./conformer.yaml --decode_mode attention_rescoring
```

## **Model Performance**
The training configuration is in [conformer.yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/conformer/conformer.yaml).

Performance tested on ascend 910 (8p) in graph mode:

| model | decoding mode | CER |
| --------- | ---------------------- |--------------|
| conformer | ctc greedy search | 5.35 |
| conformer | ctc prefix beam search | 5.36 |
| conformer | attention decoder | coming soon |
| conformer | attention rescoring | 4.95 |
- Trained [weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-548ee31b.ckpt) can be downloaded here.
---
Performance tested on ascend 910* (8p) in graph mode:

| model | decoding mode | CER |
| --------- | ---------------------- |--------------|
| conformer | ctc greedy search | 5.62 |
| conformer | ctc prefix beam search | 5.62 |
| conformer | attention decoder | comming soon |
| conformer | attention rescoring | 5.12 |
- 训练好的 [weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-692d57b3-910v2.ckpt) 可以在此处下载。

在 ascend 910* mindspore2.3.1图模式上的测试性能:

| model name| cars | batch type | jit level | s/step | recipe | weight | decoding mode | cer |
|:---------:|:----:|:----------:|:---------:|:------:|:------:|:------:|:---------------------:|:----:|
| conformer | 8 | bucket | O0 | 0.72 |[conformer.yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/conformer/conformer.yaml) |[weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-692d57b3-910v2.ckpt) |ctc greedy search | 5.62 |
| conformer | 8 | bucket | O0 | 0.72 |[conformer.yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/conformer/conformer.yaml) |[weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-692d57b3-910v2.ckpt) |ctc prefix beam search | 5.62 |
| conformer | 8 | bucket | O0 | 0.72 |[conformer.yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/conformer/conformer.yaml) |[weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-692d57b3-910v2.ckpt) |attention rescoring | 5.12 |
20 changes: 16 additions & 4 deletions examples/deepspeech2/readme.md
@@ -3,7 +3,13 @@
## Introduction

DeepSpeech2 is a speech recognition model trained using CTC loss. It replaces the entire manually designed component pipeline with neural networks and can handle a variety of speech, including noisy environments, accents, and different languages. The currently provided version supports using the [DeepSpeech2](http://arxiv.org/pdf/1512.02595v1.pdf) model for training/testing and inference on the librispeech dataset on NPU and GPU.
DeepSpeech2 is a speech recognition model trained using CTC loss. It replaces the entire manually designed component pipeline with neural networks and can handle a variety of speech, including noisy environments, accents, and different languages. The currently provided version supports using the [DeepSpeech2](http://arxiv.org/pdf/1512.02595v1.pdf) model for training/testing and inference on the librispeech dataset on NPU.


### Requirements
| mindspore | ascend driver | firmware | cann toolkit/kernel |
|:-------------:|:----------------------:|:------------:|:-----------------------:|
| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.beta1 |

### Model Architecture

@@ -96,6 +102,12 @@ python eval.py -c "./deepspeech2.yaml"

## **Model Performance**

| Model | Machine | LM | Test Clean CER | Test Clean WER | Parameters | Weights |
|--------------|-----------|------|----------------|----------------|----------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------|
| DeepSpeech2 | D910x8-G | No | 3.461 | 10.24 | [yaml](https://github.com/mindsporelab/mindaudio/blob/main/example/deepspeech2/deepspeech2.yaml) | [weights](https://download.mindspore.cn/toolkits/mindaudio/deepspeech2/deepspeech2.ckpt) |
Experiments are tested on ascend 910* with mindspore 2.3.1 graph mode:

| model name | cards | batch size | jit level | s/step | recipe | weight | test clean cer | test clean wer |
|:----------:|:-----:|:----------:|:---------:|:------:|:------:|:------:|:--------------:|:--------------:|
| deepspeech2| 8 | 64 | O0 | 2.82 | [yaml](https://github.com/mindsporelab/mindaudio/blob/main/example/deepspeech2/deepspeech2.yaml) | [weights](https://download.mindspore.cn/toolkits/mindaudio/deepspeech2/deepspeech2.ckpt)| 3.461 | 10.24 |
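
The `s/step` figure can be turned into an approximate throughput. The sketch below is plain arithmetic with a hypothetical helper, `samples_per_second`. The table does not say whether the batch size of 64 is per card or global, so both readings are treated as assumptions and selected with the `per_card` flag.

```python
def samples_per_second(seconds_per_step, batch_size, cards, per_card=True):
    """Approximate training throughput in samples per second.

    per_card=True assumes batch_size is the per-card batch (an assumption);
    per_card=False assumes batch_size is already the global batch.
    """
    if seconds_per_step <= 0:
        raise ValueError("seconds_per_step must be positive")
    global_batch = batch_size * cards if per_card else batch_size
    return global_batch / seconds_per_step


if __name__ == "__main__":
    # Values from the deepspeech2 row: 8 cards, batch size 64, 2.82 s/step.
    print(round(samples_per_second(2.82, 64, 8, per_card=True), 2))
    print(round(samples_per_second(2.82, 64, 8, per_card=False), 2))
```

The two readings differ by a factor equal to the card count, so the batch-size convention should be confirmed in the recipe before comparing against other reports.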
15 changes: 11 additions & 4 deletions examples/deepspeech2/readme_cn.md
@@ -4,7 +4,13 @@

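## Introduction

DeepSpeech2 is a speech recognition model trained using CTC loss. It replaces the entire manually designed component pipeline with neural networks and can handle a variety of speech, including noisy environments, accents, and different languages. The currently provided version supports using the [DeepSpeech2](http://arxiv.org/pdf/1512.02595v1.pdf) model for training/testing and inference on the librispeech dataset on NPU and GPU.
DeepSpeech2 is a speech recognition model trained using CTC loss. It replaces the entire manually designed component pipeline with neural networks and can handle a variety of speech, including noisy environments, accents, and different languages. The currently provided version supports using the [DeepSpeech2](http://arxiv.org/pdf/1512.02595v1.pdf) model for training/testing and inference on the librispeech dataset on NPU.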


### Requirements
| mindspore | ascend driver | firmware | cann toolkit/kernel |
|:-------------:|:----------------------:|:------------:|:-----------------------:|
| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.beta1 |

### Model Architecture

@@ -16,6 +22,7 @@ DeepSpeech2是一种采用CTC损失训练的语音识别模型。它用神经网
- Five bidirectional LSTM layers (size 1024)
- One projection layer [size is the number of characters plus 1 (for the CTC blank symbol), 28]


### Data processing

- Audio:
@@ -104,6 +111,6 @@ python eval.py -c "./deepspeech2.yaml"

## **Model Performance**

| model | LM | test clean cer| test clean wer | config | weights|
| ----------- | ---- | -------------- | -------------- |--------------------------------------------------------------------------------------------------| ------------------------------------------------------------ |
| deepspeech2 | No | 3.461 | 10.24 | [yaml](https://github.com/mindsporelab/mindaudio/blob/main/example/deepspeech2/deepspeech2.yaml) | [weights](https://download.mindspore.cn/toolkits/mindaudio/deepspeech2/deepspeech2.ckpt) |
| model name | cards | batch size | jit level | s/step | recipe | weight | test clean cer | test clean wer |
|:----------:|:-----:|:----------:|:---------:|:------:|:------:|:------:|:--------------:|:--------------:|
| deepspeech2| 8 | 64 | O0 | 2.82 | [yaml](https://github.com/mindsporelab/mindaudio/blob/main/example/deepspeech2/deepspeech2.yaml) | [weights](https://download.mindspore.cn/toolkits/mindaudio/deepspeech2/deepspeech2.ckpt)| 3.461 | 10.24 |
