From 2a680a72c2c31b44b6dcbb8ddd972bc8ba0ecae1 Mon Sep 17 00:00:00 2001
From: litingyu <605979840@qq.com>
Date: Tue, 5 Nov 2024 21:37:23 +0800
Subject: [PATCH] update_231

update_231

update_ms231

update_ms231
---
 README.md                         | 10 ++++----
 README_CN.md                      | 10 ++++----
 examples/conformer/readme.md      | 42 ++++++++++++-------------------
 examples/conformer/readme_cn.md   | 41 +++++++++++++-----------------
 examples/deepspeech2/readme.md    | 20 ++++++++++++---
 examples/deepspeech2/readme_cn.md | 15 ++++++++---
 6 files changed, 70 insertions(+), 68 deletions(-)

diff --git a/README.md b/README.md
index 113e777..5a45083 100644
--- a/README.md
+++ b/README.md
@@ -24,11 +24,11 @@ MindAudio is a toolbox of audio models and algorithms based on [MindSpore](https
 
 The following is the corresponding `mindaudio` versions and supported `mindspore` versions.
 
-| `mindspore` | `mindaudio` |
-|--------------|-------------|
-| `master` | `master` |
-| `2.3.0` | `0.4` |
-| `2.2.10` | `0.3` |
+| `mindaudio` | `mindspore` |
+|-------------|---------------------|
+| `master` | `master` |
+| `0.4` | `2.3.0`/`2.3.1` |
+| `0.3` | `2.2.10` |
 
 ### data processing
 
diff --git a/README_CN.md b/README_CN.md
index c87a6f3..68b8c00 100644
--- a/README_CN.md
+++ b/README_CN.md
@@ -22,11 +22,11 @@ MindAudio 是基于 [MindSpore](https://www.mindspore.cn/) 的音频模型和算
 
 下表显示了相应的 `mindaudio` 版本和支持的 `mindspore` 版本。
 
-| `mindspore` | `mindaudio` |
-|--------------|-------------|
-| `master` | `master` |
-| `2.3.0` | `0.4` |
-| `2.2.10` | `0.3` |
+| `mindaudio` | `mindspore` |
+|-------------|---------------------|
+| `master` | `master` |
+| `0.4` | `2.3.0`/`2.3.1` |
+| `0.3` | `2.2.10` |
 
 ### 数据处理
 
diff --git a/examples/conformer/readme.md b/examples/conformer/readme.md
index f4a75b3..4d35c0a 100644
--- a/examples/conformer/readme.md
+++ b/examples/conformer/readme.md
@@ -16,6 +16,10 @@ The overall structure of Conformer includes SpecAug, ConvolutionSubsampling, Lin
 
 ![image-20230310165349460](https://raw.githubusercontent.com/mindspore-lab/mindaudio/main/tests/result/conformer.png)
 
+## Requirements
+| mindspore | ascend driver | firmware | cann toolkit/kernel |
+|:-------------:|:----------------------:|:------------:|:-----------------------:|
+| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.bata1 |
 
 ## Usage Steps
 
@@ -103,35 +107,21 @@ python predict.py --config_path ./conformer.yaml
 # using ctc prefix beam search decoder
 python predict.py --config_path ./conformer.yaml --decode_mode ctc_prefix_beam_search
 
-# using attention decoder
-python predict.py --config_path ./conformer.yaml --decode_mode attention
-
 # using attention rescoring decoder
 python predict.py --config_path ./conformer.yaml --decode_mode attention_rescoring
 ```
 
 
-
 ## Model Performance
-The training config can be found in the [conformer.yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/conformer/conformer.yaml).
-
-Performance tested on ascend 910 (8p) with graph mode:
-
-| model | decoding mode | CER |
-|-----------|------------------------|--------------|
-| conformer | ctc greedy search | 5.35 |
-| conformer | ctc prefix beam search | 5.36 |
-| conformer | attention decoder | comming soon |
-| conformer | attention rescoring | 4.95 |
-- [weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-548ee31b.ckpt) can be downloaded here.
-
----
-Performance tested on ascend 910* (8p) with graph mode:
-
-| model | decoding mode | CER |
-|-----------|------------------------|--------------|
-| conformer | ctc greedy search | 5.62 |
-| conformer | ctc prefix beam search | 5.62 |
-| conformer | attention decoder | comming soon |
-| conformer | attention rescoring | 5.12 |
-- [weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-692d57b3-910v2.ckpt) can be downloaded here.
+
+Experiments are tested on ascend 910* with mindspore 2.3.1 graph mode:
+
+| model name| cars | batch type | jit level | s/step | recipe | weight | decoding mode | cer |
+|:---------:|:----:|:----------:|:---------:|:------:|:------:|:------:|:---------------------:|:----:|
+| conformer | 8 | bucket | O0 | 0.72 |[conformer.yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/conformer/conformer.yaml) |[weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-692d57b3-910v2.ckpt) |ctc greedy search | 5.62 |
+| conformer | 8 | bucket | O0 | 0.72 |[conformer.yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/conformer/conformer.yaml) |[weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-692d57b3-910v2.ckpt) |ctc prefix beam search | 5.62 |
+| conformer | 8 | bucket | O0 | 0.72 |[conformer.yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/conformer/conformer.yaml) |[weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-692d57b3-910v2.ckpt) |attention rescoring | 5.12 |
+<<<<<<< HEAD
+
+=======
+>>>>>>> 1d72af4 (update_231)
diff --git a/examples/conformer/readme_cn.md b/examples/conformer/readme_cn.md
index e198491..648e4dc 100644
--- a/examples/conformer/readme_cn.md
+++ b/examples/conformer/readme_cn.md
@@ -16,6 +16,11 @@ Conformer整体结构包括:SpecAug、ConvolutionSubsampling、Linear、Dropou
 
 ![image-20230310165349460](https://raw.githubusercontent.com/mindspore-lab/mindaudio/main/tests/result/conformer.png)
 
+## 版本要求
+| mindspore | ascend driver | firmware | cann toolkit/kernel |
+|:-------------:|:----------------------:|:------------:|:-----------------------:|
+| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.bata1 |
+
 
 ## 使用步骤
 
@@ -102,32 +107,20 @@ python predict.py --config_path ./conformer.yaml
 # using ctc prefix beam search decoder
 python predict.py --config_path ./conformer.yaml --decode_mode ctc_prefix_beam_search
 
-# using attention decoder
-python predict.py --config_path ./conformer.yaml --decode_mode attention
-
 # using attention rescoring decoder
 python predict.py --config_path ./conformer.yaml --decode_mode attention_rescoring
 ```
 
 ## **模型表现**
-训练的配置文件见 [conformer.yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/conformer/conformer.yaml)。
-
-在 ascend 910(8p) 图模式上的测试性能:
-
-| model | decoding mode | CER |
-| --------- | ---------------------- |--------------|
-| conformer | ctc greedy search | 5.35 |
-| conformer | ctc prefix beam search | 5.36 |
-| conformer | attention decoder | comming soon |
-| conformer | attention rescoring | 4.95 |
-- 训练好的 [weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-548ee31b.ckpt) 可以在此处下载。
----
-在 ascend 910*(8p) 图模式上的测试性能:
-
-| model | decoding mode | CER |
-| --------- | ---------------------- |--------------|
-| conformer | ctc greedy search | 5.62 |
-| conformer | ctc prefix beam search | 5.62 |
-| conformer | attention decoder | comming soon |
-| conformer | attention rescoring | 5.12 |
-- 训练好的 [weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-692d57b3-910v2.ckpt) 可以在此处下载。
+
+在 ascend 910* mindspore2.3.1图模式上的测试性能:
+
+| model name| cars | batch type | jit level | s/step | recipe | weight | decoding mode | cer |
+|:---------:|:----:|:----------:|:---------:|:------:|:------:|:------:|:---------------------:|:----:|
+| conformer | 8 | bucket | O0 | 0.72 |[conformer.yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/conformer/conformer.yaml) |[weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-692d57b3-910v2.ckpt) |ctc greedy search | 5.62 |
+| conformer | 8 | bucket | O0 | 0.72 |[conformer.yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/conformer/conformer.yaml) |[weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-692d57b3-910v2.ckpt) |ctc prefix beam search | 5.62 |
+| conformer | 8 | bucket | O0 | 0.72 |[conformer.yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/conformer/conformer.yaml) |[weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-692d57b3-910v2.ckpt) |attention rescoring | 5.12 |
+<<<<<<< HEAD
+
+=======
+>>>>>>> 1d72af4 (update_231)
diff --git a/examples/deepspeech2/readme.md b/examples/deepspeech2/readme.md
index ce92ad5..97bcdbc 100644
--- a/examples/deepspeech2/readme.md
+++ b/examples/deepspeech2/readme.md
@@ -3,7 +3,13 @@
 
 ## Introduction
 
-DeepSpeech2 is a speech recognition model trained using CTC loss. It replaces the entire manually designed component pipeline with neural networks and can handle a variety of speech, including noisy environments, accents, and different languages. The currently provided version supports using the [DeepSpeech2](http://arxiv.org/pdf/1512.02595v1.pdf) model for training/testing and inference on the librispeech dataset on NPU and GPU.
+DeepSpeech2 is a speech recognition model trained using CTC loss. It replaces the entire manually designed component pipeline with neural networks and can handle a variety of speech, including noisy environments, accents, and different languages. The currently provided version supports using the [DeepSpeech2](http://arxiv.org/pdf/1512.02595v1.pdf) model for training/testing and inference on the librispeech dataset on NPU.
+
+
+### Requirements
+| mindspore | ascend driver | firmware | cann toolkit/kernel |
+|:-------------:|:----------------------:|:------------:|:-----------------------:|
+| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.bata1 |
 
 ### Model Architecture
 
@@ -96,6 +102,12 @@ python eval.py -c "./deepspeech2.yaml"
 
 ## **Model Performance**
 
-| Model | Machine | LM | Test Clean CER | Test Clean WER | Parameters | Weights |
-|--------------|-----------|------|----------------|----------------|----------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------|
-| DeepSpeech2 | D910x8-G | No | 3.461 | 10.24 | [yaml](https://github.com/mindsporelab/mindaudio/blob/main/example/deepspeech2/deepspeech2.yaml) | [weights](https://download.mindspore.cn/toolkits/mindaudio/deepspeech2/deepspeech2.ckpt) |
+Experiments are tested on ascend 910* with mindspore 2.3.1 graph mode:
+
+<<<<<<< HEAD
+| model name | cards | batch size | jit level | s/step | recipe | weight | test clean cer | test clean wer |
+=======
+| model name | cards | batch size | jit level | s/step | recipe | weight | test clean cer | test clean wer |
+>>>>>>> 1d72af4 (update_231)
+|:----------:|:-----:|:----------:|:---------:|:------:|:------:|:------:|:--------------:|:--------------:|
+| deepspeech2| 8 | 64 | O0 | 2.82 | [yaml](https://github.com/mindsporelab/mindaudio/blob/main/example/deepspeech2/deepspeech2.yaml) | [weights](https://download.mindspore.cn/toolkits/mindaudio/deepspeech2/deepspeech2.ckpt)| 3.461 | 10.24 |
diff --git a/examples/deepspeech2/readme_cn.md b/examples/deepspeech2/readme_cn.md
index 0b0b871..535a1a7 100644
--- a/examples/deepspeech2/readme_cn.md
+++ b/examples/deepspeech2/readme_cn.md
@@ -4,7 +4,13 @@
 
 ## 介绍
 
-DeepSpeech2是一种采用CTC损失训练的语音识别模型。它用神经网络取代了整个手工设计组件的管道,可以处理各种各样的语音,包括嘈杂的环境、口音和不同的语言。目前提供版本支持在NPU和GPU上使用[DeepSpeech2](http://arxiv.org/pdf/1512.02595v1.pdf)模型在librispeech数据集上进行训练/测试和推理。
+DeepSpeech2是一种采用CTC损失训练的语音识别模型。它用神经网络取代了整个手工设计组件的管道,可以处理各种各样的语音,包括嘈杂的环境、口音和不同的语言。目前提供版本支持在NPU上使用[DeepSpeech2](http://arxiv.org/pdf/1512.02595v1.pdf)模型在librispeech数据集上进行训练/测试和推理。
+
+
+### 版本要求
+| mindspore | ascend driver | firmware | cann toolkit/kernel |
+|:-------------:|:----------------------:|:------------:|:-----------------------:|
+| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.bata1 |
 
 ### 模型结构
 
@@ -16,6 +22,7 @@ DeepSpeech2是一种采用CTC损失训练的语音识别模型。它用神经网
 - 五个双向 LSTM 层(大小为 1024)
 - 一个投影层【大小为字符数加 1(为CTC空白符号),28】
 
+
 ### 数据处理
 
 - 音频:
@@ -104,6 +111,6 @@ python eval.py -c "./deepspeech2.yaml"
 
 ## **性能表现**
 
-| model | LM | test clean cer| test clean wer | config | weights|
-| ----------- | ---- | -------------- | -------------- |--------------------------------------------------------------------------------------------------| ------------------------------------------------------------ |
-| deepspeech2 | No | 3.461 | 10.24 | [yaml](https://github.com/mindsporelab/mindaudio/blob/main/example/deepspeech2/deepspeech2.yaml) | [weights](https://download.mindspore.cn/toolkits/mindaudio/deepspeech2/deepspeech2.ckpt) |
+| model name | cards | batch size | jit level | s/step | recipe | weight | test clean cer | test clean wer |
+|:----------:|:-----:|:----------:|:---------:|:------:|:------:|:------:|:--------------:|:--------------:|
+| deepspeech2| 8 | 64 | O0 | 2.82 | [yaml](https://github.com/mindsporelab/mindaudio/blob/main/example/deepspeech2/deepspeech2.yaml) | [weights](https://download.mindspore.cn/toolkits/mindaudio/deepspeech2/deepspeech2.ckpt)| 3.461 | 10.24 |