diff --git a/docs/source/_static/matcha-icefall-baker-zh/matcha-baker-0.wav b/docs/source/_static/matcha-icefall-baker-zh/matcha-baker-0.wav new file mode 100644 index 000000000..2c8be048b Binary files /dev/null and b/docs/source/_static/matcha-icefall-baker-zh/matcha-baker-0.wav differ diff --git a/docs/source/_static/matcha-icefall-baker-zh/matcha-baker-1.wav b/docs/source/_static/matcha-icefall-baker-zh/matcha-baker-1.wav new file mode 100644 index 000000000..e27d15383 Binary files /dev/null and b/docs/source/_static/matcha-icefall-baker-zh/matcha-baker-1.wav differ diff --git a/docs/source/_static/matcha-icefall-baker-zh/matcha-baker-2.wav b/docs/source/_static/matcha-icefall-baker-zh/matcha-baker-2.wav new file mode 100644 index 000000000..50c9a9bec Binary files /dev/null and b/docs/source/_static/matcha-icefall-baker-zh/matcha-baker-2.wav differ diff --git a/docs/source/_static/matcha-icefall-en_US-ljspeech/matcha-ljspeech-0.wav b/docs/source/_static/matcha-icefall-en_US-ljspeech/matcha-ljspeech-0.wav new file mode 100644 index 000000000..d6726715b Binary files /dev/null and b/docs/source/_static/matcha-icefall-en_US-ljspeech/matcha-ljspeech-0.wav differ diff --git a/docs/source/_static/matcha-icefall-en_US-ljspeech/matcha-ljspeech-1.wav b/docs/source/_static/matcha-icefall-en_US-ljspeech/matcha-ljspeech-1.wav new file mode 100644 index 000000000..76c3f7b2b Binary files /dev/null and b/docs/source/_static/matcha-icefall-en_US-ljspeech/matcha-ljspeech-1.wav differ diff --git a/docs/source/onnx/tts/pretrained_models/index.rst b/docs/source/onnx/tts/pretrained_models/index.rst index 0d519060b..d8c21ae67 100644 --- a/docs/source/onnx/tts/pretrained_models/index.rst +++ b/docs/source/onnx/tts/pretrained_models/index.rst @@ -14,4 +14,5 @@ This page list pre-trained models for text-to-speech. .. 
toctree:: :maxdepth: 5 + ./matcha ./vits diff --git a/docs/source/onnx/tts/pretrained_models/matcha.rst b/docs/source/onnx/tts/pretrained_models/matcha.rst new file mode 100644 index 000000000..b2559c9c9 --- /dev/null +++ b/docs/source/onnx/tts/pretrained_models/matcha.rst @@ -0,0 +1,370 @@ +Matcha +====== + + +This page lists pre-trained models using `Matcha-TTS `_. + +.. caution:: + + Models are from `icefall `_. + + We don't support models from ``_. + +matcha-icefall-en_US-ljspeech (American English, 1 female speaker) +------------------------------------------------------------------ + +This model is trained using + + ``_ + +The dataset used to train the model is from + + ``_. + +In the following, we describe how to download it and use it with `sherpa-onnx`_. + +Download the model +~~~~~~~~~~~~~~~~~~ + +Please use the following commands to download it. + +.. code-block:: bash + + cd /path/to/sherpa-onnx + + wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/matcha-icefall-en_US-ljspeech.tar.bz2 + tar xvf matcha-icefall-en_US-ljspeech.tar.bz2 + rm matcha-icefall-en_US-ljspeech.tar.bz2 + + wget https://github.com/k2-fsa/sherpa-onnx/releases/download/vocoder-models/hifigan_v2.onnx + +.. caution:: + + Remember to also download the vocoder model. We use `hifigan_v2 `_ in the example. + You can also select `hifigan_v1 `_ or + `hifigan_v3 `_. + +Please check that the file sizes of the pre-trained models are correct. See +the file sizes of ``*.onnx`` files below. + +.. 
code-block:: bash + + ls -lh matcha-icefall-en_US-ljspeech/ + total 144856 + -rw-r--r-- 1 fangjun staff 251B Jan 2 11:05 README.md + drwxr-xr-x 122 fangjun staff 3.8K Nov 28 2023 espeak-ng-data + -rw-r--r--@ 1 fangjun staff 71M Jan 2 04:04 model-steps-3.onnx + -rw-r--r-- 1 fangjun staff 954B Jan 2 11:05 tokens.txt + + ls -lh hifigan_v2.onnx + -rw-r--r-- 1 fangjun staff 3.6M Dec 30 17:10 hifigan_v2.onnx + +Generate speech with executables compiled from C++ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. code-block:: bash + + cd /path/to/sherpa-onnx + + ./build/bin/sherpa-onnx-offline-tts \ + --matcha-acoustic-model=./matcha-icefall-en_US-ljspeech/model-steps-3.onnx \ + --matcha-vocoder=./hifigan_v2.onnx \ + --matcha-tokens=./matcha-icefall-en_US-ljspeech/tokens.txt \ + --matcha-data-dir=./matcha-icefall-en_US-ljspeech/espeak-ng-data \ + --num-threads=2 \ + --output-filename=./matcha-ljspeech-0.wav \ + --debug=1 \ + "Today as always, men fall into two groups: slaves and free men. Whoever does not have two-thirds of his day for himself, is a slave, whatever he may be: a statesman, a businessman, an official, or a scholar." + +After running, it will generate a file ``matcha-ljspeech-0.wav`` in the +current directory. + +.. code-block:: bash + + soxi ./matcha-ljspeech-0.wav + + Input File : './matcha-ljspeech-0.wav' + Channels : 1 + Sample Rate : 22050 + Precision : 16-bit + Duration : 00:00:15.06 = 332032 samples ~ 1129.36 CDDA sectors + File Size : 664k + Bit Rate : 353k + Sample Encoding: 16-bit Signed Integer PCM + +.. raw:: html + + + + + + + + + + + + +
Wave filenameContentText
matcha-ljspeech-0.wav + + + "Today as always, men fall into two groups: slaves and free men. Whoever does not have two-thirds of his day for himself, is a slave, whatever he may be: a statesman, a businessman, an official, or a scholar." +
+ +Generate speech with Python script +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. code-block:: bash + + cd /path/to/sherpa-onnx + + python3 ./python-api-examples/offline-tts.py \ + --matcha-acoustic-model=./matcha-icefall-en_US-ljspeech/model-steps-3.onnx \ + --matcha-vocoder=./hifigan_v2.onnx \ + --matcha-tokens=./matcha-icefall-en_US-ljspeech/tokens.txt \ + --matcha-data-dir=./matcha-icefall-en_US-ljspeech/espeak-ng-data \ + --num-threads=2 \ + --output-filename=./matcha-ljspeech-1.wav \ + --debug=1 \ + "Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone." + +.. code-block:: + + soxi ./matcha-ljspeech-1.wav + + Input File : './matcha-ljspeech-1.wav' + Channels : 1 + Sample Rate : 22050 + Precision : 16-bit + Duration : 00:00:07.92 = 174592 samples ~ 593.85 CDDA sectors + File Size : 349k + Bit Rate : 353k + Sample Encoding: 16-bit Signed Integer PCM + +.. raw:: html + + + + + + + + + + + + +
Wave filenameContentText
matcha-ljspeech-1.wav + + + "Friends fell out often because life was changing so fast. The easiest thing in the world was to lose touch with someone." +
+ +matcha-icefall-zh-baker (Chinese, 1 female speaker) +--------------------------------------------------- + +This model is trained using + + ``_ + +The dataset used to train the model is from + + ``_. + +.. caution:: + + The dataset is for ``non-commercial`` use only. + +In the following, we describe how to download it and use it with `sherpa-onnx`_. + +Download the model +~~~~~~~~~~~~~~~~~~ + +Please use the following commands to download it. + +.. code-block:: bash + + cd /path/to/sherpa-onnx + + wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/matcha-icefall-zh-baker.tar.bz2 + tar xvf matcha-icefall-zh-baker.tar.bz2 + rm matcha-icefall-zh-baker.tar.bz2 + + wget https://github.com/k2-fsa/sherpa-onnx/releases/download/vocoder-models/hifigan_v2.onnx + +.. caution:: + + Remember to also download the vocoder model. We use `hifigan_v2 `_ in the example. + You can also select `hifigan_v1 `_ or + `hifigan_v3 `_. + +Please check that the file sizes of the pre-trained models are correct. See +the file sizes of ``*.onnx`` files below. + +.. code-block:: bash + + ls -lh matcha-icefall-zh-baker/ + total 167344 + -rw-r--r-- 1 fangjun staff 370B Dec 31 14:51 README.md + -rw-r--r-- 1 fangjun staff 58K Dec 31 14:51 date.fst + drwxr-xr-x 9 fangjun staff 288B Apr 19 2024 dict + -rw-r--r-- 1 fangjun staff 1.3M Dec 31 14:51 lexicon.txt + -rw-r--r-- 1 fangjun staff 72M Dec 31 14:51 model-steps-3.onnx + -rw-r--r-- 1 fangjun staff 63K Dec 31 14:51 number.fst + -rw-r--r-- 1 fangjun staff 87K Dec 31 14:51 phone.fst + -rw-r--r-- 1 fangjun staff 19K Dec 31 14:51 tokens.txt + + ls -lh hifigan_v2.onnx + -rw-r--r-- 1 fangjun staff 3.6M Dec 30 17:10 hifigan_v2.onnx + +Generate speech with executables compiled from C++ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. 
code-block:: bash + + cd /path/to/sherpa-onnx + + ./build/bin/sherpa-onnx-offline-tts \ + --matcha-acoustic-model=./matcha-icefall-zh-baker/model-steps-3.onnx \ + --matcha-vocoder=./hifigan_v2.onnx \ + --matcha-lexicon=./matcha-icefall-zh-baker/lexicon.txt \ + --matcha-tokens=./matcha-icefall-zh-baker/tokens.txt \ + --matcha-dict-dir=./matcha-icefall-zh-baker/dict \ + --num-threads=2 \ + --output-filename=./matcha-baker-0.wav \ + --debug=1 \ + "当夜幕降临,星光点点,伴随着微风拂面,我在静谧中感受着时光的流转,思念如涟漪荡漾,梦境如画卷展开,我与自然融为一体,沉静在这片宁静的美丽之中,感受着生命的奇迹与温柔." + + ./build/bin/sherpa-onnx-offline-tts \ + --matcha-acoustic-model=./matcha-icefall-zh-baker/model-steps-3.onnx \ + --matcha-vocoder=./hifigan_v2.onnx \ + --matcha-lexicon=./matcha-icefall-zh-baker/lexicon.txt \ + --matcha-tokens=./matcha-icefall-zh-baker/tokens.txt \ + --tts-rule-fsts=./matcha-icefall-zh-baker/phone.fst,./matcha-icefall-zh-baker/date.fst,./matcha-icefall-zh-baker/number.fst \ + --matcha-dict-dir=./matcha-icefall-zh-baker/dict \ + --output-filename=./matcha-baker-1.wav \ + "某某银行的副行长和一些行政领导表示,他们去过长江和长白山; 经济不断增长。2024年12月31号,拨打110或者18920240511。123456块钱。" + +After running, it will generate two files, ``matcha-baker-0.wav`` and +``matcha-baker-1.wav``, in the current directory. + +.. code-block:: bash + + soxi matcha-baker-*.wav + + Input File : 'matcha-baker-0.wav' + Channels : 1 + Sample Rate : 22050 + Precision : 16-bit + Duration : 00:00:22.65 = 499456 samples ~ 1698.83 CDDA sectors + File Size : 999k + Bit Rate : 353k + Sample Encoding: 16-bit Signed Integer PCM + + + Input File : 'matcha-baker-1.wav' + Channels : 1 + Sample Rate : 22050 + Precision : 16-bit + Duration : 00:00:22.65 = 499456 samples ~ 1698.83 CDDA sectors + File Size : 999k + Bit Rate : 353k + Sample Encoding: 16-bit Signed Integer PCM + + Total Duration of 2 files: 00:00:45.30 + +.. raw:: html + + + + + + + + + + + + + + + + + + +
Wave filenameContentText
matcha-baker-0.wav + + + "当夜幕降临,星光点点,伴随着微风拂面,我在静谧中感受着时光的流转,思念如涟漪荡漾,梦境如画卷展开,我与自然融为一体,沉静在这片宁静的美丽之中,感受着生命的奇迹与温柔." +
matcha-baker-1.wav + + + "某某银行的副行长和一些行政领导表示,他们去过长江和长白山; 经济不断增长。2024年12月31号,拨打110或者18920240511。123456块钱。" +
+ +Generate speech with Python script +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. code-block:: bash + + cd /path/to/sherpa-onnx + + python3 ./python-api-examples/offline-tts.py \ + --matcha-acoustic-model=./matcha-icefall-zh-baker/model-steps-3.onnx \ + --matcha-vocoder=./hifigan_v2.onnx \ + --matcha-lexicon=./matcha-icefall-zh-baker/lexicon.txt \ + --matcha-tokens=./matcha-icefall-zh-baker/tokens.txt \ + --tts-rule-fsts=./matcha-icefall-zh-baker/phone.fst,./matcha-icefall-zh-baker/date.fst,./matcha-icefall-zh-baker/number.fst \ + --matcha-dict-dir=./matcha-icefall-zh-baker/dict \ + --output-filename=./matcha-baker-2.wav \ + --debug=1 \ + "三百六十行,行行出状元。你行的!明天就是 2025年1月1号啦!银行卡被卡住了,你帮个忙,行不行?" + +After running, it will generate a file ``matcha-baker-2.wav`` in the current directory. + +.. code-block:: bash + + soxi matcha-baker-2.wav + + Input File : 'matcha-baker-2.wav' + Channels : 1 + Sample Rate : 22050 + Precision : 16-bit + Duration : 00:00:12.71 = 280320 samples ~ 953.469 CDDA sectors + File Size : 561k + Bit Rate : 353k + Sample Encoding: 16-bit Signed Integer PCM + +.. raw:: html + + + + + + + + + + + + +
Wave filenameContentText
matcha-baker-2.wav + + + "三百六十行,行行出状元。你行的!明天就是 2025年1月1号啦!银行卡被卡住了,你帮个忙,行不行?" +
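The numbers in the ``soxi`` listings for the new Matcha samples can be cross-checked by hand. Duration is the sample count divided by the sample rate (22050 Hz here), and the raw PCM payload is samples x channels x bits / 8 bytes. The following is a minimal POSIX shell sketch of that arithmetic, under the assumption that the files are mono 16-bit PCM, as ``soxi`` reports. The helper names ``samples_to_seconds`` and ``pcm_bytes`` are hypothetical and are not part of sherpa-onnx.

```shell
#!/bin/sh
# Sketch (hypothetical helpers): re-derive the values soxi prints for
# the mono, 16-bit, 22050 Hz wave files generated above.

# Duration in seconds = samples / sample_rate, rounded to two decimals
# to match the hh:mm:ss.ss format used by soxi.
samples_to_seconds() {
  awk -v n="$1" -v sr="$2" 'BEGIN { printf "%.2f\n", n / sr }'
}

# Raw PCM payload in bytes = samples * channels * bits_per_sample / 8.
# soxi's "File Size" also counts the WAV header, so it is slightly larger.
pcm_bytes() {
  echo $(( $1 * $2 * $3 / 8 ))
}

samples_to_seconds 332032 22050   # matcha-ljspeech-0.wav
samples_to_seconds 174592 22050   # matcha-ljspeech-1.wav
samples_to_seconds 280320 22050   # matcha-baker-2.wav
pcm_bytes 332032 1 16             # matcha-ljspeech-0.wav
```

Running it prints ``15.06``, ``7.92``, ``12.71`` and ``664064``. These values agree with the reported durations ``00:00:15.06``, ``00:00:07.92`` and ``00:00:12.71`` and with the ``664k`` file size. The ``353k`` bit rate follows the same way: 22050 x 16 = 352800 bits per second.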
diff --git a/docs/source/onnx/tts/pretrained_models/vits.rst b/docs/source/onnx/tts/pretrained_models/vits.rst index 00d4435d9..ad8efda22 100644 --- a/docs/source/onnx/tts/pretrained_models/vits.rst +++ b/docs/source/onnx/tts/pretrained_models/vits.rst @@ -158,8 +158,8 @@ the file sizes of ``*.onnx`` files below. -rw-r--r-- 1 fangjun staff 87K Jul 16 13:38 phone.fst -rw-r--r-- 1 fangjun staff 655B Jul 16 13:38 tokens.txt -Generate speech with executable compiled from C++ -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Generate speech with executables compiled from C++ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: bash @@ -370,8 +370,8 @@ the file sizes of ``*.onnx`` files below. drwxr-xr-x 122 fangjun staff 3.8K Dec 13 2023 espeak-ng-data -rw-r--r-- 1 fangjun staff 940B Dec 13 2023 tokens.txt -Generate speech with executable compiled from C++ -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Generate speech with executables compiled from C++ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: bash @@ -603,8 +603,8 @@ the file sizes of ``*.onnx`` files below. -rwxr-xr-x 1 fangjun staff 1.8K Nov 29 2023 vits-piper-en_US.py -rwxr-xr-x 1 fangjun staff 730B Nov 29 2023 vits-piper-en_US.sh -Generate speech with executable compiled from C++ -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Generate speech with executables compiled from C++ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: bash @@ -810,8 +810,8 @@ the file sizes of ``*.onnx`` files below. -rw-r--r-- 1 1001 127 109M Apr 22 02:38 vits-ljs/vits-ljs.onnx -Generate speech with executable compiled from C++ -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Generate speech with executables compiled from C++ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: bash @@ -951,8 +951,8 @@ the file sizes of ``*.onnx`` files below. 
-rw-r--r-- 1 fangjun staff 37M Oct 16 10:57 vits-vctk.int8.onnx -rw-r--r-- 1 fangjun staff 116M Oct 16 10:57 vits-vctk.onnx -Generate speech with executable compiled from C++ -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Generate speech with executables compiled from C++ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Since there are 109 speakers available, we can choose a speaker from 0 to 198. +Since there are 109 speakers available, we can choose a speaker from 0 to 108. The default speaker ID is 0. @@ -1757,8 +1757,8 @@ the file sizes of ``*.onnx`` files below. vits-icefall-zh-aishell3 fangjun$ ls -lh *.onnx -rw-r--r-- 1 fangjun staff 29M Mar 20 22:50 model.onnx -Generate speech with executable compiled from C++ -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Generate speech with executables compiled from C++ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Since there are 174 speakers available, we can choose a speaker from 0 to 173. The default speaker ID is 0. @@ -2018,8 +2018,8 @@ Please use the following commands to download it. You can find a lot of pre-trained models for over 40 languages at ``. -Generate speech with executable compiled from C++ -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Generate speech with executables compiled from C++ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: bash