From e2fcb42f5f176d9e39eb38506ab99d0a3adaf202 Mon Sep 17 00:00:00 2001
From: Xiaoyu Yang <45973641+marcoyang1998@users.noreply.github.com>
Date: Tue, 9 Jan 2024 15:41:37 +0800
Subject: [PATCH] fix typo (#1455)

---
 .../RNN-LM/librispeech/lm-training.rst | 27 +++++++------------
 1 file changed, 9 insertions(+), 18 deletions(-)

diff --git a/docs/source/recipes/RNN-LM/librispeech/lm-training.rst b/docs/source/recipes/RNN-LM/librispeech/lm-training.rst
index 46499a3748..e0c90f2a6e 100644
--- a/docs/source/recipes/RNN-LM/librispeech/lm-training.rst
+++ b/docs/source/recipes/RNN-LM/librispeech/lm-training.rst
@@ -4,7 +4,7 @@ Train an RNN language model
 ======================================
 
 If you have enough text data, you can train a neural network language model (NNLM) to improve
-the WER of your E2E ASR system. This tutorial shows you how to train an RNNLM from 
+the WER of your E2E ASR system. This tutorial shows you how to train an RNNLM from
 scratch.
 
 .. HINT::
@@ -15,23 +15,23 @@ scratch.
 .. note::
 
     This tutorial is based on the LibriSpeech recipe. Please check it out for the necessary
-    python scripts for this tutorial. We use the LibriSpeech LM-corpus as the LM training set 
+    python scripts for this tutorial. We use the LibriSpeech LM-corpus as the LM training set
     for illustration purpose. You can also collect your own data. The data format is quite simple:
     each line should contain a complete sentence, and words should be separated by space.
 
-First, let's download the training data for the RNNLM. This can be done via the 
+First, let's download the training data for the RNNLM. This can be done via the
 following command:
 
 .. code-block:: bash
 
-    $ wget https://www.openslr.org/resources/11/librispeech-lm-norm.txt.gz 
+    $ wget https://www.openslr.org/resources/11/librispeech-lm-norm.txt.gz
     $ gzip -d librispeech-lm-norm.txt.gz
 
 As we are training a BPE-level RNNLM, we need to tokenize the training text, which requires a
 BPE tokenizer. This can be achieved by executing the following command:
 
 .. code-block:: bash
-    
+
     $ # if you don't have the BPE
     $ GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/Zengwei/icefall-asr-librispeech-zipformer-2023-05-15
     $ cd icefall-asr-librispeech-zipformer-2023-05-15/data/lang_bpe_500
@@ -56,11 +56,11 @@ sentence length.
         --out-statistics data/lang_bpe_500/lm_data_stats.txt
 
 
-The aforementioned steps can be repeated to create a a validation set for you RNNLM. Let's say 
-you have a validation set in ``valid.txt``, you can just set ``--lm-data valid.txt`` 
+The aforementioned steps can be repeated to create a a validation set for you RNNLM. Let's say
+you have a validation set in ``valid.txt``, you can just set ``--lm-data valid.txt``
 and ``--lm-archive data/lang_bpe_500/lm-data-valid.pt`` when calling ``./local/prepare_lm_training_data.py``.
 
-After completing the previous steps, the training and testing sets for training RNNLM are ready. 
+After completing the previous steps, the training and testing sets for training RNNLM are ready.
 The next step is to train the RNNLM model. The training command is as follows:
 
 .. code-block:: bash
@@ -77,7 +77,7 @@ The next step is to train the RNNLM model. The training command is as follows:
         --use-fp16 0 \
         --tie-weights 1 \
         --embedding-dim 2048 \
-        --hidden_dim 2048 \
+        --hidden-dim 2048 \
         --num-layers 3 \
         --batch-size 300 \
         --lm-data rnn_lm/data/lang_bpe_500/sorted_lm_data.pt \
@@ -93,12 +93,3 @@ The next step is to train the RNNLM model. The training command is as follows:
 .. note::
 
     The training of RNNLM can take a long time (usually a couple of days).
-
-
-
-
-
-
-
-
-
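For readers following the patched tutorial, the validation-set preparation it describes might look roughly like the sketch below. Only ``--lm-data`` and ``--lm-archive`` are taken from the tutorial text in this patch; any other arguments that ``./local/prepare_lm_training_data.py`` may require (for example, a BPE-model path) are not shown and should be checked against the actual script.

.. code-block:: bash

    $ # Hypothetical sketch: prepare a validation archive for the RNNLM.
    $ # --lm-data and --lm-archive come from the tutorial text above;
    $ # other required arguments (e.g. the BPE model) are omitted here.
    $ ./local/prepare_lm_training_data.py \
        --lm-data valid.txt \
        --lm-archive data/lang_bpe_500/lm-data-valid.pt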