The baseline system for the Magichub Code-Switching ASR Challenge is developed with ETEH.
For the model architecture, we use the Transformer. A 2-layer convolutional neural network (CNN) serves as the front-end; each CNN layer has 320 filters with 3x3 kernels and 2x2 stride. The self-attention encoder has 17 layers and the decoder has 6 layers. All sub-layers, as well as the embedding layers, produce outputs of dimension 320. The multi-head attention networks use 8 heads, and the inner dimension of the position-wise feed-forward networks is 2,048. All ASR models are trained with a batch size of 512 using the Adam optimizer, with a gradient clipping norm of 5, 25,000 warm-up steps, and the Noam learning rate decay scheme. We train the ASR model for 30 epochs.
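The hyper-parameters above are summarized in the sketch below. The dictionary keys are illustrative only and do not correspond to the actual ETEH configuration file format.

```python
# Hypothetical summary of the baseline hyper-parameters (key names are
# illustrative, not the real ETEH config keys).
baseline_config = {
    # 2-layer CNN front-end: 320 filters, 3x3 kernels, 2x2 stride per layer
    "frontend": {"layers": 2, "channels": 320, "kernel": (3, 3), "stride": (2, 2)},
    # Transformer encoder/decoder
    "encoder_layers": 17,
    "decoder_layers": 6,
    "attention_dim": 320,      # output dim of all sub-layers and embeddings
    "attention_heads": 8,
    "feedforward_dim": 2048,   # inner dim of position-wise feed-forward nets
    # Optimization
    "batch_size": 512,
    "optimizer": "adam",
    "grad_clip_norm": 5,
    "warmup_steps": 25000,
    "lr_schedule": "noam",
    "epochs": 30,
}
```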
We combine the MagicData-RAMC Train set and the TAL_CSASR Train set as the training data for the baseline ASR model. The input acoustic features are 83-dimensional, consisting of 80-dimensional filter banks and 3-dimensional pitch features, extracted with a 25 ms Hamming window shifted every 10 ms. We DO NOT apply CMVN to the acoustic features.
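The sketch below shows how the 80-dimensional filter-bank part of these features can be reproduced with torchaudio's Kaldi-compatible frontend; it is a minimal illustration, not the exact extraction pipeline of the baseline, and the 3-dimensional Kaldi pitch features are not reproduced here. The wav path is a placeholder.

```python
import torchaudio
import torchaudio.compliance.kaldi as kaldi

# Placeholder input file.
waveform, sample_rate = torchaudio.load("example.wav")

# 80-dim fbank with a 25 ms Hamming window and 10 ms shift.
fbank = kaldi.fbank(
    waveform,
    sample_frequency=sample_rate,
    num_mel_bins=80,
    frame_length=25.0,
    frame_shift=10.0,
    window_type="hamming",
)

# No CMVN is applied. In the baseline, the fbank features are concatenated
# with 3-dim pitch features to form the 83-dim input.
print(fbank.shape)  # (num_frames, 80)
```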
The ASR model predicts subword units obtained with byte pair encoding (BPE) for English and Chinese characters for Mandarin as its output targets. There are 5,276 output targets in total: 1,007 English subword units (including special symbols) and 4,269 Chinese characters.
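A rough sketch of this mixed tokenization scheme is given below: Mandarin text is split into single characters while English segments are passed through a BPE model. The BPE model path and the segmentation rules are assumptions for illustration only, not the actual recipe used to build the 5,276-unit vocabulary.

```python
import re
import sentencepiece as spm

# Hypothetical English BPE model path.
sp = spm.SentencePieceProcessor(model_file="english_bpe.model")

def tokenize(text: str) -> list[str]:
    tokens = []
    # Split the transcript into CJK runs and non-CJK runs.
    for chunk in re.findall(r"[\u4e00-\u9fff]+|[^\u4e00-\u9fff]+", text):
        if re.match(r"[\u4e00-\u9fff]", chunk):
            tokens.extend(list(chunk))  # one token per Chinese character
        elif chunk.strip():
            tokens.extend(sp.encode(chunk.strip(), out_type=str))  # English BPE subwords
    return tokens

print(tokenize("今天我们讨论 machine learning 的应用"))
```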
Dev: https://magichub.com/datasets/dev-set-of-chinese-english-code-mixing-conversational-speech-corpus/
Test: https://magichub.com/datasets/chinese-english-code-mixing-conversational-speech-corpus/
Please refer to "dev_data_preprocess.sh" for details.
Please refer to "dev_scoring_sclite.sh" for details.
Please refer to "test_data_preprocess.sh" for details. When submitting your final hyp file for scoring, please make sure that the utterence ID format is same as the one in "test/ref_example.gb.txt" generated by "test_data_preprocess.sh".
dev MER: 29.2%
test MER: 26.5%
Other scoring details:
```
,-----------------------------------------------------------------------.
|        exp/talcs_magic_160/eteh_baseline/decode/hyp.dev.gb.txt        |
|-----------------------------------------------------------------------|
| SPKR   |  # Snt   # Chr  |  Corr    Sub    Del    Ins    Err   S.Err  |
|--------+-----------------+--------------------------------------------|
| g00    |  4456    34286  |  75.7   21.6    2.7    4.9   29.2   72.2   |
|=======================================================================|
| Sum/Avg|  4456    34286  |  75.7   21.6    2.7    4.9   29.2   72.2   |
|=======================================================================|
| Mean   | 4456.0  34286.0 |  75.7   21.6    2.7    4.9   29.2   72.2   |
| S.D.   |    0.0      0.0 |   0.0    0.0    0.0    0.0    0.0    0.0   |
| Median | 4456.0  34286.0 |  75.7   21.6    2.7    4.9   29.2   72.2   |
`-----------------------------------------------------------------------'

,--------------------------------------------------------------------.
|      exp/talcs_magic_160/eteh_baseline/decode/hyp.test.gb.txt      |
|--------------------------------------------------------------------|
| SPKR   |  # Snt   # Chr  |  Corr    Sub    Del    Ins    Err  S.Err |
|--------+-----------------+-----------------------------------------|
| g00    | 11243    77302  |  77.3   20.2    2.5    3.8   26.5  63.5 |
|====================================================================|
| Sum/Avg| 11243    77302  |  77.3   20.2    2.5    3.8   26.5  63.5 |
|====================================================================|
| Mean   |11243.0  77302.0 |  77.3   20.2    2.5    3.8   26.5  63.5 |
| S.D.   |    0.0      0.0 |   0.0    0.0    0.0    0.0    0.0   0.0 |
| Median |11243.0  77302.0 |  77.3   20.2    2.5    3.8   26.5  63.5 |
`--------------------------------------------------------------------'
```
If you have any questions, please contact us: you can open an issue on GitHub or email us.