Mispronunciation detection code for jingju singing voice.
This code accompanies chapter 6 of Rong Gong's thesis "Automatic assessment of singing voice pronunciation: a case study with jingju music".
This repo contains two methods:
- Baseline - forced alignment system built on Kaldi.
- Deep learning - discriminative models built using Keras and TensorFlow.
The main idea of forced alignment-based mispronunciation detection is to use two different lexicons, one for the training phase and one for the testing phase. The details of this idea are described in section 6.2.1 of the thesis.
Here we only outline the general pipeline for model training and testing. Please write to the author if you want to know how to use the code for your own experiment. Pipeline:
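To make the two-lexicon idea concrete, here is a minimal sketch of one plausible reading of it. It is not taken from the thesis or from this repo: the function names, the toy syllables, and the phone strings are all hypothetical. The assumption is that the training lexicon gives each sung pronunciation its own entry, while the testing lexicon lists every known variant under the written syllable so that the decoder can choose among them. Section 6.2.1 of the thesis remains the authority on the actual design.

```python
from collections import defaultdict

# Toy, hypothetical annotations: (written syllable, pronunciation actually sung).
annotated_syllables = [
    ("wo", "w o"),
    ("wo", "ng o"),   # assumed special pronunciation of the same written syllable
    ("lai", "l ai"),
]


def build_training_lexicon(annotations):
    """Assumed training lexicon: one entry per sung pronunciation."""
    return {pron.replace(" ", ""): pron for _, pron in annotations}


def build_testing_lexicon(annotations):
    """Assumed testing lexicon: each written syllable lists all its variants."""
    lexicon = defaultdict(list)
    for syllable, pron in annotations:
        if pron not in lexicon[syllable]:
            lexicon[syllable].append(pron)
    return dict(lexicon)


train_lex = build_training_lexicon(annotated_syllables)
test_lex = build_testing_lexicon(annotated_syllables)
```

Under this reading, a syllable with more than one variant in the testing lexicon is where the decoder's choice carries information about how the student actually pronounced it.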
- Generate the language dictionary with srcPy/parseLang.py.
- Generate all the files that Kaldi needs (e.g., text, wav.scp, phone.txt) with srcPy/parseTextRepetition.py.
- Train the model and decode the text of the test data with run.sh.
- Parse the decoded pronunciation with srcPy/parse_decoded_pronunciation.py.
- Evaluate the result with srcPy/mispron_eval.py.
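The last two steps of the pipeline above can be sketched as follows. This is an illustration only and does not reproduce srcPy/mispron_eval.py: the function names, the toy phone strings, and the choice of precision, recall, and F1 as metrics are assumptions. A syllable is flagged as mispronounced when its decoded pronunciation differs from the reference pronunciation, and the flags are then scored against ground-truth labels.

```python
def detect_mispronunciations(decoded, reference):
    """Flag a syllable when the decoded pronunciation differs from the reference."""
    return [d != r for d, r in zip(decoded, reference)]


def precision_recall_f1(predicted, actual):
    """Score predicted mispronunciation flags against ground-truth flags."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical toy data: reference (expected) vs. decoded pronunciations.
reference = ["ng o", "l ai", "j ian"]
decoded = ["w o", "l ai", "j ian"]
actual = [True, False, True]  # assumed ground-truth mispronunciation labels

predicted = detect_mispronunciations(decoded, reference)
precision, recall, f1 = precision_recall_f1(predicted, actual)
```

In this toy case the third syllable is a missed detection: the decoder returned the reference pronunciation although the label marks it as mispronounced, which lowers recall but not precision.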
We built discriminative models for mispronunciation detection. Two types of model are built: one for special pronunciation and another for jiantuanzi syllables. We experimented with several deep learning architectures, such as BiLSTM, CNN, attention, temporal convolutional networks (TCNs), and self-attention. The details are described in sections 6.3 and 6.4 of the thesis. Here, too, we only describe the pipeline of model training and testing. Please write to the author if you want to use the code for your own experiment. Pipeline:
- Collect the logarithmic Mel representations for training with training_sample_collection_syllable.py.
- Train the various deep learning architectures with train_rnn_special.py for the special pronunciation model or train_rnn_jianzi.py for the jiantuanzi model. For example, the attention variable can be set to feedforward or selfatt.
- Train the TCN architectures with train_rnn_special_tcn.py and train_rnn_jianzi_tcn.py.
- Evaluate the models with eval.py.
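As background for the feedforward attention option mentioned in the training step, the sketch below shows a common formulation of feed-forward attention pooling: each frame vector gets a scalar score, the scores are normalized with a softmax, and the frames are averaged with those weights. It is written in plain Python for readability and is not the Keras code of this repo. The function names, the single scoring vector `weights`, and the toy frames are assumptions; in a trained model the scoring parameters would be learned, and the frames would be the outputs of a recurrent or convolutional layer.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scalars."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def feedforward_attention(frames, weights):
    """Pool T frame vectors of size D into a single vector of size D.

    frames:  list of T lists, each of length D.
    weights: hypothetical scoring vector of length D (learned in practice).
    """
    # One scalar score per frame: tanh of a linear projection.
    scores = [math.tanh(sum(w * x for w, x in zip(weights, frame)))
              for frame in frames]
    alphas = softmax(scores)
    dim = len(frames[0])
    # Weighted sum of the frames, dimension by dimension.
    context = [sum(a * frame[d] for a, frame in zip(alphas, frames))
               for d in range(dim)]
    return context, alphas


frames = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]  # toy input, T=3, D=2
context, alphas = feedforward_attention(frames, weights=[1.0, 1.0])
```

The pooled vector has a fixed size regardless of the syllable length, which is what lets a variable-length log-Mel sequence feed a fixed-size classification layer.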
Rong Gong: rong.gong<at>upf.edu
GNU Affero General Public License 3.0