Mispronunciation detection

Mispronunciation detection code for jingju singing voice.

Rong Gong's thesis "Automatic assessment of singing voice pronunciation: a case study with jingju music" chapter 6.

This repo contains two methods:

Baseline - forced alignment system built on Kaldi.
Deep learning - discriminative model built using Keras and Tensorflow.

Baseline

The main idea of the forced alignment-based mispronunciation detect is to use two lexicons respectively for training and testing phases. The detail of this idea is described in section 6.2.1 in the thesis.

We here only explain the general pipeline of the model training and testing. Please write to the author if you want to know how to use the code for your own experiment. Pipeline:

generate language dictionary by using srcPy/parseLang.py.
generate all the files that Kaldi need, e.g., text, wav.scp, phone.txt, by srcPy/parseTextRepetition.py.
run the model training and decode the text for test data by run.sh
parse decoded pronunciation by srcPy/parse_decoded_pronunciation.py
evaluation srcPy/mispron_eval.py

Deep learning-based discriminative model

We built discriminative models for mispronunciation detection. Two types of model are built, one for special pronunciation, another for jiantuanzi syllables. We have experimented several deep learning architectures, such as BiLSTM, CNN, attention, Temporal convolutional networks (TCNs), self-attention. The details are described in sections 6.3 and 6.4 in the thesis. Here, we also only describe the pipeline of model training and testing. Please write to the auther if you want to use the code for your own experiment. Pipeline:

collecting training logarithmic Mel representation by training_sample_collection_syllable.py
train various deep learning architectures by using train_rnn_jianzi.py or train_rnn_special.py respectively for special pronunciation and jiantuanzi models. e.g., attention var can be feedforward or selfatt.
train TCNs architectures by using train_rnn_special_tcn.py and train_rnn_jianzi_tcn.py
evaluation by eval.py

Contact

Rong Gong: rong.gong<at>upf.edu

Code license

GNU Affero General Public License 3.0

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.idea		.idea
kaldi_alignment		kaldi_alignment
neural_net		neural_net
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Mispronunciation detection

Baseline

Deep learning-based discriminative model

Contact

Code license

About

Releases

Packages

Languages

License

ronggong/mispronunciation-detection

Folders and files

Latest commit

History

Repository files navigation

Mispronunciation detection

Baseline

Deep learning-based discriminative model

Contact

Code license

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages