- Prepare the Speech Corpus: Preprocess the text transcripts by splitting words into individual letters.
- Create the G2G Pronunciation Dictionary: Map each letter to itself in the dictionary.
- Train the Acoustic Model: Use the Speech Corpus and the G2G Pronunciation Dictionary to train the model.
- Align the Corpus: Use the trained Acoustic Model to align the text to the speech.
- Post-Processing: Adjust the TextGrid alignments to include necessary punctuation marks.
speech_corpus_directory/
│
├── speaker1/
│ ├── recording1.wav
│ ├── recording1.lab
│ ├── recording2.wav
│ └── recording2.lab
│
├── speaker2/
│ ├── recording3.wav
│ └── recording3.lab
│ ...
- Ensure all audio files are in
.wav
format, preferably at 16kHz. - Preprocess the corresponding text transcripts by splitting the words into individual letters.
Original Text:
அதற்குத் தகுந்தபடி, ஏதாவது கொஞ்சம் பேசி, வேஷம் போட்டால் போகிறது
Preprocessed Text:
அ த ற ் க ு த ் த க ு ந ் த ப ட ி , ஏ த ா வ த ு க ொ ஞ ் ச ம ் ப ே ச ி , வ ே ஷ ம ் ப ோ ட ் ட ா ல ் ப ோ க ி ற த ு .
Create a G2G Pronunciation Dictionary where each letter maps to itself.
[Letter 1]-[space]-[space]-[Letter 1]
[Letter 2]-[space]-[space]-[Letter 2]
அ அ
ஆ ஆ
இ இ
... ...
mfa validate [path_to_speech_corpus] [path_to_dictionary_file]
mfa train [path_to_speech_corpus] [path_to_dictionary_file] [output_path]
mfa train --clean --phone_set UNKNOWN --use_mp -j 16 --single_speaker ~/path/to/speech_corpus ~/path/to/Tamil_Dictionary_g2g.txt ~/path/to/Tamil_Acoustic_Model.zip
--clean: Clears the cache of previous runs.
--phone_set UNKNOWN: Since phonemes are defined as graphemes.
--use_mp: Enables multi-processing.
-j 16: Specifies the number of cores to use. Here, it is 16.
--single_speaker: Indicates that the corpus is from a single speaker.
acoustic_model_directory/
│
├── final.alimdl # Alignment model file
├── final.mdl # Final model file
├── lda.mat # Linear Discriminant Analysis matrix
├── meta.json # Metadata file containing model information
├── phone_lm.fst # Phone-level language model in FST format
├── phone_pdf.counts # Phone PDF counts file
├── phones.txt # Phone list
├── rules.yaml # Rules file
└── Tree # Decision tree file