Myanmar (Burmese) Language Grapheme to Phoneme (myG2P) Conversion Dictionary for speech recognition (ASR) and speech synthesis (TTS).
To read this in Burmese --> README in Myanmar Language
Creative Commons Attribution-NonCommercial-Share Alike 4.0 International (CC BY-NC-SA 4.0) License
License details
Contact email: wasedakuma[at]gmail.com
We developed the myG2P (Myanmar Grapheme-to-Phoneme) dictionary during 2014-2015 for the Myanmar language project of VoiceTra (Multilingual Speech Translation Application) at NICT, Japan. We mainly used words from the MLC (Myanmar Language Commission) dictionary. If you use the myG2P dictionary, please cite the ICCA 2015 paper and/or the COLING 2016 paper. If you are discussing sentence-level grapheme-to-phoneme conversion for the Myanmar language, please cite the PACLING 2015 paper.
The Myanmar Language Commission (MLC) Pronunciation Dictionary can be used as a basis for pronunciation mapping, but we found it necessary to extend the dictionary with foreign pronunciations. The proposed mapping table has 23 phonetic symbols for 33 consonants (some consonants share the same pronunciation, for example “ဒ”, “ဓ”, “ဍ” and “ဎ” in Table 1), 87 vowel combinations and 20 special symbols for foreign word pronunciations. Characters are grouped according to their pronunciation; the groups are unaspirated, aspirated, voiced and nasal, as shown in Table 1. Many Myanmar syllables containing unaspirated or aspirated consonants are pronounced as voiced consonants, depending on the neighboring context. Some foreign pronunciations have to be expressed by special vowel combinations because the corresponding sounds do not occur in native Myanmar pronunciation. See Table 3. The MLC dictionary was extended by defining 26 more symbols to include phoneme mappings for foreign words. For example, the Myanmar phonetic representation of the foreign name “Alex” “အဲလက်(စ်)” is e:le’S (here, S is for (စ်)) and that of “Swift” “ဆွစ်(ဖ်)(ထ်)” is hswi’HPHT (here, HP is for (ဖ်) and HT is for (ထ်)).
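As a rough illustration of the foreign-word extension, the sketch below looks up the symbols for the parenthesized finals in a word. It covers only the three symbols named above (S, HP, HT); the full symbol table is not reproduced here. The names `FOREIGN_FINAL_SYMBOLS` and `foreign_final_symbols` are hypothetical and are not part of the myG2P distribution.

```python
import re

# Only the three mappings mentioned in the text; the real table has more symbols.
FOREIGN_FINAL_SYMBOLS = {
    "စ်": "S",
    "ဖ်": "HP",
    "ထ်": "HT",
}

# Matches one parenthesized final such as "(စ်)" and captures its content.
_PAREN = re.compile(r"\(([^()]*)\)")


def foreign_final_symbols(word):
    """Return the symbol for each parenthesized final in `word`, in order."""
    symbols = []
    for final in _PAREN.findall(word):
        if final not in FOREIGN_FINAL_SYMBOLS:
            # Unknown finals are reported rather than guessed.
            raise KeyError(f"no symbol known for final {final!r}")
        symbols.append(FOREIGN_FINAL_SYMBOLS[final])
    return symbols


print(foreign_final_symbols("အဲလက်(စ်)"))      # "Alex"
print(foreign_final_symbols("ဆွစ်(ဖ်)(ထ်)"))  # "Swift"
```

Joining the returned symbols gives the trailing part of the pronunciations quoted above (`S` in e:le’S and `HPHT` in hswi’HPHT); the rest of each pronunciation comes from the regular consonant and vowel mappings.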
Table 1: Groups of Myanmar consonants and their pronunciations
This section explains how the pronunciation of Myanmar syllables is normally derived from their orthographic structure. Myanmar syllables generally start with a consonant, followed by zero or more vowel combinations. Here, a vowel combination can be a single vowel or a sequence of vowels starting with a consonant that modifies the pronunciation of the first vowel. The pronunciations of consonants when they are combined with vowels are shown in Table 2.
Table 2: Examples of vowel combinations and their pronunciations
Some Myanmar syllables do not conform to these standard rules of pronunciation. The pronunciation of a syllable can depend on its neighboring syllables. Table 3 shows examples of words whose correct pronunciation differs from the standard pronunciation.
Table 3: Examples of contextually dependent pronunciations of some Myanmar words
The dictionary is distributed as a plain text file with one entry per line, in the following format:
Word-ID<TAB>Word<TAB>Syllable-Breaked-Word<TAB>Pronunciation<TAB>IPA
Example:
19663 သုတ သု တ thu. ta. θṵ ta̰
19664 သုတစာပေ သု တ စာ ပေ thu. ta. sa pei θṵ ta̰ sà pè
19665 သုတိ သု တိ thu. ti. θṵ tḭ
19666 သုတေသန သု တေ သ န thu. tei tha- na. θṵ tè θə na̰
19667 သုတေသီ သု တေ သီ thu. tei thi θṵ tè θì
19668 သုဓမ္မာဇရပ် သု ဓမ် မာ ဇ ရပ် thu. da- ma za- ja' θṵ də mà zə jaʔ
19669 သုဓာဘုတ် သု ဓာ ဘုတ် thou' da bou' θoʊʔ dà boʊʔ
19670 သုနာပရန္တတိုင်း သု နာ ပ ရန် တ တိုင်း thu. na pa- ran ta. tain: θṵ nà pə ɹàɴ ta̰ táɪɴ
19671 သုဘရာဇာ သု ဘ ရာ ဇာ thu. ba. ja za θṵ ba̰ jà zà
19672 သုမင်္ဂလ သု မင် ဂ လ thu. min ga- la. θṵ mɪ̀ɴ ɡə la̰
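The format above can be read with a few lines of Python. This is a minimal sketch, assuming each line has exactly five tab-separated fields and that syllables and pronunciation units inside a field are separated by spaces. The names `Entry`, `parse_line` and `align` are hypothetical. The one-to-one alignment between syllables and pronunciation units holds for the ten sample entries shown here, but it is an assumption for the rest of the file.

```python
from typing import List, NamedTuple, Tuple


class Entry(NamedTuple):
    word_id: int
    word: str
    syllables: List[str]      # syllable-broken word
    pronunciation: List[str]  # myG2P phoneme notation, one unit per syllable
    ipa: List[str]            # IPA transcription, one unit per syllable


def parse_line(line: str) -> Entry:
    """Parse one 'Word-ID<TAB>Word<TAB>Syllables<TAB>Pronunciation<TAB>IPA' line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 5:
        raise ValueError(f"expected 5 tab-separated fields, got {len(fields)}")
    word_id, word, syllables, pronunciation, ipa = fields
    return Entry(int(word_id), word, syllables.split(),
                 pronunciation.split(), ipa.split())


def align(entry: Entry) -> List[Tuple[str, str, str]]:
    """Pair each syllable with its pronunciation and IPA unit."""
    if not (len(entry.syllables) == len(entry.pronunciation) == len(entry.ipa)):
        raise ValueError(f"field lengths differ for word id {entry.word_id}")
    return list(zip(entry.syllables, entry.pronunciation, entry.ipa))


# Entry 19667 from the example above, written with explicit tabs.
sample = "19667\tသုတေသီ\tသု တေ သီ\tthu. tei thi\tθṵ tè θì"
entry = parse_line(sample)
for syllable, phoneme, ipa in align(entry):
    print(syllable, phoneme, ipa, sep="\t")
```

Raising an error on a length mismatch, rather than truncating silently as `zip` would, makes irregular entries easy to spot when scanning the whole dictionary.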
Version 1.0, Release Date: May 30, 2017
Version 1.1, Release Date: Feb 25, 2019
Version 2.0, Release Date: Feb 15, 2021
The following people contributed to the development of the myG2P dictionary:
- Honey Htun (Ph.D. Candidate, Yangon Technological University, Myanmar)
- Ni Htwe Aung (Ph.D. Candidate, Yangon Technological University, Myanmar)
- Shwe Sin Moe (a Master's student, Yangon Technological University, Myanmar)
- Wint Theingi (a Master's student, Yangon Technological University, Myanmar)
- Ye Kyaw Thu (National Electronics and Computer Technology Center, Thailand)
We would like to express our gratitude to Ms. Aye Mya Hlaing and Ms. Hay Mar Soe Naing for checking the G2P mappings. We would also like to thank our NICT colleagues, especially Dr. Jinfu Ni and Dr. Yoshinori Shiga, for their valuable suggestions on myG2P development.
- To do: add new Myanmar words from various domains
Ye Kyaw Thu, Win Pa Pa, Andrew Finch, Aye Mya Hlaing, Hay Mar Soe Naing, Eiichiro Sumita and Chiori Hori, "Syllable Pronunciation Features for Myanmar Grapheme to Phoneme Conversion", In Proceedings of the 13th International Conference on Computer Applications (ICCA 2015), February 5~6, 2015, Yangon, Myanmar, pp. 161-167. [Best Paper Award]
Ye Kyaw Thu, Win Pa Pa, Andrew Finch, Jinfu Ni, Eiichiro Sumita and Chiori Hori, 2015, "The Application of Phrase Based Statistical Machine Translation Techniques to Myanmar Grapheme to Phoneme Conversion", In Proceedings of the Pacific Association for Computational Linguistics Conference (PACLING 2015), May 19~21, 2015, Legian, Bali, Indonesia, pp. 170-176. (A revised version of the paper has been published in Springer Communications in Computer and Information Science (CCIS), ISSN: 1865-0929, pp. 238-250.)
☝️ For this PACLING 2015 conference paper, we used the myG2P dictionary plus 5,276 sentences extracted from the BTEC corpus.
Ye Kyaw Thu, Win Pa Pa, Yoshinori Sagisaka, Naoto Iwahashi, "Comparison of Grapheme–to–Phoneme Conversion Methods on a Myanmar Pronunciation Dictionary", In Proceedings of the 6th Workshop on South and Southeast Asian Natural Language Processing (WSSANLP), COLING 2016, December 11-17, 2016, Osaka, Japan, pp. 11–22.
Title: Grapheme-to-IPA Phoneme Conversion for Burmese (myG2P Version 2.0)
Workshop: the 2nd joint Workshop on NLP/AI R&D, iSAI-NLP 2020, Bangkok, Thailand.
Authors: Honey Htun (YTU, Myanmar), Ni Htwe Aung (YTU, Myanmar), Shwe Sin Moe (YTU, Myanmar), Wint Theingi (YTU, Myanmar), Nyein Nyein Oo (YTU, Myanmar), Thepchai Supnithi (NECTEC, Thailand) and Ye Kyaw Thu (NECTEC, Thailand)
to appear
- Myanmar-English Dictionary (1993), Department of the Myanmar Language Commission, Ministry of Education, Union of Myanmar.
- https://en.wikipedia.org/wiki/International_Phonetic_Alphabet