Myanmar Language Dataset Collection

This repository collects Myanmar language datasets, covering both speech and text resources. Because Myanmar language datasets are scarce and hard to find, our goal is to provide a centralized reference point for researchers, developers, and language enthusiasts, and we encourage contributions from the community.

If you know of or have access to additional Myanmar language datasets not listed here, please consider contributing by submitting a pull request or opening an issue. Let's collaborate to build a comprehensive inventory of Myanmar language resources.

Myanmar Language Speech Dataset

  1. Crowdsourced high-quality Burmese speech dataset (SLR80)

  2. BloomSpeech

    • HuggingFace Dataset
    • Notebook (Loading Myanmar Language)
    • Notes: Although this subset is labeled Burmese (language='mya'), the speech it contains is actually Palaung (De'ang / Ta'ang / Riang).
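For the SLR80 recordings listed above, a common first step is to pair each audio clip with its transcription. The sketch below assumes the archive ships a tab-separated index (for example `line_index.tsv`) with rows of the form `<utterance_id>\t<transcription>` and audio stored as `<utterance_id>.wav` under a `wavs/` directory. The file name, column layout, directory, and the `read_line_index` helper are all assumptions for illustration, so check them against the files you actually download.

```python
import csv
import io


def read_line_index(text, audio_dir="wavs"):
    """Parse a tab-separated transcript index into (audio_path, transcript) pairs.

    Assumed (hypothetical) row layout, verify against the real archive:
        <utterance_id>\t<transcription>
    Each clip is assumed to live at <audio_dir>/<utterance_id>.wav.
    """
    pairs = []
    # QUOTE_NONE: treat quote characters inside transcriptions as plain text.
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2 or not row[0].strip():
            continue  # skip blank or malformed lines
        utt_id, transcript = row[0].strip(), row[1].strip()
        pairs.append((f"{audio_dir}/{utt_id}.wav", transcript))
    return pairs


if __name__ == "__main__":
    # Hypothetical sample; for real data use
    # Path("line_index.tsv").read_text(encoding="utf-8").
    sample = "utt_0001\tမင်္ဂလာပါ\nutt_0002\tကျေးဇူးတင်ပါတယ်\n"
    for path, text in read_line_index(sample):
        print(path, text)
```

Reading the index as UTF-8 text and disabling CSV quoting keeps the Burmese transcriptions byte-for-byte intact, which matters when the pairs are later fed to a tokenizer or ASR training script.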

Myanmar Language Text Dataset

  1. Asian Language Treebank (ALT)

    • Download Page
    • HuggingFace Dataset
    • It supports translation between the following language pairs:
      • Myanmar (Burmese) To Bengali
      • Myanmar (Burmese) To English
      • Myanmar (Burmese) To Filipino
      • Myanmar (Burmese) To Hindi
      • Myanmar (Burmese) To Bahasa Indonesia
      • Myanmar (Burmese) To Japanese
      • Myanmar (Burmese) To Khmer
      • Myanmar (Burmese) To Lao
      • Myanmar (Burmese) To Malay
      • Myanmar (Burmese) To Thai
      • Myanmar (Burmese) To Vietnamese
      • Myanmar (Burmese) To Chinese (Simplified Chinese)
  2. A Corpus of Modern Burmese