
chuuhtetnaing/myanmar-language-dataset-collection


Myanmar Language Dataset Collection

This repository collects Myanmar language datasets, covering both speech and text resources. Because Myanmar language datasets are scarce and hard to find, our goal is to provide a centralized reference point for researchers, developers, and language enthusiasts, and we welcome contributions from the community.

If you know of or have access to additional Myanmar language datasets not listed here, please consider contributing by submitting a pull request or opening an issue. Let's collaborate to build a comprehensive inventory of Myanmar language resources.

Myanmar Language Speech Datasets

  1. Crowdsourced high-quality Burmese speech dataset (SLR80)

  2. BloomSpeech

    • HuggingFace Dataset
    • Notebook (Loading Myanmar Language)
    • Note: Although the dataset is labeled Burmese, the data under language='mya' is actually in the Palaung (De'ang / Ta'ang / Riang) language.

Myanmar Language Text Datasets

  1. Asian Language Treebank (ALT)

    • Download Page
    • HuggingFace Dataset
    • It supports translation for the following language pairs:
      • Myanmar (Burmese) To Bengali
      • Myanmar (Burmese) To English
      • Myanmar (Burmese) To Filipino
      • Myanmar (Burmese) To Hindi
      • Myanmar (Burmese) To Bahasa Indonesia
      • Myanmar (Burmese) To Japanese
      • Myanmar (Burmese) To Khmer
      • Myanmar (Burmese) To Lao
      • Myanmar (Burmese) To Malay
      • Myanmar (Burmese) To Thai
      • Myanmar (Burmese) To Vietnamese
      • Myanmar (Burmese) To Chinese (Simplified)
  2. A Corpus of Modern Burmese
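When working with ALT's parallel data (for example, the HuggingFace copy loaded through the `datasets` library), you usually want sentence pairs for just one of the language pairs listed above. The following is a minimal sketch that assumes each record carries a `translation` mapping keyed by language code, with Burmese under `"my"`. Both the field name and the language codes are assumptions here, so check the dataset card before relying on them. `burmese_pairs` is a hypothetical helper, not part of any library.

```python
from typing import Iterable, Iterator, Tuple


def burmese_pairs(
    records: Iterable[dict], target: str, source: str = "my"
) -> Iterator[Tuple[str, str]]:
    """Yield (Burmese, target) sentence pairs from ALT-style parallel records.

    Assumption: each record has a "translation" dict keyed by language code.
    The default code "my" for Burmese is also an assumption.
    """
    for record in records:
        translation = record.get("translation") or {}
        src = translation.get(source)
        tgt = translation.get(target)
        # Skip records where either side is missing or empty.
        if src and tgt:
            yield src, tgt


# Toy records standing in for real ALT rows (placeholder text, not ALT data).
sample = [
    {"translation": {"my": "my sentence 1", "en": "English sentence 1"}},
    {"translation": {"my": "my sentence 2", "en": ""}},
]
print(list(burmese_pairs(sample, "en")))
# [('my sentence 1', 'English sentence 1')]
```

Records that lack either side are skipped, so the number of usable pairs may differ from one target language to another.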
