This corpus contains a growing collection of multilingual texts aligned to Tibetan texts (bo) at the sentence level.
- 208,736 Tibetan segments
- 4,611 files
- Files from Lotsawa House and 84000
Languages: | bo-en | bo-es | bo-fr | bo-de | bo-it | bo-nl | bo-zh | bo-pt |
---|---|---|---|---|---|---|---|---|
Segments: | 208,736 | 3,481 | 8,971 | 5,892 | 1,129 | 889 | 2,573 | 2,018 |
Detailed content description • Views • Coming soon • Help • Acknowledgments • Terms of use
Source: | https://www.lotsawahouse.org/ |
---|---|
Pairs: | 76,135 |
Files: | 4,405 |
Accessed on: | 2023-01-04 12:44:15.146037 |
Crawler: | LH Crawler |
Parser: | LH Parser |
Layers: | Base + Segments |
Included texts: | See text pairs catalog |
Languages: | bo-en | bo-es | bo-fr | bo-de | bo-it | bo-nl | bo-zh | bo-pt |
---|---|---|---|---|---|---|---|---|
Segments: | 76,135 | 3,481 | 8,971 | 5,892 | 1,129 | 889 | 2,573 | 2,018 |
Source: | https://read.84000.co/ |
---|---|
Pairs: | 132,601 |
Files: | 206 |
Accessed on: | 2018-09-26T07:14:13.428Z |
Crawler: | TMX Crawler |
Parser: | TMX Parser |
Layers: | Base + Segments |
Included texts: | See text pairs catalog |
Languages: | bo-en |
---|---|
Segments: | 132,601 |
This collection presents the same data in two views: plain text pairs and translation memories (TMs).
Plain text pairs in .txt format (see detailed catalog).
Text pairs consist of matching sets of .txt files: one file containing a Tibetan text with one chunk of text per line, and one or more .txt files containing translations of that text into other languages. Translation files are split into lines that correspond to the line breaks in the Tibetan file.
The title of any file can be found on line 1 of the file or by searching for the file's identifying number (e.g. A00023033) in the corpus's text pairs catalog.
Files in a text pair or group share the same identifying number and differ only in the language tag at the end of the file name.
Example:
- A00023033-bo.txt
- A00023033-en.txt
- A00023033-fr.txt
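The naming convention above can be used to collect all language versions of a text. The following is a minimal sketch, assuming every file name follows the `<identifier>-<language tag>.txt` pattern shown in the example; the helper name `group_by_text_id` is hypothetical and not part of the corpus tooling.

```python
import re
from collections import defaultdict

# Assumed naming pattern: <identifier>-<language tag>.txt, e.g. A00023033-bo.txt
PAIR_NAME = re.compile(r"^(?P<text_id>[^-]+)-(?P<lang>[A-Za-z]+)\.txt$")


def group_by_text_id(filenames):
    """Map each identifying number to a {language tag: file name} dict."""
    groups = defaultdict(dict)
    for name in filenames:
        match = PAIR_NAME.match(name)
        if match:  # skip files that do not follow the assumed pattern
            groups[match["text_id"]][match["lang"]] = name
    return dict(groups)


groups = group_by_text_id(
    ["A00023033-bo.txt", "A00023033-en.txt", "A00023033-fr.txt"]
)
print(sorted(groups["A00023033"]))  # language tags available for this text
```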
As stated above, these files are aligned by line to match the Tibetan version.
Example: Tibetan text
1 ༄༅། །འཆི་མེད་འཕགས་མའི་སྙིང་ཐིག་གི་བརྒྱུད་པའི་གསོལ་འདེབས་ཚེ་དབང་བཅུད་འཛིན་ཞེས་བྱ་བ་བཞུགས་སོ། །
2 རང་བྱུང་རྟག་པའི་རྡོ་རྗེ་ཚེ་དཔག་མེད། །
3 འཆི་བདག་བདུད་འཇོམས་གཙུག་ཏོར་རྣམ་པར་རྒྱལ། །
English text
1 The Fount of Longevity Chimé Phakmé Nyingtik Lineage Prayer
2 Amitāyus, Boundless Life, natural, everlasting and indestructible,
3 Uṣṇīṣa-Vijayā, Victorious Conqueror of māra, Lord of Death,
French text
1 La Fontaine de longévité La prière à la lignée de Chimé p'akmé nyingthik
2 Existant par lui-même et éternel, indestructible Amitāyus,
3 Celle qui triomphe du démon Seigneur de la mort, Uṣṇīṣa Vijayā,
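Because the files are aligned by line, a Tibetan file and one of its translations can be paired by reading them in parallel. The sketch below assumes the files are UTF-8 encoded and that the leading numbers in the example above are display line numbers rather than file content; `read_aligned_pairs` is a hypothetical helper name.

```python
def read_lines(path):
    """Read a UTF-8 text file and return its lines without trailing newlines."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def read_aligned_pairs(bo_path, target_path):
    """Pair line N of the Tibetan file with line N of the translation file."""
    bo_lines = read_lines(bo_path)
    target_lines = read_lines(target_path)
    # A mismatch in line counts means the two files are not aligned.
    if len(bo_lines) != len(target_lines):
        raise ValueError(
            f"line counts differ: {len(bo_lines)} vs {len(target_lines)}"
        )
    return list(zip(bo_lines, target_lines))
```

For instance, `read_aligned_pairs("A00023033-bo.txt", "A00023033-en.txt")` would return one `(Tibetan, English)` tuple per line, with the title pair first.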
View 1 is intended for developers who want to train a translation model.
How to use it
This data can be fed into machine translation training pipelines that accept line-aligned parallel text.
TM files in .tmx format (see detailed catalog).
Note: If you need a different format, see the Help section below.
View 2 is intended for developers who want to train a translation model.
How to use it
This data can be fed into machine translation training pipelines that accept TMX input.
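For pipelines that expect plain sentence pairs, the TMX files can be converted with the Python standard library. This is a minimal sketch, assuming the files follow the usual TMX layout (`tu` units containing `tuv` variants with an `xml:lang` attribute and a `seg` child) and use the plain tags `bo` and `en`; the function name `tmx_pairs` and the sample content are illustrative only.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tmx_pairs(root, source_lang="bo", target_lang="en"):
    """Collect (source, target) segment pairs from a parsed TMX tree."""
    pairs = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.iter("tuv"):
            # Older TMX versions use a plain "lang" attribute instead.
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang and seg is not None:
                segments[lang.lower()] = "".join(seg.itertext()).strip()
        if source_lang in segments and target_lang in segments:
            pairs.append((segments[source_lang], segments[target_lang]))
    return pairs


# Illustrative sample, not taken from the corpus.
sample = """<tmx version="1.4"><header/><body>
<tu>
  <tuv xml:lang="bo"><seg>རང་བྱུང་རྟག་པའི་རྡོ་རྗེ་ཚེ་དཔག་མེད། །</seg></tuv>
  <tuv xml:lang="en"><seg>Amitāyus, Boundless Life</seg></tuv>
</tu>
</body></tmx>"""

pairs = tmx_pairs(ET.fromstring(sample))
print(len(pairs))
```

To read a file from disk instead of a string, `ET.parse(path).getroot()` can be passed to the same function.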
Coming soon:
- 700 more texts from Lotsawa House
- 87 texts from Oslo
Help:
- Email us at openpecha[at]gmail.com.
- Join our Discord.
- File an issue.
Thanks to the following organizations for providing data for this collection:
This corpus is provided by OpenPecha under the CC0 Public Domain Dedication v 1.0.