Uyghur-Chinese Corpus - HKBU

The Uyghur Language Module project will compile a corpus of bilingual Chinese-Uyghur public documents that illustrate translation practices and policies in the Inner Asian territories from the Qing dynasty to the present day. The corpus will contain at least 200 Uyghur documents (plus their Chinese parallel versions), divided by period roughly as follows: 50% modern, 30% Republican, and 20% Qing. Using this large-scale, searchable collection of bilingual and trilingual documents in the project languages, the project team will carry out a series of studies based on corpus linguistics methods. The project itself will focus on preparing metadata for the source materials and on their OCR and clean-up, so that keyword extraction and bilingual alignment can be performed on the data.
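As a rough illustration of what the planned split means in document counts, the sketch below turns the 200-document minimum and the approximate 50/30/20 shares into per-period targets. It assumes exactly one Chinese parallel per Uyghur document; the names `MIN_UYGHUR_DOCS`, `PERIOD_SHARES`, and `period_targets` are hypothetical and are not part of this repository.

```python
# Illustrative sketch only: hypothetical names, not part of the repository.
# Assumes one Chinese parallel version per Uyghur document.
MIN_UYGHUR_DOCS = 200
PERIOD_SHARES = {"Modern": 0.50, "Republican": 0.30, "Qing": 0.20}


def period_targets(total: int, shares: dict[str, float]) -> dict[str, int]:
    """Return approximate per-period Uyghur document counts for a given total."""
    return {period: round(total * share) for period, share in shares.items()}


targets = period_targets(MIN_UYGHUR_DOCS, PERIOD_SHARES)
for period, count in targets.items():
    print(f"{period}: {count} Uyghur documents (+ {count} Chinese parallels)")

# Total number of texts if every Uyghur document has one Chinese parallel.
print("Total documents incl. Chinese parallels:", 2 * sum(targets.values()))
```

Since the README describes the split as "roughly" 50/30/20, these figures are minimum targets rather than fixed quotas.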

To browse the corpus, open either of the following links:

or

https://htmlpreview.github.io/?https://github.com/FChrispz/UYGHUR_TEST/blob/main/Metadata_22_09_css.html

PA
Metadata.xml
Metadata_22_09.html
Metadata_22_09_css.html
Metadata_exported.xml
Metadata_exported.xsl
PA001.txt
PA_Chinese_Raw.zip
README.md