
FChrispz/UYGHUR_TEST


Uyghur-Chinese Corpus - HKBU

The Uyghur Language Module project will compile a corpus of bilingual public documents in Chinese and Uyghur that illustrate translation practices and policies in Inner Asian territories from the Qing period to the present day. The corpus will include at least 200 Uyghur documents (plus their Chinese parallel versions), divided by period roughly as 50% modern, 30% Republican, and 20% Qing. Based on this corpus, the project team will carry out a series of studies applying corpus linguistics methods to the large-scale, searchable corpus of bi- or trilingual documents in the project languages. The project itself will focus on preparing metadata, OCR, and clean-up for the source materials to enable keyword extraction and bilingual alignment of the data.
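
The period split can be turned into approximate per-period targets. The following is a minimal sketch, assuming the corpus sits exactly at the 200-document minimum; the names `PERIOD_SHARES` and `target_counts` are hypothetical and not part of the project's tooling.

```python
# Illustrative only: these names are assumptions, not project code.
MIN_UYGHUR_DOCS = 200  # minimum number of Uyghur documents stated in the description

# Approximate share of documents per period, in percent ("roughly" in the description).
PERIOD_SHARES = {"modern": 50, "republican": 30, "qing": 20}


def target_counts(total: int = MIN_UYGHUR_DOCS) -> dict[str, int]:
    """Return the approximate number of Uyghur documents per period."""
    return {period: round(total * pct / 100) for period, pct in PERIOD_SHARES.items()}


if __name__ == "__main__":
    # Prints {'modern': 100, 'republican': 60, 'qing': 40}
    print(target_counts())
```

Because the shares are only approximate, these figures are indicative targets rather than fixed quotas.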

To browse the corpus, open either of the following links: https://htmlpreview.github.io/?https://github.com/FChrispz/UYGHUR_TEST/blob/main/Metadata_22_09.html

or

https://htmlpreview.github.io/?https://github.com/FChrispz/UYGHUR_TEST/blob/main/Metadata_22_09_css.html
