The Uyghur Language Module project will compile a corpus of bilingual public documents in Chinese and Uyghur that illustrate translation practices and policies in the Inner Asian territories from the Qing dynasty to the present day. The corpus will include at least 200 Uyghur documents (plus their Chinese parallel versions), divided by period into roughly 50% modern, 30% Republican, and 20% Qing. Using this large-scale corpus of searchable bi- or trilingual documents in the project languages, the project team will carry out a series of studies based on corpus linguistics methods. The project itself will focus on preparing metadata, running OCR, and cleaning up the source materials to enable keyword extraction and bilingual alignment of our data.
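As a quick illustration of what the period split means in practice, the sketch below turns the 200-document minimum and the approximate 50/30/20 shares into per-period targets for the Uyghur side. The helper name `period_targets` and the use of integer (floor) arithmetic are assumptions made for this example, not part of the project's tooling; since the split is described as "roughly", the real counts may differ.

```python
# Hypothetical helper for illustration only: the 200-document minimum and
# the approximate 50/30/20 split come from the project description, while
# the floor-division rounding policy is an assumption.
PERIOD_PERCENT = {"modern": 50, "Republican": 30, "Qing": 20}


def period_targets(total_uyghur_docs: int = 200) -> dict:
    """Return approximate per-period counts of Uyghur documents.

    Each Uyghur document also has a Chinese parallel version, so the
    number of files in the corpus is about twice these counts.
    """
    return {
        period: total_uyghur_docs * percent // 100
        for period, percent in PERIOD_PERCENT.items()
    }


print(period_targets())  # {'modern': 100, 'Republican': 60, 'Qing': 40}
```

With the minimum of 200 documents the shares divide evenly, so the targets add up to exactly 200; for other totals, floor division can leave a few documents unassigned.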
To browse the corpus, open the following link: https://htmlpreview.github.io/?https://github.com/FChrispz/UYGHUR_TEST/blob/main/Metadata_22_09.html
or