Course description

Much recent research in language technology (LT) is restricted to a small number of languages. While more and more datasets are created, made available, and used for English and a few other languages, the large majority of the world's languages are hardly ever the object of LT research. In this course, we will introduce and discuss several definitions of so-called 'low-resource languages', and we will examine how LT systems (such as taggers or parsers) can be developed for such languages despite the scarcity of data. In particular, we will discuss how linguistic annotations or models can be transferred from a resource-rich to a resource-poor language. In this setting, we have to distinguish cases where the two languages are etymologically closely related from cases where they are not. We will also see how these methods can be applied to 'special' types of low-resource languages such as historical language varieties, dialects, and sociolects, whose automatic processing faces similar challenges.

Day-to-day program

Monday

Definitions of low-resource languages in linguistics and computational linguistics

Overview of the main language technology applications and their resource requirements

Yulia Tsvetkov (2017): Opportunities and challenges in working with low-resource languages. (Slides, Part 1) http://www.cs.cmu.edu/~ytsvetko/jsalt-part1.pdf
META-NET Strategic Research Agenda for Multilingual Europe 2020. (Sections 1, 2, and 4) http://www.meta-net.eu/vision/reports/meta-net-sra-version_1.0.pdf

Tuesday

Annotation

Data transfer vs. model transfer

Data transfer approaches: annotation projection, training data translation, ...

Dan Garrette & Jason Baldridge (2013): Learning a part-of-speech tagger from two hours of annotation. Proceedings of NAACL-HLT. http://www.aclweb.org/anthology/N13-1014
David Yarowsky & Grace Ngai (2001): Inducing multilingual POS taggers and NP bracketers via robust projection across aligned corpora. Proceedings of NAACL-HLT. http://aclweb.org/anthology/N/N01/N01-1026.pdf
Jörg Tiedemann & Željko Agić (2016): Synthetic treebanking for cross-lingual dependency parsing. (Sections 1 and 2) Journal of Artificial Intelligence Research 55. https://www.jair.org/index.php/jair/article/view/10980

Wednesday

Model transfer approaches: plain model transfer, delexicalization, relexicalization, cross-lingual clusters and embeddings

Ryan McDonald, Slav Petrov & Keith Hall (2011): Multi-source transfer of delexicalized dependency parsers. Proceedings of EMNLP. https://www.aclweb.org/anthology/D11-1006
Oscar Täckström, Ryan McDonald & Jakob Uszkoreit (2012): Cross-lingual word clusters for direct transfer of linguistic structure. Proceedings of NAACL-HLT. http://aclweb.org/anthology/N/N12/N12-1052.pdf

Thursday

Closely related languages and language varieties - definitions, problems and solutions

Delphine Bernhard & Anne-Laure Ligozat (2013): Hassle-free POS-Tagging for the Alsatian Dialects. In: Marcos Zampieri & Sascha Diwersy (eds.): Non-Standard Data Sources in Corpus-Based Research, Shaker, ZSM Studien. https://hal.archives-ouvertes.fr/hal-00860790
Yves Scherrer & Achim Rabus (2017): Multi-source morphosyntactic tagging for Spoken Rusyn. Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects. http://www.aclweb.org/anthology/W/W17/W17-1210.pdf

Friday

Multilingual modelling and zero-shot learning

Melvin Johnson et al. (2017): Google's multilingual neural machine translation system - enabling zero-shot translation. Transactions of the Association for Computational Linguistics 5/2017. https://www.aclweb.org/anthology/Q/Q17/Q17-1024.pdf
Ryan Cotterell & Georg Heigold (2017): Cross-lingual character-level neural morphological tagging. Proceedings of EMNLP. http://www.aclweb.org/anthology/D/D17/D17-1078.pdf

