Europarl

[Dataset Download] [Original Paper]

The Europarl parallel corpus is extracted from the proceedings of the European Parliament. It includes versions in 21 European languages: Romance (French, Italian, Spanish, Portuguese, Romanian), Germanic (English, Dutch, German, Danish, Swedish), Slavic (Bulgarian, Czech, Polish, Slovak, Slovene), Finno-Ugric (Finnish, Hungarian, Estonian), Baltic (Latvian, Lithuanian), and Greek.
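Because the corpus is parallel, each sentence in one language is paired with its translation in another. In the widely used v7 release, each language pair ships as two plain-text files (for example `europarl-v7.fr-en.fr` and `europarl-v7.fr-en.en`), where line i of one file is the translation of line i of the other. The following is a minimal sketch that assumes that line-aligned layout; the files behind the Dataset Download link may be organized differently, and the function name `read_parallel` is hypothetical. It pairs lines and drops pairs where either side is empty, which can happen when a segment could not be aligned.

```python
from itertools import zip_longest
from typing import Iterator, Tuple


def read_parallel(
    src_path: str, tgt_path: str, skip_empty: bool = True
) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) sentence pairs from two line-aligned files.

    Assumption: line i of src_path translates line i of tgt_path,
    as in the Europarl v7 per-pair text files.
    """
    with open(src_path, encoding="utf-8") as src, open(tgt_path, encoding="utf-8") as tgt:
        # zip_longest lets us detect files of different lengths instead of
        # silently truncating to the shorter one, as plain zip would.
        for line_no, (s, t) in enumerate(zip_longest(src, tgt), start=1):
            if s is None or t is None:
                raise ValueError(f"files differ in length at line {line_no}")
            s, t = s.strip(), t.strip()
            if skip_empty and (not s or not t):
                # Empty on one side: treat as unaligned and drop the pair.
                continue
            yield s, t
```

Raising on a length mismatch catches a misaligned or truncated file pair early, since any offset would otherwise pair every later sentence with the wrong translation.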

Citation

@inproceedings{koehn-2005-europarl,
    title = "{E}uroparl: A Parallel Corpus for Statistical Machine Translation",
    author = "Koehn, Philipp",
    booktitle = "Proceedings of Machine Translation Summit X: Papers",
    month = sep # " 13-15",
    year = "2005",
    address = "Phuket, Thailand",
    url = "https://aclanthology.org/2005.mtsummit-papers.11",
    pages = "79--86",
    abstract = "We collected a corpus of parallel text in 11 languages from the proceedings of the European Parliament, which are published on the web. This corpus has found widespread use in the NLP community. Here, we focus on its acquisition and its application as training data for statistical machine translation (SMT). We trained SMT systems for 110 language pairs, which reveal interesting clues into the challenges ahead.",
}