Tatoeba is a libre/free database of example sentences translated into many languages. Our goal is to create a resource for people studying languages, whether as learners or as researchers. The database is currently used:
As a source of example sentences for free dictionaries and language-learning websites, such as Jim Breen’s WWWJDIC (Jim Breen is also a member).
Our member CK maintains a list of free dictionaries and language-learning websites that use Tatoeba's corpus: http://a4esl.com/temporary/tatoeba/links.html
As a rich resource for language learners, who can find out how words are used and how grammatical constructs and idioms are translated.
For research. Example papers include:
- Research on treebanking Japanese (Francis Bond, 栗林 孝行 [Takayuki Kuribayashi], 橋本 力 [Chikara Hashimoto] (2008) HPSGに基づくフリーな日本語ツリーバンクの構築 [A free Japanese Treebank based on HPSG]. In 14th Annual Meeting of the Association for Natural Language Processing, Tokyo).
- Statistical machine translation (Eric Nichols, Francis Bond, Darren Scott Appling and Yuji Matsumoto (2010) Paraphrasing Training Data for Statistical Machine Translation. Journal of Natural Language Processing, 17(3), pages 101-122).
The main site currently receives about 1 million page views and 250 thousand unique visitors per month, as reported by Google Analytics, and the corpus is growing steadily by 3% or more every month.