This repository has been archived by the owner on Jul 2, 2024. It is now read-only.

Commit
Add CC-100 languages
calpt committed Aug 31, 2023
1 parent 820b9d9 commit f04a051
Showing 127 changed files with 542 additions and 0 deletions.
5 changes: 5 additions & 0 deletions subtasks/text_lang/af_cc100.yaml
@@ -0,0 +1,5 @@
task: af
subtask: cc100
description: Language modeling for the Afrikaans language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
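Every file added in this commit follows the same five-key template; only the two-letter language code and the language name change. As a hedged sketch in Python (the helper names are hypothetical, and the commit does not say the files were generated this way), the template can be written as:

```python
# Hypothetical generator reproducing the five-key layout of the
# subtasks/text_lang/<code>_cc100.yaml files in this commit. The helper
# names are illustrative; the commit does not say how the files were made.
CC100_URL = "https://data.statmt.org/cc-100/"

# Words the YAML 1.1 spec resolves to booleans when unquoted; PyYAML,
# for instance, loads a bare `no` as False.
YAML11_BOOLEAN_WORDS = {"y", "n", "yes", "no", "true", "false", "on", "off"}


def yaml_scalar(value: str) -> str:
    """Single-quote values that a YAML 1.1 loader would read as booleans."""
    if value.lower() in YAML11_BOOLEAN_WORDS:
        return f"'{value}'"
    return value


def render_cc100_subtask(code: str, language_name: str) -> str:
    """Return the YAML text for one CC-100 language-modeling subtask."""
    lines = [
        f"task: {yaml_scalar(code)}",
        "subtask: cc100",
        f"description: Language modeling for the {language_name} language on the CC-100 corpus.",
        f"url: {CC100_URL}",
        "citation: ''",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_cc100_subtask("af", "Afrikaans"), end="")
```

`render_cc100_subtask("af", "Afrikaans")` returns exactly the text of `af_cc100.yaml` above. The quoting step is a deliberate design choice: an unquoted `task: no` (the Norwegian code) would be loaded by PyYAML as the boolean `False`, so the sketch emits `task: 'no'` instead.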
5 changes: 5 additions & 0 deletions subtasks/text_lang/am_cc100.yaml
@@ -0,0 +1,5 @@
task: am
subtask: cc100
description: Language modeling for the Amharic language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/ar_cc100.yaml
@@ -0,0 +1,5 @@
task: ar
subtask: cc100
description: Language modeling for the Arabic language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/az_cc100.yaml
@@ -0,0 +1,5 @@
task: az
subtask: cc100
description: Language modeling for the Azerbaijani language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/be_cc100.yaml
@@ -0,0 +1,5 @@
task: be
subtask: cc100
description: Language modeling for the Belarusian language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/bg_cc100.yaml
@@ -0,0 +1,5 @@
task: bg
subtask: cc100
description: Language modeling for the Bulgarian language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/bn_cc100.yaml
@@ -0,0 +1,5 @@
task: bn
subtask: cc100
description: Language modeling for the Bengali language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/ca_cc100.yaml
@@ -0,0 +1,5 @@
task: ca
subtask: cc100
description: Language modeling for the Catalan language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/cs_cc100.yaml
@@ -0,0 +1,5 @@
task: cs
subtask: cc100
description: Language modeling for the Czech language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/cy_cc100.yaml
@@ -0,0 +1,5 @@
task: cy
subtask: cc100
description: Language modeling for the Welsh language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/da_cc100.yaml
@@ -0,0 +1,5 @@
task: da
subtask: cc100
description: Language modeling for the Danish language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/de_cc100.yaml
@@ -0,0 +1,5 @@
task: de
subtask: cc100
description: Language modeling for the German language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/el_cc100.yaml
@@ -0,0 +1,5 @@
task: el
subtask: cc100
description: Language modeling for the Greek language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/en_cc100.yaml
@@ -0,0 +1,5 @@
task: en
subtask: cc100
description: Language modeling for the English language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/eo_cc100.yaml
@@ -0,0 +1,5 @@
task: eo
subtask: cc100
description: Language modeling for the Esperanto language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/es_cc100.yaml
@@ -0,0 +1,5 @@
task: es
subtask: cc100
description: Language modeling for the Spanish language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/et_cc100.yaml
@@ -0,0 +1,5 @@
task: et
subtask: cc100
description: Language modeling for the Estonian language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/eu_cc100.yaml
@@ -0,0 +1,5 @@
task: eu
subtask: cc100
description: Language modeling for the Basque language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/fa_cc100.yaml
@@ -0,0 +1,5 @@
task: fa
subtask: cc100
description: Language modeling for the Persian language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/fi_cc100.yaml
@@ -0,0 +1,5 @@
task: fi
subtask: cc100
description: Language modeling for the Finnish language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/fr_cc100.yaml
@@ -0,0 +1,5 @@
task: fr
subtask: cc100
description: Language modeling for the French language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/ga_cc100.yaml
@@ -0,0 +1,5 @@
task: ga
subtask: cc100
description: Language modeling for the Irish language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/gl_cc100.yaml
@@ -0,0 +1,5 @@
task: gl
subtask: cc100
description: Language modeling for the Galician language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/gu_cc100.yaml
@@ -0,0 +1,5 @@
task: gu
subtask: cc100
description: Language modeling for the Gujarati language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/ha_cc100.yaml
@@ -0,0 +1,5 @@
task: ha
subtask: cc100
description: Language modeling for the Hausa language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/he_cc100.yaml
@@ -0,0 +1,5 @@
task: he
subtask: cc100
description: Language modeling for the Hebrew language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/hi_cc100.yaml
@@ -0,0 +1,5 @@
task: hi
subtask: cc100
description: Language modeling for the Hindi language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/hr_cc100.yaml
@@ -0,0 +1,5 @@
task: hr
subtask: cc100
description: Language modeling for the Croatian language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/hu_cc100.yaml
@@ -0,0 +1,5 @@
task: hu
subtask: cc100
description: Language modeling for the Hungarian language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/hy_cc100.yaml
@@ -0,0 +1,5 @@
task: hy
subtask: cc100
description: Language modeling for the Armenian language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/id_cc100.yaml
@@ -0,0 +1,5 @@
task: id
subtask: cc100
description: Language modeling for the Indonesian language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/is_cc100.yaml
@@ -0,0 +1,5 @@
task: is
subtask: cc100
description: Language modeling for the Icelandic language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/it_cc100.yaml
@@ -0,0 +1,5 @@
task: it
subtask: cc100
description: Language modeling for the Italian language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/ja_cc100.yaml
@@ -0,0 +1,5 @@
task: ja
subtask: cc100
description: Language modeling for the Japanese language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/ka_cc100.yaml
@@ -0,0 +1,5 @@
task: ka
subtask: cc100
description: Language modeling for the Georgian language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/kk_cc100.yaml
@@ -0,0 +1,5 @@
task: kk
subtask: cc100
description: Language modeling for the Kazakh language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/km_cc100.yaml
@@ -0,0 +1,5 @@
task: km
subtask: cc100
description: Language modeling for the Central Khmer language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/kn_cc100.yaml
@@ -0,0 +1,5 @@
task: kn
subtask: cc100
description: Language modeling for the Kannada language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/ko_cc100.yaml
@@ -0,0 +1,5 @@
task: ko
subtask: cc100
description: Language modeling for the Korean language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/ku_cc100.yaml
@@ -0,0 +1,5 @@
task: ku
subtask: cc100
description: Language modeling for the Kurdish language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/ky_cc100.yaml
@@ -0,0 +1,5 @@
task: ky
subtask: cc100
description: Language modeling for the Kirghiz language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/la_cc100.yaml
@@ -0,0 +1,5 @@
task: la
subtask: cc100
description: Language modeling for the Latin language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/lo_cc100.yaml
@@ -0,0 +1,5 @@
task: lo
subtask: cc100
description: Language modeling for the Lao language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/lt_cc100.yaml
@@ -0,0 +1,5 @@
task: lt
subtask: cc100
description: Language modeling for the Lithuanian language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/lv_cc100.yaml
@@ -0,0 +1,5 @@
task: lv
subtask: cc100
description: Language modeling for the Latvian language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/mk_cc100.yaml
@@ -0,0 +1,5 @@
task: mk
subtask: cc100
description: Language modeling for the Macedonian language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/ml_cc100.yaml
@@ -0,0 +1,5 @@
task: ml
subtask: cc100
description: Language modeling for the Malayalam language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/mn_cc100.yaml
@@ -0,0 +1,5 @@
task: mn
subtask: cc100
description: Language modeling for the Mongolian language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/mr_cc100.yaml
@@ -0,0 +1,5 @@
task: mr
subtask: cc100
description: Language modeling for the Marathi language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/ms_cc100.yaml
@@ -0,0 +1,5 @@
task: ms
subtask: cc100
description: Language modeling for the Malay language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/my_cc100.yaml
@@ -0,0 +1,5 @@
task: my
subtask: cc100
description: Language modeling for the Burmese language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/ne_cc100.yaml
@@ -0,0 +1,5 @@
task: ne
subtask: cc100
description: Language modeling for the Nepali language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/nl_cc100.yaml
@@ -0,0 +1,5 @@
task: nl
subtask: cc100
description: Language modeling for the Dutch language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/no_cc100.yaml
@@ -0,0 +1,5 @@
task: 'no'
subtask: cc100
description: Language modeling for the Norwegian language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/or_cc100.yaml
@@ -0,0 +1,5 @@
task: or
subtask: cc100
description: Language modeling for the Oriya language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/pa_cc100.yaml
@@ -0,0 +1,5 @@
task: pa
subtask: cc100
description: Language modeling for the Punjabi language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/pl_cc100.yaml
@@ -0,0 +1,5 @@
task: pl
subtask: cc100
description: Language modeling for the Polish language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/ps_cc100.yaml
@@ -0,0 +1,5 @@
task: ps
subtask: cc100
description: Language modeling for the Pashto language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/pt_cc100.yaml
@@ -0,0 +1,5 @@
task: pt
subtask: cc100
description: Language modeling for the Portuguese language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/ro_cc100.yaml
@@ -0,0 +1,5 @@
task: ro
subtask: cc100
description: Language modeling for the Romanian language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/ru_cc100.yaml
@@ -0,0 +1,5 @@
task: ru
subtask: cc100
description: Language modeling for the Russian language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/sa_cc100.yaml
@@ -0,0 +1,5 @@
task: sa
subtask: cc100
description: Language modeling for the Sanskrit language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/si_cc100.yaml
@@ -0,0 +1,5 @@
task: si
subtask: cc100
description: Language modeling for the Sinhala language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/sk_cc100.yaml
@@ -0,0 +1,5 @@
task: sk
subtask: cc100
description: Language modeling for the Slovak language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/sl_cc100.yaml
@@ -0,0 +1,5 @@
task: sl
subtask: cc100
description: Language modeling for the Slovenian language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
5 changes: 5 additions & 0 deletions subtasks/text_lang/so_cc100.yaml
@@ -0,0 +1,5 @@
task: so
subtask: cc100
description: Language modeling for the Somali language on the CC-100 corpus.
url: https://data.statmt.org/cc-100/
citation: ''
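Because the files share a rigid layout and encode the language code twice (in the file name and in `task`), a small consistency check can catch copy-paste slips across the whole set. The sketch below is a hypothetical stdlib-only helper, assuming the files stay flat `key: value` lists; it is not a general YAML parser and not part of the repository.

```python
# Hypothetical consistency check for the subtask files added in this commit.
# Assumption: each file is a flat list of "key: value" lines, so a small
# hand-rolled parser is enough; this is NOT a general YAML parser.
from pathlib import PurePosixPath

REQUIRED_KEYS = ("task", "subtask", "description", "url", "citation")


def parse_flat_yaml(text: str) -> dict:
    """Map each 'key: value' line to a string, stripping surrounding quotes."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, _, value = line.partition(": ")
        fields[key.strip()] = value.strip().strip("'\"")
    return fields


def check_subtask_file(path: str, text: str) -> list:
    """Return a list of problems; an empty list means the file looks consistent."""
    fields = parse_flat_yaml(text)
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in fields]
    # File names follow <task>_<subtask>.yaml, e.g. af_cc100.yaml.
    code, _, subtask = PurePosixPath(path).stem.partition("_")
    if fields.get("task") != code:
        problems.append(f"task {fields.get('task')!r} != file name code {code!r}")
    if fields.get("subtask") != subtask:
        problems.append(f"subtask {fields.get('subtask')!r} != file name suffix {subtask!r}")
    return problems
```

Run over every `subtasks/text_lang/*_cc100.yaml` file, this flags any file whose `task` or `subtask` disagrees with its name, or that lacks one of the five keys. Because it strips quotes, it treats `task: no` and `task: 'no'` as equal, even though a YAML 1.1 loader does not.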