Fine-Tuning Data

Suggested directory structure

data/
├── data_preprocessing
│    └── ...
├── ner
│    │
│    ├── ar
│    │    ├── train.txt.tmp
│    │    ├── dev.txt.tmp
│    │    └── test.txt.tmp
│    │
│    ├── ...
│    │
│    └── zh
│         ├── train.txt.tmp
│         ├── dev.txt.tmp
│         └── test.txt.tmp
├── sa
│    │
│    ├── ar
│    │    ├── train.tsv
│    │    ├── dev.tsv
│    │    └── test.tsv
│    │
│    ├── ...
│    │
│    └── zh
│         ├── train.tsv
│         ├── dev.tsv
│         └── test.tsv
├── qa
│    │
│    ├── ar
│    │    ├── train-v1.1.json
│    │    └── dev-v1.1.json
│    │
│    ├── ...
│    │
│    └── zh
│         ├── train-v1.1.json
│         └── dev-v1.1.json
│
└── udp_pos
     │
     ├── ar
     │    ├── ar_padt-ud-train.conllu
     │    ├── ar_padt-ud-dev.conllu
     │    └── ar_padt-ud-test.conllu
     │
     ├── ...
     │
     └── zh
          ├── zh_gsd-ud-train.conllu
          ├── zh_gsd-ud-dev.conllu
          └── zh_gsd-ud-test.conllu
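
To check that a local copy matches the layout above, something like the following may help. It is only a sketch and not part of the repository: the function name `missing_files` and the `data_root` argument are hypothetical, and the glob pattern for `udp_pos` assumes every treebank file follows the `<lang>_<treebank>-ud-<split>.conllu` naming seen for `ar` and `zh`.

```python
from pathlib import Path

# File names per task, copied from the tree above. udp_pos is handled
# separately because its file names carry a treebank prefix (e.g. ar_padt).
TASK_FILES = {
    "ner": ["train.txt.tmp", "dev.txt.tmp", "test.txt.tmp"],
    "sa": ["train.tsv", "dev.tsv", "test.tsv"],
    "qa": ["train-v1.1.json", "dev-v1.1.json"],
}
UDP_SPLITS = ["train", "dev", "test"]


def missing_files(data_root, lang):
    """Return the relative paths expected for `lang` that were not found."""
    root = Path(data_root)
    missing = []
    for task, names in TASK_FILES.items():
        for name in names:
            if not (root / task / lang / name).is_file():
                missing.append(f"{task}/{lang}/{name}")
    for split in UDP_SPLITS:
        # Assumption: one file per split, named <lang>_<treebank>-ud-<split>.conllu
        pattern = f"{lang}_*-ud-{split}.conllu"
        if not any((root / "udp_pos" / lang).glob(pattern)):
            missing.append(f"udp_pos/{lang}/{pattern}")
    return missing
```

Calling `missing_files("data", "ar")` returns a list of paths that still need to be downloaded or preprocessed. Some entries are expected to stay missing for certain languages, since not every language has a dataset for every task (for example, there is no SA dataset for Finnish and no QA dataset for Japanese).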

Dataset download links

The table below provides download links to the fine-tuning datasets we used. We have preprocessed some of them for our experiments.

Important: Please refer to the preprocessing script for each dataset in data_preprocessing. The Python scripts all contain docstrings at the top with information on how to use them. For the NER-related bash scripts, we provide instructions in this README.md file. If there is neither a dedicated preprocessing script nor instructions in the respective README.md on how to preprocess the data, the data can be used as downloaded and does not require further preprocessing.

Also: When using any of these datasets in your own experiments, don't forget to cite their publications! Feel free to refer to our paper's references if you aren't sure which publication a dataset belongs to.

Lang	NER	SA	QA	UDP & POS
Arabic	Wikiann-panx	HARD	TyDiQA-GoldP-v1.1	Universal Dependencies 2.6 (Arabic-PADT)
English	CoNLL-2003	IMDb Movie Reviews	SQuAD-v1.1 (Train, Dev)	Universal Dependencies 2.6 (English-EWT)
Finnish	FiNER	---	TyDiQA-GoldP-v1.1	Universal Dependencies 2.6 (Finnish-FTB)
Indonesian	Wikiann-panx	Indonesian Prosa	TyDiQA-GoldP-v1.1	Universal Dependencies 2.6 (Indonesian-GSD)
Japanese	Wikiann-panx	Yahoo Movie Reviews	---	Universal Dependencies 2.6 (Japanese-GSD)
Korean	Corpus-morpheme	Naver Sentiment Movie Corpus (NSMC)	KorQuAD 1.0	Universal Dependencies 2.6 (Korean-GSD)
Russian	Wikiann-panx	RuReviews	SberQuAD	Universal Dependencies 2.6 (Russian-GSD)
Turkish	Wikiann-panx	Turkish Movie and Product Reviews	TQuAD-v0.1	Universal Dependencies 2.6 (Turkish-IMST)
Chinese	Chinese literature	ChnSentiCorp	Delta Reading Comprehension Dataset (DRCD)	Universal Dependencies 2.6 (Chinese-GSD)
