Datasets

Introduction

This page lists and discusses the datasets and corpora used to build the semantic and NLP models for Amharic.

Corpus

NLP Datasets

POS tagging

The POS-tagged benchmark dataset is prepared from the work of Gashaw and Shashirekha (2018). The table below gives the number of sentences in the training, development, and test splits:

Split	Number of sentences
Training set	29521
Development set	1678
Test set	1687
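
As a quick sanity check on the split sizes, the following minimal sketch computes the total number of sentences and the share of each split. The counts are copied from the table above; the variable names (`split_sizes`, `shares`) are illustrative and are not part of the dataset or of any released tooling.

```python
# Sentence counts copied from the table above.
split_sizes = {
    "Training set": 29521,
    "Development set": 1678,
    "Test set": 1687,
}

# Total number of sentences across all three splits.
total = sum(split_sizes.values())

# Fraction of all sentences that falls into each split.
shares = {name: count / total for name, count in split_sizes.items()}

for name, count in split_sizes.items():
    print(f"{name}: {count} sentences ({shares[name]:.1%})")
print(f"Total: {total} sentences")
```

The total comes to 32886 sentences, with roughly nine in ten of them in the training set.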
