Datasets
Seid Muhie Yimam edited this page Nov 5, 2021
This page lists and discusses the datasets and corpora used to build the semantic and NLP models for Amharic.
The POS-tagged benchmark dataset is prepared from the work of Gashaw and Shashirekha (2018). The table below gives the sizes of the training, development, and test set splits.
Split | Number of sentences |
---|---|
Training set | 29521 |
Development set | 1678 |
Test set | 1687 |
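
To put the split sizes in proportion, the short sketch below totals the sentence counts from the table and computes each split's share. The counts are taken directly from the table; the names `SPLIT_SIZES` and `split_shares` are illustrative only and do not come from the project's code.

```python
# Sentence counts copied from the table above.
SPLIT_SIZES = {
    "Training set": 29521,
    "Development set": 1678,
    "Test set": 1687,
}


def split_shares(sizes):
    """Return each split's share of the total, as a percentage."""
    total = sum(sizes.values())
    return {name: 100.0 * count / total for name, count in sizes.items()}


total = sum(SPLIT_SIZES.values())
shares = split_shares(SPLIT_SIZES)

print(f"Total sentences: {total}")
for name, pct in shares.items():
    print(f"{name}: {SPLIT_SIZES[name]} sentences ({pct:.1f}%)")
```

Under these counts the dataset holds 32886 sentences in total, with roughly 90% in the training set and about 5% each in the development and test sets.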