NLP Tasks
Below we describe the different tasks and the classification models we have built using the different semantic models.
The sentiment classification model based on the FLAIR contextual embeddings is released in the LT group data and model repository.
You can download the model from here
Or use the following Python code to download the model. Make sure to install the required flair and wget modules.
pip install wget
pip install flair
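The download step in the scripts on this page uses wget.download, which fetches the model file on every run. As a minimal sketch of one way to avoid repeated downloads, the helper below skips the fetch when the file is already on disk. The function name download_if_missing, the fetch parameter, and the local file name in the usage comment are assumptions made for this illustration; they are not part of flair or wget.

```python
import os
import urllib.request


def download_if_missing(url, dest_path, fetch=urllib.request.urlretrieve):
    """Download url to dest_path unless that file already exists.

    Hypothetical helper: `fetch` is any callable taking (url, dest_path),
    which defaults to the standard library downloader.
    """
    if not os.path.exists(dest_path):
        fetch(url, dest_path)
    return dest_path


# Usage (requires network access; the local file name is an assumption):
# am_sent_model = download_if_missing(
#     "http://ltdata1.informatik.uni-hamburg.de/amharic/taskmodels/sent/final-model.pt",
#     "am-sent-final-model.pt",
# )
```

The returned path can be passed to TextClassifier.load in the same way as the value returned by wget.download.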
Then load the model and use it as follows
import wget
import flair
from flair.data import Sentence
am_sent_model = wget.download("http://ltdata1.informatik.uni-hamburg.de/amharic/taskmodels/sent/final-model.pt")
# create example sentence
sentence = Sentence('ስንት ምስኪን ነበር ያኔ:: አንዱ ሲኒየር ነን ካፌ ላድርጋችሁ ብሎን ስንት ምግብ አዘን አብሮ የበላውን ክፈሉ ብሎን ጠፍቷል')
# predict class and print
from flair.models import TextClassifier
classifier = TextClassifier.load(am_sent_model)
classifier.predict(sentence)
print(sentence.labels)
You should see the following output
The POS model trained with the FLAIR embeddings performs better than the others. See the results in our paper.
To use the POS model on your own text, make sure to properly tokenize the Amharic text. Use our segmenter from here.
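The segmenter mentioned above is the recommended route. Purely to illustrate what a tokenized input looks like, the sketch below separates Ethiopic punctuation from adjacent words so that each mark becomes its own whitespace-delimited token, as in the example sentence used with the POS model on this page. The function name naive_pretokenize and the punctuation range U+1362 to U+1368 are assumptions chosen for this sketch; it does not reproduce the segmenter's behavior.

```python
import re

# Ethiopic punctuation from the full stop (U+1362) to the paragraph
# separator (U+1368). This range is an assumption for this sketch.
ETHIOPIC_PUNCT = re.compile(r"([\u1362-\u1368])")


def naive_pretokenize(text):
    """Put spaces around Ethiopic punctuation and collapse extra whitespace."""
    spaced = ETHIOPIC_PUNCT.sub(r" \1 ", text)
    return " ".join(spaced.split())


print(naive_pretokenize("አበበ ብዙ በሶ በላ።"))
```

Text that is already tokenized passes through unchanged, so the function can be applied safely to mixed input.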
Once you have the sentence properly tokenized, you can download the POS model from here. Or use the following Python script:
import wget
am_pos_model = wget.download("http://ltdata1.informatik.uni-hamburg.de/amharic/taskmodels/pos/final-model.pt")
Once the model is downloaded, apply it to your properly tokenized sentence as follows:
from flair.models import SequenceTagger
from flair.data import Sentence
classifier = SequenceTagger.load(am_pos_model)
# create example sentence
sentence = Sentence('አበበ ብዙ በሶ በላ ።')
# predict class and print
classifier.predict(sentence)
print(sentence.to_tagged_string())
You should see the following result
We have published the best model, which is based on the FLAIR embeddings, to the LT group data repository. To use the Amharic named entity recognizer, follow these steps:
- First, download the model from here, or run the following code
pip install wget
import wget
am_ner_model = wget.download("http://ltdata1.informatik.uni-hamburg.de/amharic/taskmodels/ner/final-model.pt")
- Make sure you have flair installed in your environment
pip install flair
- Use the model as follows
from flair.data import Sentence
from flair.models import SequenceTagger
# load the downloaded model
model = SequenceTagger.load(am_ner_model)
# or, if you downloaded the final-model.pt file manually, load it as follows; make sure to use the correct path
# model = SequenceTagger.load('final-model.pt')
# create example sentence
sentence = Sentence('አበበ በሶ በለ ።')
# predict tags and print
model.predict(sentence)
print(sentence.to_tagged_string())
You should see the following output, detecting the person's name correctly.
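Separately from the printed string, it can be useful to collect the detected entities as a list. The sketch below is a minimal illustration that assumes the tagger emits BIO-style tags such as B-PER and I-PER; the tag scheme is not stated on this page, and the function name group_bio_entities and the tags in the usage lines are hypothetical. It works on plain lists of tokens and tags, so it does not depend on any particular flair version.

```python
def group_bio_entities(tokens, tags):
    """Merge BIO-tagged tokens into (entity_text, entity_type) pairs.

    Assumes tags look like "B-PER", "I-PER" or "O" (an assumption for this
    sketch). An I- tag that does not continue an open entity is ignored.
    """
    entities = []
    current_tokens, current_type = [], None
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" and label:
            # A new entity starts: close any entity that is still open.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], label
        elif prefix == "I" and current_tokens and label == current_type:
            # Continuation of the currently open entity.
            current_tokens.append(token)
        else:
            # Outside any entity (or a stray I- tag): close the open entity.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities


# Hypothetical tags for the example sentence, for illustration only.
tokens = ["አበበ", "በሶ", "በለ", "።"]
tags = ["B-PER", "O", "O", "O"]
print(group_bio_entities(tokens, tags))
```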
The Amharic question classification models based on the FLAIR, AmRoBERTa, and XLMR contextual embeddings are released in the LT group data and model repository. The QA model trained with the AmRoBERTa embeddings performs better than the others. See the results in our paper.
You can download the model from here
Transliteration is the process of converting Amharic text represented in ASCII back to its canonical Amharic letter representation (known as the Ethiopic or Fidäl script).
For example,
zare sint ken new?
# can be transliterated to its Ethiopic representation as
ዛሬ ስንት ቀን ነው?
To transliterate Latin script to Amharic Fidel, use our transliterator from here.
Alternatively, the following code showcases how to transliterate a given Latin script text to Amharic Fidel script text.
pip install amseg
from amseg.amharicTranslitrator import AmharicTranslitrator as transliterator
transliterated = transliterator.transliterate('misa belah')
# You should see the following output
# transliterated == 'ሚሳ በላህ'