
NLP Tasks

Tadesse Destaw edited this page Nov 5, 2022 · 14 revisions

Tasks

Below we describe the different tasks and the classification models we have built using the different semantic models.


Sentiment Analysis

The sentiment classification model based on FLAIR contextual embeddings is released in the LT group data and model repository.

You can download the model from here

Alternatively, use the following Python code to download the model. Make sure the required flair and wget packages are installed:

pip install wget
pip install flair

Then load the model and use it as follows:

import wget
from flair.data import Sentence
from flair.models import TextClassifier

# download the sentiment model; wget.download returns the local file name
am_sent_model = wget.download("http://ltdata1.informatik.uni-hamburg.de/amharic/taskmodels/sent/final-model.pt")

# load the downloaded model
classifier = TextClassifier.load(am_sent_model)

# create an example sentence
sentence = Sentence('ስንት ምስኪን ነበር ያኔ:: አንዱ ሲኒየር ነን ካፌ ላድርጋችሁ ብሎን ስንት ምግብ አዘን አብሮ የበላውን ክፈሉ ብሎን ጠፍቷል')

# predict the sentiment class and print it
classifier.predict(sentence)
print(sentence.labels)

You should see the following output


POS Tagging

The POS model trained with FLAIR embeddings outperforms the models trained with the other embeddings; see the results in our paper.

To apply the POS model to your own text, make sure the Amharic text is properly tokenized. Use our segmenter from here.
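The segmenter linked above is the tool to use. To show what "properly tokenized" means, here is a minimal regex-based sketch. The `tokenize` function is hypothetical and is not part of amseg. It splits off Ethiopic punctuation such as ። and ፣ as separate tokens, along with the ASCII `::` that is often typed in place of ።. It also treats the Ethiopic wordspace ፡ as a separator. It is much cruder than a real segmenter; for example, it would split a decimal number such as 3.5 into three tokens.

```python
import re

# Alternatives are tried left to right:
#   "::"             ASCII stand-in for the Ethiopic full stop ።
#   [\u1362-\u1368]  Ethiopic punctuation (። ፣ ፤ ፥ ፦ ፧ ፨)
#   [!?.,;:]         common ASCII punctuation
#   [^...]+          a run of anything else except whitespace, the
#                    Ethiopic wordspace ፡ (U+1361) and the punctuation above
_TOKEN_RE = re.compile(r"::|[\u1362-\u1368]|[!?.,;:]|[^\s\u1361\u1362-\u1368!?.,;:]+")


def tokenize(text):
    """Return a list of tokens; whitespace and ፡ only separate tokens."""
    return _TOKEN_RE.findall(text)


tokens = tokenize("አበበ ብዙ በሶ በላ።")
print(tokens)            # ['አበበ', 'ብዙ', 'በሶ', 'በላ', '።']
print(" ".join(tokens))  # አበበ ብዙ በሶ በላ ።
```

Joining the tokens with single spaces gives the space-separated form used in the POS example sentence.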

Once the sentence is properly tokenized, download the POS model from here, or use the following Python script:

import wget

am_pos_model = wget.download("http://ltdata1.informatik.uni-hamburg.de/amharic/taskmodels/pos/final-model.pt")
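If you would rather not install wget, the Python standard library can do the same job. The sketch below uses `urllib.request.urlretrieve`. The helper name `download_model` and the `models/pos/` target directory are our own illustrative choices and are not part of flair or wget. The sentiment, POS, and NER models are all published under the same file name, `final-model.pt`, so keeping each one in its own directory keeps them apart. The existence check also skips the download on later runs.

```python
import os
import urllib.request


def download_model(url, dest):
    """Download `url` to `dest` unless `dest` already exists; return `dest`.

    `download_model` is a hypothetical helper, not part of flair or wget.
    """
    if not os.path.exists(dest):
        # create the target directory (e.g. models/pos/) if it is missing
        os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
        urllib.request.urlretrieve(url, dest)
    return dest
```

For example, `am_pos_model = download_model("http://ltdata1.informatik.uni-hamburg.de/amharic/taskmodels/pos/final-model.pt", "models/pos/final-model.pt")` returns a path that can be passed to `SequenceTagger.load`, just like the value returned by `wget.download`.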

Once the model is downloaded, apply it to your tokenized sentence as follows:

from flair.models import SequenceTagger
from flair.data import Sentence

# load the downloaded POS model
tagger = SequenceTagger.load(am_pos_model)

# create an example sentence (already tokenized: tokens separated by spaces)
sentence = Sentence('አበበ ብዙ በሶ በላ ።')

# predict POS tags and print the tagged sentence
tagger.predict(sentence)
print(sentence.to_tagged_string())

You should see the following result


NER

We have published the best-performing model, which is based on FLAIR embeddings, to the LT group data repository. To use the Amharic named entity recognizer, follow the steps below.

  • First, download the model from here, or run the following code:

pip install wget

import wget

am_ner_model = wget.download("http://ltdata1.informatik.uni-hamburg.de/amharic/taskmodels/ner/final-model.pt")

  • Make sure you have flair installed in your environment:

pip install flair

  • Use the model as follows:

from flair.data import Sentence
from flair.models import SequenceTagger

# load the downloaded model
model = SequenceTagger.load(am_ner_model)
# or, if you downloaded final-model.pt manually, pass its path instead (adjust the path as needed)
# model = SequenceTagger.load('final-model.pt')


# create example sentence
sentence = Sentence('አበበ በሶ በለ ።')

# predict tags and print
model.predict(sentence)

print(sentence.to_tagged_string())

You should see the following output, with the person's name correctly detected.
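A sequence tagger like this one labels each token individually, so multi-token names are recovered by grouping consecutive tags. flair already reports entity spans itself, so the sketch below only illustrates the idea. It assumes a BIO tag scheme: B- marks the first token of an entity, I- marks a continuation, and O marks everything else. The tag names (PER, LOC, ORG) and the helper `bio_to_spans` are hypothetical, and the actual tag set of our model may differ.

```python
def bio_to_spans(tokens, tags):
    """Group tokens into (entity_type, text) spans using BIO tags."""
    spans = []
    current_type, current_tokens = None, []

    def close():
        # emit the entity collected so far, if any
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and tag[2:] == current_type:
            current_tokens.append(token)  # continue the open entity
        elif tag.startswith(("B-", "I-")):
            # a B- tag, or an I- tag whose type does not match the open
            # entity, starts a new entity
            close()
            current_type, current_tokens = tag[2:], [token]
        else:
            close()  # "O" ends any open entity
            current_type, current_tokens = None, []
    close()
    return spans


print(bio_to_spans(["አበበ", "በሶ", "በለ", "።"], ["B-PER", "O", "O", "O"]))
# [('PER', 'አበበ')]
```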


Question Classification

Amharic question classification models based on the FLAIR, AmRoBERTa, and XLMR contextual embeddings are released in the LT group data and model repository. The model trained with AmRoBERTa embeddings outperforms the others; see the results in our paper.

You can download the model from here


Transliteration to Amharic Fidel

Transliteration is the process of converting Amharic text written in ASCII (Latin) characters back to the canonical Amharic script, known as Ethiopic or Fidäl.

For example,

zare sint ken new?

can be transliterated to its Ethiopic representation as

ዛሬ ስንት ቀን ነው?
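Our transliterator handles this properly. To show the basic idea behind rule-based transliteration, here is a toy sketch. Each Fidel character combines a consonant with a vowel, and in Unicode the vowel orders of a consonant occupy consecutive code points. A Latin consonant followed by a vowel can therefore be mapped to the consonant's base character plus an offset. The mapping tables and the `transliterate` function below are our own simplified assumptions, not amseg's rules. For instance, `e` maps to the first order, which matches belah → በላህ, and the fifth order is never produced. As a result, the sketch will not reproduce every result shown here: it turns zare into ዛረ rather than ዛሬ.

```python
# Base (first-order) Fidel character for each Latin consonant; a simplified,
# assumed table. Two-letter consonants are matched before single letters.
CONSONANTS = {
    "sh": "\u1238", "ch": "\u1278", "ny": "\u1298",  # ሸ ቸ ኘ
    "h": "\u1200", "l": "\u1208", "m": "\u1218",     # ሀ ለ መ
    "r": "\u1228", "s": "\u1230", "q": "\u1240",     # ረ ሰ ቀ
    "b": "\u1260", "v": "\u1268", "t": "\u1270",     # በ ቨ ተ
    "n": "\u1290", "k": "\u12a8", "w": "\u12c8",     # ነ ከ ወ
    "z": "\u12d8", "y": "\u12e8", "d": "\u12f0",     # ዘ የ ደ
    "j": "\u1300", "g": "\u1308", "f": "\u1348",     # ጀ ገ ፈ
    "p": "\u1350",                                   # ፐ
}

# Code-point offset of each vowel order from the base character.
VOWEL_ORDER = {"e": 0, "u": 1, "i": 2, "a": 3, "o": 6}
BARE = 5  # sixth order: consonant with no following vowel

# Vowels that do not follow a consonant are written with the አ series.
STANDALONE_VOWELS = {"a": "\u12a0", "u": "\u12a1", "i": "\u12a2", "e": "\u12a5", "o": "\u12a6"}


def transliterate(text):
    """Toy Latin-to-Fidel transliteration; unmapped characters pass through."""
    text = text.lower()
    out = []
    i = 0
    while i < len(text):
        # try a two-letter consonant first, then a single letter
        cons = next((c for c in (text[i:i + 2], text[i]) if c in CONSONANTS), None)
        if cons is not None:
            i += len(cons)
            base = ord(CONSONANTS[cons])
            if i < len(text) and text[i] in VOWEL_ORDER:
                out.append(chr(base + VOWEL_ORDER[text[i]]))
                i += 1
            else:
                out.append(chr(base + BARE))
        elif text[i] in STANDALONE_VOWELS:
            out.append(STANDALONE_VOWELS[text[i]])
            i += 1
        else:
            out.append(text[i])  # spaces, punctuation, unmapped letters
            i += 1
    return "".join(out)


print(transliterate("misa belah"))  # ሚሳ በላህ
```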

To transliterate Latin script to Amharic Fidel, use our transliterator from here.
Alternatively, the following code shows how to transliterate a given Latin-script text to Amharic Fidel script text.

pip install amseg

from amseg.amharicTranslitrator import AmharicTranslitrator as transliterator
transliterated = transliterator.transliterate('misa belah')

# You should see the following output:
# transliterated = 'ሚሳ በላህ'

Machine Translation

Similarity

Hate Speech