Baseline system

This is a simple baseline system that uses a weighted finite-state transducer to determine the most likely set of morphs for a given token.

It records how often each morph occurs in the training set and, for each new token, picks the segmentation whose morphs have the highest combined frequency.
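
To make the idea concrete, here is a minimal sketch of frequency-based segmentation in plain Python. It is not the code in train.py or segment.py: it swaps the weighted finite-state transducer for a small dynamic-programming search, and it assumes that "combined frequency" means the sum of log relative frequencies of the morphs. The function names `train` and `segment` are hypothetical.

```python
import math
from collections import Counter


def train(segmented_tokens):
    """Count how often each morph occurs in the training segmentations."""
    counts = Counter()
    for morphs in segmented_tokens:
        counts.update(morphs)
    return counts


def segment(token, counts):
    """Return the split of `token` into known morphs with the best score.

    Assumption: the score of a split is the sum of the log relative
    frequencies of its morphs (higher is better).
    """
    total = sum(counts.values())
    # best[i] holds (score, morphs) for the best split of token[:i];
    # morphs is None while no split into known morphs has been found.
    best = [(0.0, [])] + [(-math.inf, None)] * len(token)
    for end in range(1, len(token) + 1):
        for start in range(end):
            piece = token[start:end]
            prev_score, prev_morphs = best[start]
            if prev_morphs is None or piece not in counts:
                continue
            score = prev_score + math.log(counts[piece] / total)
            if score > best[end][0]:
                best[end] = (score, prev_morphs + [piece])
    # Fall back to the unsegmented token if known morphs cannot cover it.
    return best[-1][1] if best[-1][1] is not None else [token]


counts = train([["walk", "ed"], ["walk", "s"], ["talk", "ed"]])
print(segment("talks", counts))
```

Because each morph adds a negative log term, this scoring prefers splits with fewer, more frequent morphs. A token that cannot be covered by known morphs is returned whole.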

You can train the system using:

$ python3 train.py ../data/train.tsv model.att
1529 morphemes written

Then segment new data by running:

$ cat ../data/dev.tsv | python3 segment.py model.att > output.tsv

You can use the evaluation script like this:

$ python3 ../evaluate.py ../data/dev.tsv output.tsv
25 sentences read, 163 tokens
P: 0.824642126789366
R: 0.8036371603856266
F: 0.8108968740870581
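
Note that the F value above is slightly lower than the harmonic mean of the printed P and R (about 0.814). That is what one would see if the three scores were averaged over smaller units, such as tokens, rather than computed once over the whole file. The sketch below illustrates the effect. It is not taken from evaluate.py: the per-token averaging, the set-based matching of morphs, and the names `token_prf` and `macro_average` are all assumptions.

```python
def token_prf(gold_morphs, predicted_morphs):
    """Precision, recall and F for one token, comparing morphs as sets."""
    gold, pred = set(gold_morphs), set(predicted_morphs)
    correct = len(gold & pred)
    p = correct / len(pred) if pred else 0.0
    r = correct / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def macro_average(pairs):
    """Average the per-token P, R and F over (gold, predicted) pairs."""
    scores = [token_prf(gold, pred) for gold, pred in pairs]
    if not scores:
        return 0.0, 0.0, 0.0
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))


pairs = [
    (["a", "b"], ["a"]),   # under-segmented: P = 1.0, R = 0.5
    (["c"], ["c", "d"]),   # over-segmented:  P = 0.5, R = 1.0
]
p, r, f = macro_average(pairs)
print(p, r, f)
```

Here the averaged P and R are both 0.75, so their harmonic mean is 0.75, but the averaged F is lower because each token's own F is about 0.667.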
