Baseline system

This is a very simple baseline system that uses a weighted finite-state transducer to find the most likely segmentation of a given token into morphs.

It works by counting how often each morph occurs in the training set and then, for each new token, choosing the segmentation whose morphs have the highest combined frequency.
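To make the idea concrete, here is a minimal sketch of frequency-based segmentation written as plain dynamic programming. It is not the code in `train.py` or `segment.py`, which use a transducer. The function name `best_segmentation`, the toy counts, and the scoring rule (sum of log relative frequencies) are all assumptions made for illustration.

```python
import math
from collections import Counter


def best_segmentation(token, morph_counts):
    """Return the highest-scoring split of `token` into known morphs, or None.

    Assumed scoring: the sum of log relative frequencies of the morphs.
    This is an illustrative choice, not necessarily what the baseline does.
    """
    total = sum(morph_counts.values())
    # best[i] holds (score, morphs) for the best split of token[:i], or None.
    best = [None] * (len(token) + 1)
    best[0] = (0.0, [])
    for end in range(1, len(token) + 1):
        for start in range(end):
            piece = token[start:end]
            if best[start] is None or piece not in morph_counts:
                continue
            score = best[start][0] + math.log(morph_counts[piece] / total)
            if best[end] is None or score > best[end][0]:
                best[end] = (score, best[start][1] + [piece])
    return best[-1][1] if best[-1] is not None else None


# Hypothetical morph counts, standing in for what training would collect.
counts = Counter({"walk": 10, "ed": 8, "s": 12, "walked": 1, "talk": 5})
print(best_segmentation("walked", counts))  # → ['walk', 'ed']
```

With these toy counts, the split `walk` + `ed` beats the unsegmented `walked` because both morphs are far more frequent than the whole form. A token that cannot be covered by known morphs returns `None`; how the real system handles that case is not stated here.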

You can train the system using:

$ python3 train.py ../data/train.tsv model.att
1529 morphemes written

And then use the system by running:

$ cat ../data/dev.tsv | python3 segment.py model.att > output.tsv

You can use the evaluation script like this:

$ python3 ../evaluate.py ../data/dev.tsv output.tsv 
25 sentences read, 163 tokens
P: 0.824642126789366
R: 0.8036371603856266
F: 0.8108968740870581
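
For readers unfamiliar with the three scores, the sketch below shows one common way to compute precision, recall and F-score over the morphs of a single token. How `evaluate.py` actually matches and averages morphs is not described here, so the function `prf` and its multiset-overlap matching are assumptions, not a description of that script.

```python
from collections import Counter


def prf(gold, predicted):
    """Precision, recall and F1 between two lists of morphs (assumed matching rule).

    A predicted morph counts as correct if it also appears in the gold list,
    with repeated morphs matched at most as often as they occur in both.
    """
    overlap = sum((Counter(gold) & Counter(predicted)).values())
    precision = overlap / len(predicted) if predicted else 0.0
    recall = overlap / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# One of three predicted morphs is correct; one of two gold morphs is found.
p, r, f = prf(["walk", "ed"], ["walk", "e", "d"])
print(f"P: {p:.3f}  R: {r:.3f}  F: {f:.3f}")
```

Note that the F value reported above is slightly lower than the harmonic mean of the reported P and R (about 0.814). That would be consistent with scores being averaged over tokens or sentences rather than computed once over the totals, but this is only a guess.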