Baseline system

This is a very simple baseline system that uses a weighted finite-state transducer to find the most likely sequence of morphs for each token.

It works by counting how often each morph occurs in the training set and then, for each new token, choosing the segmentation whose morphs have the highest overall frequency.
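The frequency-based idea can be illustrated without a transducer. The following is a minimal sketch, not the code in `train.py` or `segment.py`: it assumes training data is already available as lists of morphs, scores a segmentation by the sum of negative log relative frequencies, and finds the cheapest split with dynamic programming (the same shortest-path search a weighted transducer would perform). The function names `count_morphs` and `segment` and the scoring scheme are assumptions made for illustration.

```python
import math
from collections import Counter


def count_morphs(segmented_tokens):
    """Count how often each morph occurs in pre-segmented training tokens."""
    counts = Counter()
    for morphs in segmented_tokens:
        counts.update(morphs)
    return counts


def segment(token, counts):
    """Return the cheapest split of `token` into morphs seen in training.

    Cost of a morph is -log(relative frequency), so frequent morphs are cheap.
    Falls back to the unsegmented token if no split into known morphs exists.
    """
    total = sum(counts.values())
    n = len(token)
    # best[i] holds (cost, morphs) for the prefix token[:i], or None if unreachable
    best = [None] * (n + 1)
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for start in range(end):
            piece = token[start:end]
            if best[start] is None or piece not in counts:
                continue
            cost = best[start][0] - math.log(counts[piece] / total)
            if best[end] is None or cost < best[end][0]:
                best[end] = (cost, best[start][1] + [piece])
    return best[n][1] if best[n] is not None else [token]


# Toy training data (hypothetical, not taken from ../data/train.tsv)
training = [["walk", "ed"], ["talk", "ed"], ["walk", "s"], ["talk", "ing"]]
counts = count_morphs(training)
print(segment("talks", counts))
print(segment("walking", counts))
```

Summing log frequencies favours segmentations with fewer, more frequent morphs; the actual system may weight paths differently.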

You can train the system using:

$ python3 train.py ../data/train.tsv model.att
1529 morphemes written

And then use the system by running:

$ cat ../data/dev.tsv | python3 segment.py model.att > output.tsv

You can use the evaluation script like this:

$ python3 ../evaluate.py ../data/dev.tsv output.tsv 
25 sentences read, 163 tokens
P: 0.824642126789366
R: 0.8036371603856266
F: 0.8108968740870581
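
Note that the F value above is not the harmonic mean of the P and R values shown (which would be roughly 0.814), so the scores appear to be averaged over individual items rather than computed from pooled counts. The sketch below shows one way that can happen. It is an assumption about the scoring scheme, not the code in `evaluate.py`: it compares gold and predicted morph lists per token by multiset overlap and then averages the per-token scores. The names `prf` and `macro_average` are hypothetical.

```python
from collections import Counter


def prf(gold, predicted):
    """Precision, recall and F1 for one token, using multiset overlap of morphs."""
    overlap = sum((Counter(gold) & Counter(predicted)).values())
    p = overlap / len(predicted) if predicted else 0.0
    r = overlap / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def macro_average(pairs):
    """Average per-token precision, recall and F1 over (gold, predicted) pairs."""
    scores = [prf(gold, predicted) for gold, predicted in pairs]
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))


# Toy example (hypothetical data): one perfect token, one under-segmented token
pairs = [
    (["walk", "ed"], ["walk", "ed"]),
    (["un", "lock", "ed"], ["un", "locked"]),
]
p, r, f = macro_average(pairs)
print(f"P: {p}\nR: {r}\nF: {f}")
```

In this toy case the averaged F is 0.7, while the harmonic mean of the averaged P (0.75) and R (about 0.667) is about 0.706, which mirrors the kind of gap seen in the output above.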