
Neural morphology

Mika Hämäläinen edited this page May 14, 2021 · 9 revisions

UralicNLP can handle out-of-vocabulary words thanks to its new neural fallback functionality.

Requirements

The neural models require the natas library:

pip install natas
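Before enabling the neural fallback, it can help to confirm that natas is importable in the current environment. The following is a minimal sketch using only the standard library; the helper name module_available is hypothetical and is not part of UralicNLP or natas.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module called `name` can be found."""
    # find_spec returns None when no installed module matches the name.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if module_available("natas"):
        print("natas is installed; neural fallback can be used.")
    else:
        print("natas is missing; run: pip install natas")
```

The check only looks the module up and does not import it, so it stays cheap even when the package is large.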

How to use neural fallback

Pass neural_fallback=True to generate, analyze, or lemmatize:

from uralicNLP import uralicApi

# Generate a word form from a lemma and its morphological tags
uralicApi.generate("koirailla+V+Act+Ind+Prs+Sg1", "fin", neural_fallback=True)
# >> [('koirailen', 0.0)]
# Analyze a word form into a lemma and its tags
uralicApi.analyze("hörpähdin", "fin", neural_fallback=True)
# >> [('hörpähtää+V+Act+Ind+Prt+Sg1', 0.0)]
# Lemmatize a word form
uralicApi.lemmatize("nirhautan", "fin", neural_fallback=True)
# >> ['nirhauttaa']
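In the output shown above, generate and analyze return (string, weight) pairs, and an analysis string joins the lemma and its tags with +. Here is a minimal sketch of splitting such a string, assuming the lemma itself contains no + character. The helper split_analysis is hypothetical and not part of UralicNLP; the sample data is copied from the output above.

```python
from typing import List, Tuple


def split_analysis(analysis: str) -> Tuple[str, List[str]]:
    """Split 'lemma+Tag1+Tag2' into ('lemma', ['Tag1', 'Tag2']).

    Assumption: the lemma contains no '+' character.
    """
    lemma, *tags = analysis.split("+")
    return lemma, tags


# Sample copied from the analyze() output shown above.
results = [("hörpähtää+V+Act+Ind+Prt+Sg1", 0.0)]

for analysis, weight in results:
    lemma, tags = split_analysis(analysis)
    print(lemma, tags, weight)
```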

Data for training

If you are interested in training your own models, you can get all inflectional forms for a word by running the following:

from uralicNLP import uralicApi
uralicApi.get_all_forms("kissa", "N", "fin")

Pass a lemma, its part of speech, and the language code. The other possible arguments are descriptive=True (chooses between the descriptive and the normative FST), limit_forms=-1 (how many forms to generate) and filter_out=["#", "+Der", "+Cmp","+Err"] (the tags you do not want in the output).
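To illustrate what a tag filter such as filter_out does conceptually, the sketch below drops every analysis string that contains one of the unwanted tags. This is an assumption about the behaviour, not UralicNLP's actual implementation; the helper drop_tagged and the sample strings are made up for illustration.

```python
from typing import Iterable, List, Sequence

# Same tags as the filter_out list mentioned above.
UNWANTED: Sequence[str] = ("#", "+Der", "+Cmp", "+Err")


def drop_tagged(analyses: Iterable[str],
                unwanted: Sequence[str] = UNWANTED) -> List[str]:
    """Keep only the analyses that contain none of the unwanted tags."""
    return [a for a in analyses if not any(tag in a for tag in unwanted)]


# Hypothetical sample strings, not real library output.
sample = [
    "kissa+N+Sg+Nom",
    "kissa+N+Sg+Gen",
    "kissa+N+Der/lainen+A+Sg+Nom",
    "kissa+N+Sg+Nom#talo+N+Sg+Nom",
]

print(drop_tagged(sample))
```

Substring matching keeps the sketch short, but it would also match a tag that merely begins with one of the listed strings, so a stricter comparison may be preferable for real training data.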

Cite

Hämäläinen, M., Partanen, N., Rueter, J., & Alnajjar, K. (2021). Neural Morphology Dataset and Models for Multiple Languages, from the Large to the Endangered. In Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa 2021).
