Neural morphology
UralicNLP can handle out-of-vocabulary words thanks to its new neural fallback functionality.
The neural models require Natas, which can be installed with pip:
pip install natas
Then pass neural_fallback=True to generate, analyze or lemmatize:
from uralicNLP import uralicApi
uralicApi.generate("koirailla+V+Act+Ind+Prs+Sg1", "fin", neural_fallback=True)
>> [('koirailen', 0.0)]
uralicApi.analyze("hörpähdin", "fin", neural_fallback=True)
>> [('hörpähtää+V+Act+Ind+Prt+Sg1', 0.0)]
uralicApi.lemmatize("nirhautan", "fin", neural_fallback=True)
>> ['nirhauttaa']
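The analyses shown above are plain strings of the form lemma+Tag+Tag, paired with a weight. The following is a minimal sketch of how such a result could be split into a lemma and a tag list. The helper split_analysis is hypothetical and not part of UralicNLP, and it assumes the lemma itself contains no "+" character.

```python
from typing import List, Tuple


def split_analysis(analysis: str) -> Tuple[str, List[str]]:
    """Split 'lemma+Tag1+Tag2' into (lemma, [Tag1, Tag2]).

    Hypothetical helper: assumes the lemma contains no '+'.
    """
    lemma, *tags = analysis.split("+")
    return lemma, tags


# Result copied from the analyze() example above; each item is (analysis, weight).
results = [("hörpähtää+V+Act+Ind+Prt+Sg1", 0.0)]

for analysis, weight in results:
    lemma, tags = split_analysis(analysis)
    print(lemma, tags, weight)
```

Keeping the tags as a list makes it simple to check for a particular feature, for example whether "Sg1" is among them.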
If you are interested in training your own models, you can get all inflectional forms for a word by running the following:
from uralicNLP import uralicApi
uralicApi.get_all_forms("kissa", "N", "fin")
Pass a lemma, its part of speech and the language code. The optional arguments are descriptive=True (chooses between the descriptive and the normative FST), limit_forms=-1 (how many forms to generate) and filter_out=["#", "+Der", "+Cmp","+Err"] (the tags you do not want in the output).
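To illustrate the idea behind filter_out, here is a rough sketch that drops analysis strings containing any unwanted tag. It assumes simple substring matching, which may differ from what the library does internally. The function drop_filtered and the sample strings are made up for illustration and are not output from UralicNLP.

```python
from typing import Iterable, List, Sequence

# Default taken from the filter_out argument described above.
DEFAULT_FILTER = ("#", "+Der", "+Cmp", "+Err")


def drop_filtered(analyses: Iterable[str],
                  filter_out: Sequence[str] = DEFAULT_FILTER) -> List[str]:
    """Keep only analyses that contain none of the unwanted tags.

    Hypothetical helper: uses plain substring matching (an assumption).
    """
    return [a for a in analyses if not any(tag in a for tag in filter_out)]


# Illustrative strings only, not real library output.
examples = [
    "kissa+N+Sg+Nom",
    "kissa+N+Err/Orth+Sg+Nom",
    "kissa+N+Sg+Nom+Cmp#talo+N+Sg+Nom",
]

kept = drop_filtered(examples)
print(kept)
```

Filtering out derivations, compounds and error-tagged forms in this way would leave only the plain inflectional forms, which is usually what a training set for an inflection model needs.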
Hämäläinen, M., Partanen, N., Rueter, J., & Alnajjar, K. (2021). Neural Morphology Dataset and Models for Multiple Languages, from the Large to the Endangered. In Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa 2021).
UralicNLP is an open-source Python library by Mika Hämäläinen