rich-regular-morphology

Contemplations and resolutions for rule-based morphological descriptions of languages with rich and regular morphology, with examples from Uralic, Salish and Maipurean languages.

Documentation

  • Rueter_Hoimupaevad-2019-10-11_Tallinn.pdf: slides from the presentation "Morphological analyzers and other digital tools for Uralic languages", given in Tallinn on 2019-10-11.

Background

Searching for a methodology for linguistics

  • (1) Extract paradigms from grammars, readers and research to build an analyzer.
  • (2) Extract words, part-of-speech information and definitions from existing dictionaries and research. Build on what has already been done (Dutch, French, German, Russian,…)
  • (3) Test analysis coverage on written texts. Are the unrecognized forms proper words?
  • (4) Disambiguate morphological analyses based on grammars and research. Point out gaps in descriptions.
  • (5) Test syntactic disambiguation on example sentences cited in grammatical descriptions of the language. Then retest on text corpora.
  • (6) Make disambiguated sentences public, so others can test. One by-product of these gold standards is treebanks.
  • (7) Use all phases to benefit the speaker and research community.
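
Step (3) can be sketched in a few lines of Python. This is a minimal illustration, not the workflow of any particular project: `tokenize`, `coverage` and `toy_analyze` are hypothetical names, the tokenizer is deliberately naive, and the lookup table stands in for a real analyzer. The tag strings are illustrative only.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (a deliberately naive tokenizer)."""
    return re.findall(r"\w+", text.lower())


def coverage(tokens, analyze):
    """Return (share of tokens with at least one analysis, Counter of misses)."""
    total = 0
    missed = Counter()
    for token in tokens:
        total += 1
        if not analyze(token):
            missed[token] += 1
    recognized = total - sum(missed.values())
    return (recognized / total if total else 0.0), missed


# Hypothetical stand-in for a real analyzer: a lookup table of known forms.
TOY_ANALYSES = {
    "talo": ["talo+N+Sg+Nom"],
    "talossa": ["talo+N+Sg+Ine"],
}


def toy_analyze(form):
    """Return the list of analyses for a form, or an empty list if unknown."""
    return TOY_ANALYSES.get(form, [])


share, missed = coverage(tokenize("Talo talossa Helsinki talossa"), toy_analyze)
print(f"{share:.2f}", missed.most_common())
```

The list of missed forms, sorted by frequency, is what feeds the question in step (3): each unrecognized form is either a proper word missing from the lexicon or noise such as a typo or foreign material.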

Open-source morphological descriptions for Uralic languages

  • First transducers of minority Uralic languages after Finnish 1983 (Kimmo Koskenniemi)

  • Meadow Mari ~1986 (Jorma Luutonen)

  • Komi-Zyrian 1996 (Jack Rueter)

  • Giellatekno ~2000 begins work with Sami descriptions (Trond Trosterud et al.); Barents Sea languages, circumpolar languages ~2004 → other Uralic languages

Minority Uralic language forms with finite-state morphology development

  • Balto-Finnic: fit = Meänkieli, fkv = Kven, izh = Ingrian, krl = Karelian, liv = Livonian, olo = Olonets-Karelian aka Livvi, vep = Veps, vot = Votic, vro = Võro
  • Sami: sjd = Kildin Sami, sje = Pite Sami, sma = South Sami, sme = Northern Sami, smj = Lule Sami, smn = Inari Sami, sms = Skolt Sami
  • Mordvin: mdf = Moksha, myv = Erzya
  • Mari: mhr = Meadow & Eastern Mari, mrj = Hill Mari aka Western Mari
  • Permic: koi = Komi-Permyak, kpv = Komi-Zyrian, udm = Udmurt
  • Ob Ugrian: kca = Khanty, mns = Mansi
  • Samoyedic: nio = Nganasan, sel = Selkup, yrk = Nenets

Pertinent majority languages with finite-state morphology development

  • Uralic languages in majority: est = Estonian, fin = Finnish, hun = Hungarian
  • Auxiliary languages: deu = German, lav = Latvian, nob = Norwegian Bokmål, rus = Russian, tat = Tatar

Setting up a morphological analyzer

  • Find a source and use the known morphological information
  • Find or build a lexicon to propagate this word type
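
The two steps above can be illustrated with a toy Python sketch. It swaps in a plain lookup table for the finite-state transducers the descriptions listed here actually use, and every name in it (`NOUN_PARADIGM`, `LEXICON`, `build_analyzer`, `analyze`) is hypothetical. The paradigm holds four Finnish noun endings, the tag strings are placeholders rather than any project's tag set, and the stems are chosen because they take these endings without stem changes.

```python
# Illustrative paradigm for one word type: tag string -> suffix.
# The tag names are placeholders, not the tag set of any particular project.
NOUN_PARADIGM = {
    "+N+Sg+Nom": "",
    "+N+Sg+Gen": "n",
    "+N+Sg+Ine": "ssa",
    "+N+Pl+Nom": "t",
}

# A tiny lexicon of stems assumed to follow that paradigm without stem changes.
LEXICON = {"talo": NOUN_PARADIGM, "auto": NOUN_PARADIGM}


def build_analyzer(lexicon):
    """Expand every stem through its paradigm into a form -> analyses table."""
    table = {}
    for stem, paradigm in lexicon.items():
        for tags, suffix in paradigm.items():
            table.setdefault(stem + suffix, []).append(stem + tags)
    return table


ANALYZER = build_analyzer(LEXICON)


def analyze(form):
    """Return all analyses for a surface form, or an empty list if unknown."""
    return ANALYZER.get(form, [])


print(analyze("talossa"))
print(analyze("autot"))
```

Adding a new stem of the same word type to `LEXICON` makes all of its forms analyzable at once, which is the point of pairing a known paradigm with a lexicon. Stem alternations and vowel harmony are left out here and need rules of their own.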

...

Tools

  • Keyboards: Giellalt/
  • Spell checkers: Hunspell, Voikko
  • Click-in-text dictionaries
  • Language learning
  • Text-to-speech
  • Translation

Giella Dictionaries

Click-in-text dictionaries, Giella:

Language internal and external links in multilingual dictionaries, Akusanat:

Material Collaborations: FU-Lab, University of Turku, University of Tartu, University, EKI, University of Vienna, Livones, Võro Instituut

Intelligent Computer Assisted Language Learning

Translation

Text-to-speech

Infrastructures with intense activity