Single word / part of sentence correction #9

lumpidu · 2021-01-13T15:01:06Z

I want to use Greynir-Correct to correct fragments rather than whole sentences, in the extreme case single words. Which method or options should I use to make that possible?

Currently, when using the tokenize() method with the option only_ci=True, it reports the following for both words (the Z002 message translates to "Word should begin with a capital letter"):

Maðurin      Z002     Orð á að byrja á hástaf: 'maðurin'
Maðurinn     Z002     Orð á að byrja á hástaf: 'maðurinn'

Sample code:

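from reynir_correct import tokenize

texts = ["maðurin", "maðurinn"]

for text in texts:
    # Each input string is a single word, not a full sentence
    for tok in tokenize(text, only_ci=True):
        # Skip tokens that carry no text
        if tok.txt:
            print(f"{tok.txt:12} {tok.error_code:8} {tok.error_description}")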


vthorsteinsson · 2021-01-13T18:31:57Z

Interesting question, and this may well be a use case that we should support better. As is, the code is mostly oriented towards review of continuous text, typically whole sentences.

The code that checks the spelling of a single token is basically around this line. The call to spelling.Corrector.correct() can optionally be provided with a context, i.e. preceding tokens that will then be used to adjust the correction probabilities based on a trigram language model.

See also the short test function at the bottom of spelling.py.
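To illustrate the idea of adjusting correction probabilities with preceding tokens, here is a toy sketch. It is not the code in spelling.py: the function name rank_candidates, the trigram counts and the base probabilities are all made up for illustration. It only shows how a candidate ranking can change once two context tokens are available, and that it falls back to the context-free ranking for a single word.

```python
from collections import Counter

# Toy trigram counts (invented numbers); a real model is trained on a large corpus.
TRIGRAMS = Counter({
    ("ég", "sá", "manninn"): 8,
    ("ég", "sá", "maðurinn"): 1,
    ("þarna", "er", "maðurinn"): 9,
    ("þarna", "er", "manninn"): 1,
})


def rank_candidates(base_probs, context=()):
    """Rank candidate corrections, best first.

    base_probs maps each candidate to a context-free probability.
    context holds the preceding tokens; only the last two are used.
    """
    if len(context) < 2:
        # Single word or too little context: keep the context-free scores.
        scores = dict(base_probs)
    else:
        w1, w2 = context[-2:]
        # Add-one smoothed trigram count. The normalizing constant is the
        # same for every candidate in a given context, so it is omitted.
        scores = {
            cand: prob * (TRIGRAMS[(w1, w2, cand)] + 1)
            for cand, prob in base_probs.items()
        }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical candidates for one misspelled token
base = {"maðurinn": 0.6, "manninn": 0.4}
no_context = rank_candidates(base)
after_verb = rank_candidates(base, context=("ég", "sá"))
```

Without context the candidate with the higher base probability wins; with the context ("ég", "sá") the trigram counts outweigh it and the order flips.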

lumpidu · 2021-01-13T21:33:27Z

At least the documentation of tokenize() doesn't state any assumptions about the text structure, in contrast to the documentation of the methods check() and check_single(). And yes, this use case exists, e.g. for spell checking of web input forms, where often only single words or short phrases are entered.
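Until fragments are supported directly, one possible workaround for form input is to post-filter the annotations and drop those that only make sense at the start of a sentence. The sketch below is an assumption, not a documented feature: Tok is a hypothetical stand-in that mirrors the attributes used in the sample code above (txt, error_code, error_description), fragment_errors is a made-up helper name, and X999 is an invented error code. Z002 is the only sentence-position code that appears in this thread; others may need to be added.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

# Error codes that depend on sentence position (assumption: only Z002 is known here).
SENTENCE_POSITION_CODES = frozenset({"Z002"})


@dataclass
class Tok:
    """Hypothetical stand-in for a corrected token, not the library's class."""
    txt: str
    error_code: str = ""
    error_description: str = ""


def fragment_errors(
    tokens: Iterable[Tok], ignore: frozenset = SENTENCE_POSITION_CODES
) -> Iterator[Tok]:
    """Yield tokens that carry an error other than a sentence-position one."""
    for tok in tokens:
        if tok.txt and tok.error_code and tok.error_code not in ignore:
            yield tok


# Simulated annotations for a form field; X999 is an invented code.
tokens = [
    Tok("Maðurinn", "Z002", "Orð á að byrja á hástaf: 'maðurinn'"),
    Tok("hesturin", "X999", "made-up spelling annotation"),
]
remaining = [tok.error_code for tok in fragment_errors(tokens)]
```

With real tokens, the same filter could be applied to the generator returned by tokenize(). Note that it only hides the capitalization annotation; it does not undo any change the tokenizer made to the token text.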
