I want to use Greynir-Correct to correct text that is not made up of whole sentences, in extreme cases single words. Which method or options should I use to make that possible?
Currently, when I use the tokenize() method with the option only_ci=True, it complains about the following (the message translates to "Word should begin with an uppercase letter"):
Maðurin Z002 Orð á að byrja á hástaf: 'maðurin'
Maðurinn Z002 Orð á að byrja á hástaf: 'maðurinn'
Interesting question, and this may well be a use case that we should support better. As is, the code is mostly oriented towards review of continuous text, typically whole sentences.
The code that checks the spelling of a single token is basically around this line. The call to spelling.Corrector.correct() can optionally be provided with a context, i.e. preceding tokens that will then be used to adjust the correction probabilities based on a trigram language model.
See also the short test function at the bottom of spelling.py.
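To illustrate how preceding tokens can shift the choice between candidate corrections, here is a toy, self-contained sketch. It does not call Greynir-Correct and is not its implementation: the trigram counts, the vocabulary, and the names TRIGRAMS, VOCAB, candidates() and correct() are all hypothetical, and English words are used only to keep the example short.

```python
from collections import Counter

# Hypothetical trigram counts; a real model would be estimated from a large corpus.
TRIGRAMS = Counter({
    ("<s>", "the", "man"): 8,
    ("<s>", "the", "main"): 2,
    ("in", "the", "main"): 9,
    ("in", "the", "man"): 1,
})

VOCAB = {"the", "man", "main", "in"}  # hypothetical dictionary
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def candidates(word):
    """Return vocabulary words at most one edit away from `word`."""
    edits = {word}
    for i in range(len(word) + 1):
        head, tail = word[:i], word[i:]
        if tail:
            edits.add(head + tail[1:])  # deletion
            edits.update(head + c + tail[1:] for c in ALPHABET)  # replacement
        if len(tail) > 1:
            edits.add(head + tail[1] + tail[0] + tail[2:])  # transposition
        edits.update(head + c + tail for c in ALPHABET)  # insertion
    return edits & VOCAB


def correct(word, context=()):
    """Pick the candidate that best follows the last two context tokens."""
    # Pad with sentence-start markers so short or empty contexts still work.
    padded = ("<s>", "<s>") + tuple(context)
    w1, w2 = padded[-2:]
    cands = candidates(word)
    if not cands:
        return word  # nothing known nearby; leave the word unchanged
    # Add-one smoothing keeps unseen trigrams from scoring zero.
    return max(sorted(cands), key=lambda c: TRIGRAMS[(w1, w2, c)] + 1)


print(correct("mann", context=("the",)))
print(correct("mann", context=("in", "the")))
```

With these made-up counts, the same misspelling resolves to a different word depending on what precedes it, which is the effect a context argument is meant to provide. A single-word input simply has an empty context, so only the padding markers are available to the model.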
At least the documentation of tokenize() doesn't state any assumptions about the text structure, in contrast to the documentation of the methods check() and check_single(). And yes, this use case exists, e.g. for spell checking of web input forms, where often only single words or short phrases are entered.
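Until single-word input is handled directly, one possible workaround is to drop the sentence-initial capitalization annotation (Z002 in the output above) on the caller's side. The sketch below is an assumption, not a Greynir-Correct feature: it only post-processes report lines in the "token code message" layout shown in the question, and the names IGNORED_CODES and filter_report() are hypothetical.

```python
# Annotation codes to ignore for single-word input (assumption: Z002 is the
# only one that depends on the word being at the start of a sentence).
IGNORED_CODES = {"Z002"}


def filter_report(lines, ignored=IGNORED_CODES):
    """Drop report lines whose second field is an ignored annotation code."""
    kept = []
    for line in lines:
        # Expected layout: "<token> <code> <message...>"
        parts = line.split(maxsplit=2)
        if len(parts) >= 2 and parts[1] in ignored:
            continue
        kept.append(line)
    return kept


report = [
    "Maðurin Z002 Orð á að byrja á hástaf: 'maðurin'",
    "Maðurinn Z002 Orð á að byrja á hástaf: 'maðurinn'",
]
print(filter_report(report))
```

Filtering by code keeps every other annotation intact, so genuine spelling findings on the same word would still be reported.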