normalize() function in TurkishTextNormalizer #266

Open
Baris000-eng opened this issue Jun 28, 2021 · 2 comments

Comments

@Baris000-eng

Hi,
I do not understand why we take the first three letters of the string if it has no formal analysis and its length is greater than 3. I would be thankful if you could elaborate on that.
Best,
Barış

@Baris000-eng
Author

// Only consult the spell checker for unanalyzable words longer than 3 characters.
if ((analyses.analysisCount() == 0) && current.length() > 3) {
  List<String> spellCandidates = spellChecker
      .suggestForWord(current, previous, next, lm);
  // Keep at most the first 3 suggestions.
  if (spellCandidates.size() > 3) {
    spellCandidates = new ArrayList<>(spellCandidates.subList(0, 3));
  }
  candidates.addAll(spellCandidates);
}

This is the part of the code that I am asking about.

@ahmetaa
Owner

ahmetaa commented Jun 29, 2021

Well, I think it was because the spell checker generates too many candidates for short words, and they are not reliable. But feel free to change this and see if it works for you. Unfortunately, the text normalization code was not really production-ready.
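For anyone who wants to experiment with these thresholds, here is a minimal, self-contained sketch of the same guard and cap, pulled out of the normalizer. The class `SpellCandidateLimiter`, its `select` method, and the two constants are hypothetical names used only for illustration; they are not part of the library. The suggestions list stands in for whatever `spellChecker.suggestForWord(...)` returns. Note that the two `3`s in the quoted snippet do different jobs: `current.length() > 3` filters out short words, while `subList(0, 3)` keeps the first three suggestions, not the first three letters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the library): isolates the length guard
// and the suggestion cap from the snippet quoted in this issue.
final class SpellCandidateLimiter {

  // Words must be strictly longer than this to be spell-checked.
  static final int MIN_LENGTH_EXCLUSIVE = 3;
  // At most this many suggestions are kept.
  static final int MAX_SUGGESTIONS = 3;

  private SpellCandidateLimiter() {}

  // analysisCount: number of morphological analyses found for the word.
  // suggestions: ranked spell checker output, best first (assumed ordering).
  static List<String> select(String word, int analysisCount, List<String> suggestions) {
    List<String> selected = new ArrayList<>();
    if (analysisCount == 0 && word.length() > MIN_LENGTH_EXCLUSIVE) {
      int end = Math.min(suggestions.size(), MAX_SUGGESTIONS);
      selected.addAll(suggestions.subList(0, end));
    }
    return selected;
  }
}
```

Changing `MIN_LENGTH_EXCLUSIVE` or `MAX_SUGGESTIONS` in a sketch like this is one way to try out whether looser limits still give reliable candidates on your own data.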
