normalize() function in TurkishTextNormalizer #266

Baris000-eng · 2021-06-28T17:48:02Z

Hi,
I do not understand why, when a word has no morphological analysis and is longer than 3 characters, we keep only the first three spell-check candidates. I would be grateful if you could elaborate on that.
Best,
Barış

Baris000-eng · 2021-06-28T18:40:57Z

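```java
if ((analyses.analysisCount() == 0) && current.length() > 3) {
  // Only words with no analysis that are longer than 3 characters get spell-check candidates.
  List<String> spellCandidates = spellChecker
      .suggestForWord(current, previous, next, lm);
  // Keep at most the first three suggestions.
  if (spellCandidates.size() > 3) {
    spellCandidates = new ArrayList<>(spellCandidates.subList(0, 3));
  }
  candidates.addAll(spellCandidates);
}
```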

This is the part of the code I am asking about.

ahmetaa · 2021-06-29T15:31:45Z

Well, I think it was because the spell checker generates too many candidates for short words, and they are not reliable. But feel free to change this and see if it works for you. Unfortunately, the text normalization code was not really production-ready.
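For anyone who wants to experiment with these limits, here is a minimal, self-contained sketch of the same gating logic with the length threshold and the candidate cap pulled out as parameters. The class name `SpellCandidateGate`, the method `spellCandidates`, and the `Function`-based suggester are hypothetical stand-ins, not Zemberek's API; in the real normalizer the suggestions come from `spellChecker.suggestForWord(current, previous, next, lm)`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

/**
 * Hypothetical sketch of the candidate gating discussed in this issue.
 * Names and the Function-based suggester are illustrative assumptions only.
 */
public class SpellCandidateGate {

  /**
   * Returns spell-check candidates for a token only when the token has no
   * analyses and is longer than minLength characters. At most maxCandidates
   * suggestions are kept, in the order the suggester returned them.
   */
  public static List<String> spellCandidates(
      String token,
      int analysisCount,
      Function<String, List<String>> suggester,
      int minLength,
      int maxCandidates) {
    if (analysisCount != 0 || token.length() <= minLength) {
      // Analyzable words and short words are skipped entirely.
      return Collections.emptyList();
    }
    List<String> suggestions = suggester.apply(token);
    if (suggestions.size() > maxCandidates) {
      // Copy the prefix so the result does not share the suggester's backing list.
      suggestions = new ArrayList<>(suggestions.subList(0, maxCandidates));
    }
    return suggestions;
  }

  public static void main(String[] args) {
    // A fake suggester standing in for a real spell checker: always returns four suggestions.
    Function<String, List<String>> fake =
        w -> List.of(w + "1", w + "2", w + "3", w + "4");

    System.out.println(spellCandidates("kitapp", 0, fake, 3, 3)); // prints [kitapp1, kitapp2, kitapp3]
    System.out.println(spellCandidates("evv", 0, fake, 3, 3));    // prints [] (too short)
    System.out.println(spellCandidates("kitap", 1, fake, 3, 3));  // prints [] (has an analysis)
  }
}
```

Raising `minLength` or `maxCandidates` is then a one-argument change, which makes it easy to check whether short words actually produce useful suggestions on your own data.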
