Frequent OCR issue: cut words #10019
Labels
🧽 Data quality
https://wiki.openfoodfacts.org/Quality
ingredients
Spellcheck
Spellchecking ingredient list and product name to increase ingredient recognition.
Many ingredient recognition issues are caused by bad OCR results in which words are cut in two: leci -thin, concen - trate, konzen - trat, Emul - gator, émul - sifiant, conser - vateur, Ascorbin - säure, Sonnenblu - menöl, natür - liches, etc. For example: https://world.openfoodfacts.org/product/28389064/euka-menthol?rev=17
For example, searching for "leci - thine" and its variants finds 35 products with this issue.
I think there are thousands of cases like this. This kind of search query makes it possible to detect many of these errors.
Spell checkers such as LanguageTool detect these issues but do not suggest a fix.
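One possible way to both detect and fix these cases is sketched below. It is only an illustration, not existing Open Food Facts code: it looks for two letter fragments separated by a hyphen that has whitespace on at least one side, and joins them only when the joined word appears in a list of known words. The names `CUT_WORD`, `find_cut_words`, `fix_cut_words` and the `vocabulary` set are hypothetical; in practice the vocabulary could presumably be built from known ingredient names per language.

```python
import re

# Two letter-only fragments separated by a hyphen that has whitespace on at
# least one side ("leci -thin", "concen - trate", "leci- thin").
# A plain compound such as "sugar-free" has no whitespace and is not matched.
CUT_WORD = re.compile(r"([^\W\d_]+)(?:\s+-\s*|\s*-\s+)([^\W\d_]+)")


def find_cut_words(text, vocabulary):
    """Return (matched_text, joined_word) pairs whose joined form is known.

    `vocabulary` is a hypothetical set of lowercase known words.
    """
    hits = []
    for match in CUT_WORD.finditer(text):
        joined = (match.group(1) + match.group(2)).lower()
        if joined in vocabulary:
            hits.append((match.group(0), joined))
    return hits


def fix_cut_words(text, vocabulary):
    """Join cut words, but only when the joined form is in `vocabulary`."""

    def join_if_known(match):
        joined = match.group(1) + match.group(2)
        # Leave the text untouched when the joined word is unknown,
        # e.g. "salt - pepper" should not become "saltpepper".
        return joined if joined.lower() in vocabulary else match.group(0)

    return CUT_WORD.sub(join_if_known, text)


if __name__ == "__main__":
    known = {"lecithin", "concentrate", "sonnenblumenöl"}
    sample = "soy leci -thin, apple concen - trate, Sonnenblu - menöl, salt - pepper"
    print(find_cut_words(sample, known))
    print(fix_cut_words(sample, known))
```

Checking the joined word against a vocabulary is what keeps legitimate separators (such as a dash between two ingredients) from being merged; the regular expression alone would produce many false positives.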