Improve parsed-price accuracy #26

EtienneLamoureux · 2023-12-01T15:45:40Z

Situation

Prices are prefixed with the ¤ symbol. This symbol is not in Tesseract's English training set, so it is read as an arbitrary character. When that character is read as a digit, it is prepended to the number and inflates the parsed price, e.g. ¤900 becomes 2900.

Tasks

Experiment with heuristics to mitigate the issue:
1. Thousands are always separated by a comma (,), and groups of digits are at most 3 long
2. Only 1 digit is present before the comma (,) when the price is listed in kilo units (K)
3. Others
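
The first two heuristics could be sketched roughly as follows. This is only an illustration under assumptions, not the project's actual code: the function name `drop_misread_symbol` and both regular expressions are hypothetical, and the kilo pattern reflects one reading of heuristic 2 (one digit, a comma, three digits, then K). It assumes the OCR token contains the price and nothing else.

```python
import re

# Assumed shape of a well-formed price once the currency symbol is gone:
# a leading group of 1 to 3 digits, then zero or more ",ddd" groups.
PLAIN_PRICE = re.compile(r"\d{1,3}(?:,\d{3})*")
# Assumed shape of a price in kilo units: one digit, a comma, three digits, "K".
KILO_PRICE = re.compile(r"\d,\d{3}K")


def drop_misread_symbol(token: str) -> str:
    """Return token without a leading character that breaks the heuristics."""
    pattern = KILO_PRICE if token.endswith("K") else PLAIN_PRICE
    if pattern.fullmatch(token):
        return token  # already plausible, nothing to correct
    candidate = token[1:]
    if pattern.fullmatch(candidate):
        return candidate  # the first character was most likely the misread symbol
    return token  # neither form is plausible, leave it for another heuristic


print(drop_misread_symbol("2900"))     # 900
print(drop_misread_symbol("21,234K"))  # 1,234K
```

A known gap in this sketch: when the misread result still satisfies the pattern (for instance ¤90 read as 290), the token is returned unchanged, so the heuristics alone cannot catch every inflated price.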

Results

The ¤ character no longer inflates parsed prices

EtienneLamoureux added the "help wanted" (Extra attention is needed) and "refactor" (Change or improvement to an existing functionality) labels on Dec 1, 2023
