feat: run OCR extraction on every new image #543

raphael0202 · 2024-10-30T16:44:04Z (edited by raphodn)

Closes #320

This also fixes an incorrect ALLOWED_HOSTS value: the parsed host list was ['"localhost', '127.0.0.1"'].
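That list is what a plain split on commas produces when the raw value still carries its surrounding quotes. Below is a minimal sketch of defensive parsing, assuming ALLOWED_HOSTS is read from a comma-separated environment variable. `parse_hosts`, the variable name, and the default value are hypothetical; this is not the actual change in 2f82256.

```python
import os


def parse_hosts(raw: str) -> list[str]:
    """Split a comma-separated host string, tolerating stray quotes (hypothetical helper)."""
    # A naive raw.split(",") on '"localhost,127.0.0.1"' yields ['"localhost', '127.0.0.1"'],
    # so strip whitespace and quote characters from each entry and drop empty entries.
    hosts = (part.strip().strip("\"'") for part in raw.split(","))
    return [host for host in hosts if host]


# Hypothetical environment variable name and default, mirroring a Django-style setting.
ALLOWED_HOSTS = parse_hosts(os.environ.get("ALLOWED_HOSTS", "localhost,127.0.0.1"))
```

Stripping quotes per entry, not just around the whole value, also covers values written as "localhost","127.0.0.1".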

raphodn · 2024-11-04T13:52:44Z

open_prices/proofs/utils.py

+    data["created_at"] = int(time.time())
+
+    with gzip.open(ocr_json_path, "wt") as f:
+        f.write(json.dumps(data))


So it stores the result in a jsonl.gz file next to the image?

If we re-run it on the same image, will it overwrite the existing file?

> So it stores the result in a jsonl.gz file next to the image?

Yes, exactly!

> If we re-run it on the same image, will it overwrite the existing file?

It depends on the value of the override flag.

Ah yes, good catch!
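Here is a minimal sketch of the write-next-to-the-image behaviour discussed in this thread. The gzip/JSON write and the `created_at` timestamp mirror the diff in open_prices/proofs/utils.py. Three parts are assumptions: the helper name `save_ocr_result`, the `proof.jpg -> proof.json.gz` naming scheme, and the rule that `override=False` leaves an existing file untouched. They are not the PR's actual code.

```python
import gzip
import json
import tempfile
import time
from pathlib import Path
from typing import Optional


def save_ocr_result(image_path: Path, data: dict, override: bool = False) -> Optional[Path]:
    """Write OCR output as gzipped JSON next to the image (hypothetical helper)."""
    # Assumed naming scheme: proof.jpg -> proof.json.gz in the same directory.
    ocr_json_path = image_path.with_suffix(".json.gz")
    if ocr_json_path.exists() and not override:
        return None  # keep the existing result untouched
    data["created_at"] = int(time.time())
    with gzip.open(ocr_json_path, "wt") as f:
        f.write(json.dumps(data))
    return ocr_json_path


# Usage: the first call writes, the second is skipped, the third overwrites.
with tempfile.TemporaryDirectory() as tmp:
    image = Path(tmp) / "proof.jpg"
    image.write_bytes(b"fake image")
    first = save_ocr_result(image, {"responses": []})
    second = save_ocr_result(image, {"responses": []})
    third = save_ocr_result(image, {"responses": []}, override=True)
    with gzip.open(first, "rt") as f:
        stored = json.load(f)
```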

albertaillet · 2024-11-06T13:20:22Z

In this script, openfoodfacts-server/blob/main/scripts/run_ocr.py, you also run:

"features": [
    {"type": "TEXT_DETECTION"},
    {"type": "LOGO_DETECTION"},
    {"type": "LABEL_DETECTION"},
    {"type": "SAFE_SEARCH_DETECTION"},
    {"type": "FACE_DETECTION"},
],

and not only TEXT_DETECTION. Would these features be of interest here as well?
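For reference, here is a minimal sketch of a Cloud Vision images:annotate request body that asks for all five features in a single call. It assumes the REST API, where the body is POSTed to https://vision.googleapis.com/v1/images:annotate. `build_annotate_request` is a hypothetical helper, and the open_prices OCR code may build its request differently.

```python
import base64

# Feature list taken from the comment above (scripts/run_ocr.py in openfoodfacts-server).
FEATURES = [
    "TEXT_DETECTION",
    "LOGO_DETECTION",
    "LABEL_DETECTION",
    "SAFE_SEARCH_DETECTION",
    "FACE_DETECTION",
]


def build_annotate_request(image_bytes: bytes, feature_types: list[str]) -> dict:
    """Build an images:annotate JSON body for one image (hypothetical helper)."""
    return {
        "requests": [
            {
                # Inline image content must be base64-encoded.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": t} for t in feature_types],
            }
        ]
    }


body = build_annotate_request(b"fake image bytes", FEATURES)
```

Cloud Vision pricing counts each feature applied to an image as a separate unit. Requesting all five therefore costs more than TEXT_DETECTION alone.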

raphodn · 2024-11-06T15:30:46Z

TODO: avoid running OCR extraction during test runs, and delete the test image?
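One way to handle the first part of this TODO is to mock the OCR call in tests so that no request reaches Google Cloud Vision. Below is a minimal sketch using `unittest.mock.patch.object`. `OcrService`, `ocr_service` and `handle_new_proof` are hypothetical stand-ins, not names from open_prices.

```python
import unittest
from unittest import mock


class OcrService:
    """Hypothetical wrapper around the real OCR call (Google Cloud Vision)."""

    def run(self, image_path: str) -> dict:
        raise RuntimeError("real OCR call: must not run during tests")


ocr_service = OcrService()


def handle_new_proof(image_path: str) -> dict:
    # Hypothetical hook that triggers OCR for every new proof image.
    return ocr_service.run(image_path)


class ProofUploadTest(unittest.TestCase):
    def test_new_proof_does_not_call_real_ocr(self):
        # Replace the OCR method on the shared instance only for the duration of the test.
        with mock.patch.object(ocr_service, "run", return_value={}) as fake_run:
            result = handle_new_proof("proof.jpg")
        self.assertEqual(result, {})
        fake_run.assert_called_once_with("proof.jpg")
```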

raphodn · 2024-11-06T19:13:41Z

open_prices/api/proofs/views.py

@@ -75,6 +77,10 @@ def upload(self, request: Request) -> Response:
                status=status.HTTP_400_BAD_REQUEST,
            )
        file_path, mimetype, image_thumb_path = store_file(request.data.get("file"))
+        async_task(


I think I'm going to refactor this to run it in the post_save signal instead, similar to what is done for locations (OSM) and products (OFF).

Done in #549.

raphael0202 added 2 commits October 30, 2024 17:39

fix: fix incorrect ALLOWED_HOSTS value

2f82256

The parsed host list was ['"localhost', '127.0.0.1"']

feat: run OCR on every new image

e17c351

github-actions bot assigned raphael0202 Oct 30, 2024

github-actions bot added the GitHub Actions label Oct 30, 2024

raphael0202 requested a review from raphodn October 30, 2024 16:44

raphael0202 added the OCR label Oct 30, 2024

raphael0202 merged commit 77ed50b into main Nov 4, 2024
10 checks passed

raphael0202 deleted the run-ocr-extraction branch November 4, 2024 12:59

openfoodfacts-bot mentioned this pull request Nov 4, 2024

chore(main): release 1.47.0 #542

Merged

raphodn reviewed Nov 4, 2024


raphodn reviewed Nov 6, 2024


raphodn mentioned this pull request Nov 6, 2024

refactor(proofs): run OCR in post_save signal instead of create #549

Merged

albertaillet mentioned this pull request Nov 14, 2024

refactor(proofs): more features to OCR extraction #566

Merged
