Detect near identical images #1271

raphael0202 · 2023-10-25T10:14:38Z

Discussed in #1201

Originally posted by raphael0202 on May 23, 2023

Problem

We seem to have a growing number of duplicated images in the Open Food Facts database: images whose binary content differs, but which are visually almost identical.

Example:

We want to detect these near-identical images so that we can remove them. This would save disk space and make contributors' work easier.

Proposed solution

Use fingerprinting techniques to assign a single hash to each image, so that near-identical images get identical or very close hashes. See this blog post for more information about image fingerprinting. Explore the recall/precision trade-off of each hashing technique.
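To make the idea concrete, here is a minimal sketch of one common fingerprinting technique, the average hash. It is not necessarily the technique we will end up using, and it is not how undouble or any other library implements it. It assumes the image has already been decoded into rows of grayscale values (0-255); the function names `shrink`, `average_hash` and `hamming_distance` are made up for illustration.

```python
from typing import List

Gray = List[List[int]]  # rows of 0-255 grayscale pixel values (assumed input format)


def shrink(pixels: Gray, size: int = 8) -> List[float]:
    """Downscale to size x size cells by averaging the pixels in each block."""
    height, width = len(pixels), len(pixels[0])
    if height < size or width < size:
        raise ValueError("image is smaller than the hash grid")
    cells = []
    for i in range(size):
        rows = pixels[i * height // size:(i + 1) * height // size]
        for j in range(size):
            left, right = j * width // size, (j + 1) * width // size
            block = [value for row in rows for value in row[left:right]]
            cells.append(sum(block) / len(block))
    return cells


def average_hash(pixels: Gray, size: int = 8) -> int:
    """Return a size*size-bit fingerprint: one bit per cell, set if above the mean."""
    cells = shrink(pixels, size)
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | int(value > mean)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of bits that differ between two fingerprints."""
    return bin(a ^ b).count("1")


# Synthetic example: a horizontal gradient, a slightly brighter copy
# (different pixel values, same picture), and an inverted copy.
base = [[x * 8 for x in range(32)] for _ in range(32)]
brighter = [[min(255, value + 3) for value in row] for row in base]
inverted = [[255 - value for value in row] for row in base]

print(hamming_distance(average_hash(base), average_hash(brighter)))  # 0
print(hamming_distance(average_hash(base), average_hash(inverted)))  # 64
```

Two images would then be flagged as near-identical when the Hamming distance between their hashes is at or below some threshold. Choosing that threshold is where the recall/precision trade-off comes in.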

The documentation of the undouble library is available here.

You can download all images for a selected subset of products from the Open Food Facts Images dataset and run near-duplicate detection on them. The results should then be analysed manually to determine which technique is the most robust for our use case, and to measure its precision, recall and accuracy.
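Once pairs of images have been labelled by hand, the metrics for a given distance threshold could be computed along these lines. This is only a sketch: the helper name `evaluate_threshold` and the `(distance, is_duplicate)` pair format are assumptions, and the numbers in the example are invented purely to exercise the code, not real results.

```python
from typing import Iterable, Tuple


def evaluate_threshold(
    labelled_pairs: Iterable[Tuple[int, bool]], threshold: int
) -> Tuple[float, float, float]:
    """Return (precision, recall, accuracy) for one Hamming-distance threshold.

    Each pair is (hash distance between two images, manual "is duplicate" label).
    A pair is predicted to be a duplicate when distance <= threshold.
    """
    tp = fp = fn = tn = 0
    for distance, is_duplicate in labelled_pairs:
        predicted = distance <= threshold
        if predicted and is_duplicate:
            tp += 1
        elif predicted:
            fp += 1
        elif is_duplicate:
            fn += 1
        else:
            tn += 1
    total = tp + fp + fn + tn
    # Convention chosen here: with no predictions (or no true duplicates),
    # report 1.0 rather than dividing by zero.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    accuracy = (tp + tn) / total if total else 1.0
    return precision, recall, accuracy


# Invented labels, for illustration only.
pairs = [(0, True), (2, True), (5, True), (4, False), (20, False)]
for threshold in (2, 5):
    print(threshold, evaluate_threshold(pairs, threshold))
```

Running this over a range of thresholds, once per hashing technique, gives the curves needed to compare techniques: a stricter threshold tends to raise precision at the cost of recall, and a looser one does the opposite.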

Related issues

openfoodfacts/openfoodfacts-server#8445


raphael0202 added ✨ enhancement New feature or request Computer Vision ⭐ top issue Top issue. ⭐ top feature Top feature request. 🖼️ images labels Oct 25, 2023

teolemon added this to 🤖 Artificial Intelligence @ Open Food Facts Oct 25, 2023

github-project-automation bot moved this to Todo in 🤖 Artificial Intelligence @ Open Food Facts Oct 25, 2023

raphael0202 mentioned this issue Oct 25, 2023

feat: store fingerprint of all images #1272

Merged

github-actions bot removed ⭐ top issue Top issue. ⭐ top feature Top feature request. labels Oct 26, 2023

teolemon removed the ✨ enhancement New feature or request label Oct 18, 2024
