We seem to have a growing number of duplicated images in the Open Food Facts database: images whose binary content differs but which are nonetheless almost identical.
We want to detect these near-duplicate images so that they can be removed. Doing so would save disk space and make contributors' work easier.
Proposed solution
Use fingerprinting techniques to assign a single hash to each image, so that near-identical images get identical or very close hashes. See this blog post for more information about image fingerprinting. Explore the recall/precision trade-off of each hashing technique.
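To make the idea concrete, here is a minimal sketch of one common fingerprinting technique, the average hash (aHash), written in pure Python on a grid of grayscale values. It is only an illustration and not the undouble implementation: the helper names (`shrink`, `average_hash`, `hamming`), the 8x8 hash size, and the synthetic test image are all assumptions made for this example. A real pipeline would decode and resize images with an imaging library.

```python
from typing import List

Gray = List[List[int]]  # rows of 0-255 grayscale pixel values


def shrink(img: Gray, size: int = 8) -> List[List[float]]:
    """Downscale to size x size by averaging the pixels of each block."""
    h, w = len(img), len(img[0])
    out = []
    for by in range(size):
        y0 = by * h // size
        y1 = max((by + 1) * h // size, y0 + 1)
        row = []
        for bx in range(size):
            x0 = bx * w // size
            x1 = max((bx + 1) * w // size, x0 + 1)
            block = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(img: Gray, size: int = 8) -> int:
    """Return a size*size-bit hash: bit is 1 where a cell is brighter than the mean."""
    flat = [v for row in shrink(img, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | int(v > mean)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


# Synthetic 32x32 horizontal gradient (hypothetical test data, not a real product image).
original = [[x * 8 for x in range(32)] for _ in range(32)]
brighter = [[v + 3 for v in row] for row in original]    # near-duplicate: slight brightness shift
inverted = [[255 - v for v in row] for row in original]  # clearly different image

dist_near = hamming(average_hash(original), average_hash(brighter))
dist_far = hamming(average_hash(original), average_hash(inverted))
```

Two images would then be flagged as near-duplicates when the Hamming distance between their hashes is below a chosen threshold. That threshold is what drives the recall/precision trade-off.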
The documentation of the undouble library is available here.
You can download all images of a selected subset of products using the Open Food Facts Images dataset, then detect near-duplicate images among them. The results should be analysed manually to measure precision, recall and accuracy for each technique and to determine which one is the most robust for our use case.
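Once a set of image pairs has been labelled by hand, the metrics can be computed as sketched below. This is a rough illustration under stated assumptions: the function name `precision_recall`, the pair identifiers, the distances and the labels are all made up for the example and do not come from real Open Food Facts data.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]


def precision_recall(
    distances: Dict[Pair, int],
    is_duplicate: Dict[Pair, bool],
    threshold: int,
) -> Tuple[float, float]:
    """Compare predictions (distance <= threshold) with manual labels."""
    tp = fp = fn = 0
    for pair, truth in is_duplicate.items():
        predicted = distances[pair] <= threshold
        if predicted and truth:
            tp += 1
        elif predicted:
            fp += 1
        elif truth:
            fn += 1
    # By convention, report 1.0 when the denominator is empty.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


# Hypothetical Hamming distances between image pairs, with manual labels.
distances = {("a", "b"): 2, ("a", "c"): 9, ("b", "d"): 5, ("c", "d"): 30}
labels = {("a", "b"): True, ("a", "c"): True, ("b", "d"): False, ("c", "d"): False}

strict = precision_recall(distances, labels, threshold=4)   # few predictions, all correct
loose = precision_recall(distances, labels, threshold=10)   # finds everything, one false positive
```

Sweeping the threshold this way for each hashing technique gives one precision/recall curve per technique, which makes them directly comparable.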
Discussed in #1201
Originally posted by raphael0202 May 23, 2023
Related issues
openfoodfacts/openfoodfacts-server#8445