
Create a quality check for number of ingredients #9732

Closed
aleene opened this issue Feb 2, 2024 · 2 comments · Fixed by #11152 or #11149
Labels
🧽 Data quality https://wiki.openfoodfacts.org/Quality

aleene (Contributor) commented Feb 2, 2024

Problem

When cleaning up the Mozzarella category, I noticed that some products have only two ingredients: the most important ingredient (rennet) was missing from the ingredient list. These products can easily be found by plotting the number of ingredients per product. The plot below shows the buffalo mozzarellas:
[Screenshot, 2024-02-02: plot of the number of ingredients per buffalo mozzarella product]
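A minimal sketch of the "plot the number of ingredients" approach, written in Python rather than the Perl used by Product Opener. It assumes each product is a dict with a `code` and an `ingredients_n` field (the number of parsed ingredients); the function names are hypothetical. It prints a text histogram and lists the products whose count falls below a chosen threshold.

```python
from collections import Counter


def ingredient_count_histogram(products):
    """Count how many products have each number of ingredients.

    Products without an ingredient count are skipped.
    """
    return Counter(
        p["ingredients_n"] for p in products if p.get("ingredients_n") is not None
    )


def print_histogram(hist):
    """Print one row per ingredient count, with one '#' per product."""
    for n in sorted(hist):
        print(f"{n:>3} | {'#' * hist[n]}")


def low_outliers(products, threshold):
    """Return the codes of products listing fewer than `threshold` ingredients."""
    return [
        p["code"]
        for p in products
        if p.get("ingredients_n") is not None and p["ingredients_n"] < threshold
    ]
```

In a category like buffalo mozzarella, the outliers show up as a short bar well to the left of the main peak.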

Proposed solution

Define a minimum number of required ingredients per category in the taxonomy. Use this minimum to check products in the corresponding category and raise a flag when a product lists fewer ingredients.
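The proposed check could look roughly like the following. This is a Python sketch of the logic, not the Product Opener (Perl) implementation: it assumes products carry `categories_tags` and `ingredients_n`, uses a plain dict as a stand-in for the taxonomy property, and the warning tag and the `en:mozzarella` key are made up for illustration. When a product belongs to several categories that define a minimum, the strictest (largest) one applies.

```python
# Hypothetical stand-in for the categories taxonomy property
# "minimum_number_of_ingredients:en"; the tag "en:mozzarella" is assumed.
MINIMUM_NUMBER_OF_INGREDIENTS = {
    "en:mozzarella": 3,
}

# Hypothetical warning tag name, for illustration only.
BELOW_MINIMUM_WARNING = "en:ingredients-number-below-category-minimum"


def check_minimum_number_of_ingredients(product, minimums=MINIMUM_NUMBER_OF_INGREDIENTS):
    """Return data-quality warnings for a product listing fewer ingredients
    than the minimum required by any of its categories."""
    n = product.get("ingredients_n")
    if n is None:
        # No parsed ingredient list: nothing to compare against.
        return []
    category_minimums = [
        minimums[tag] for tag in product.get("categories_tags", []) if tag in minimums
    ]
    # Use the strictest (largest) minimum among the product's categories.
    if category_minimums and n < max(category_minimums):
        return [BELOW_MINIMUM_WARNING]
    return []
```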

Additional context

At the moment there are few quality checks on ingredients. This feature could be an extension of the single-ingredient product check, which is essentially a maximum number of ingredients.

Number of products impacted

This check mainly targets products where the producer did not list all the ingredients. Hopefully there are not too many of them.

Time per product

Once these products are tagged, we no longer have to look for them.

aleene added the 🧽 Data quality (https://wiki.openfoodfacts.org/Quality) label Feb 2, 2024
stephanegigandet (Contributor) commented

Related code: lib/ProductOpener/DataQualityFood.pm

In the categories taxonomy, add a "minimum_number_of_ingredients:en: 3" property to the mozzarella entry.

Run "make build_taxonomies", then add a check in lib/ProductOpener/DataQualityFood.pm (see the related "en:ingredients-single-ingredient-from-category-missing" warning for an example).
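To illustrate how such a property could be read, here is a simplified Python parser for taxonomy-style text. It assumes entries are separated by blank lines, parent lines start with `<`, the first remaining line is the entry's main name, and properties look like `minimum_number_of_ingredients:en: 3`. The tag normalization is deliberately naive (lowercase, spaces to hyphens) and does not reproduce what `make build_taxonomies` actually does; all names here are hypothetical.

```python
import re

PROPERTY_PREFIX = "minimum_number_of_ingredients:en:"


def canonical_tag(name):
    """Naive normalization: 'en:Buffalo Mozzarella' -> 'en:buffalo-mozzarella'."""
    lang, _, label = name.partition(":")
    return f"{lang.strip()}:{label.strip().lower().replace(' ', '-')}"


def parse_minimum_ingredients(taxonomy_text):
    """Map each entry's canonical tag to its minimum number of ingredients."""
    minimums = {}
    # Entries are separated by one or more blank (or whitespace-only) lines.
    for block in re.split(r"\n\s*\n", taxonomy_text):
        lines = [line.strip() for line in block.splitlines()]
        # Drop empty lines, comments and parent ("< ...") lines.
        lines = [l for l in lines if l and not l.startswith(("#", "<"))]
        if not lines:
            continue
        entry = canonical_tag(lines[0])
        for line in lines[1:]:
            if line.startswith(PROPERTY_PREFIX):
                # int() tolerates the space after the final colon.
                minimums[entry] = int(line[len(PROPERTY_PREFIX):])
    return minimums
```

The resulting dict has the same shape as the lookup table a category check would consult.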

Payne680 (Contributor) commented

I am currently working on this issue.

teolemon moved this to To discuss and validate in 🍊 Open Food Facts Server issues Apr 23, 2024
benbenben2 self-assigned this Dec 21, 2024
benbenben2 moved this from To do to Needs review in 🧽 Ensuring Data Quality Dec 21, 2024
github-project-automation bot moved this from Needs review to Done in 🧽 Ensuring Data Quality Jan 2, 2025
github-project-automation bot moved this from To discuss and validate to Done in 🍊 Open Food Facts Server issues Jan 2, 2025
stephanegigandet pushed a commit that referenced this issue Jan 6, 2025
🤖 I have created a release *beep* *boop*
---


## [2.53.0](v2.52.0...v2.53.0) (2025-01-06)


### Features

* data-quality - minimum number of ingredients ([#11152](#11152)) ([d7881d4](d7881d4)), closes [#9732](#9732)
* data-quality/apply-remove_insignificant_digits-for-nutriments ([#11147](#11147)) ([a6df72f](a6df72f))
* Top categories for Open Products Facts ([#11171](#11171)) ([2239473](2239473))


### Bug Fixes

* allow serving size to be hyphenated ([#11161](#11161)) ([7c0df2d](7c0df2d))
* Correct indentation, so that CodeQL can work with the code ([#11166](#11166)) ([0178ac2](0178ac2))
* data quality - increase threshold for comparison between fiber and its subnutriments ([#11145](#11145)) ([f0a2682](f0a2682))
* Delete html/images/lang/de/labels/halal.90x90.png ([#11183](#11183)) ([80cf708](80cf708))
* environmental_score ([#11191](#11191)) ([cbe221e](cbe221e))
* fix OPF PR labelling ([#11154](#11154)) ([e708ae3](e708ae3))
* fixes for Green-Score ([#11155](#11155)) ([7287d8b](7287d8b))
* green-score link ([#11146](#11146)) ([abf858a](abf858a))
* nutriscore grade from category change for extra virgin olive oils ([#11156](#11156)) ([32d58e0](32d58e0))
* rm nova drilldown field for beauty ([#11193](#11193)) ([3f5b654](3f5b654))
* SonarCloud issues ([#11165](#11165)) ([b84d545](b84d545))
* warnings in import_convert_carrefour_france ([#11189](#11189)) ([4643e3a](4643e3a))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Projects
Status: Done
4 participants