fix: improve parsing of 'category (type 1, type 2..)' ingredients #10999
/update_tests_results
…/openfoodfacts-server into category-types-ingredients
I did a review, but the hard parts are a bit cryptic to me, so I will trust the tests!
@@ -174,6 +174,10 @@ my $separators_except_comma = qr/(;|:|$middle_dot|\[|\{|\(|\N{U+FF08}|( $dashes
my $separators = qr/($stops\s|$commas|$separators_except_comma)/i;

# Symbols to indicate labels like organic, fairtrade etc.
# Symbols to indicate labels like organic, fairtrade etc.
# like in "pomodoro*, oignons*. (* indicates organic ingredients)"
my %percent_or_quantity_regexps = ();

sub init_percent_or_quantity_regexps($ingredients_lc) {
Great that you separated that!
if ($ingredients_lc eq "en") {
	$ingredient =~ s/(?:organic |fair trade )*//ig;
}
elsif ($ingredients_lc eq "fr") {
	$ingredient =~ s/(?: bio| biologique| équitable|s|\s|' . $symbols_regexp . ')//ig;
}
(Future improvement) Maybe we could use a "used_in_ingredients" property in the labels taxonomy to get them.
It's a bit of a pity that this does not work for German and other languages.
At the very least, could we use the taxonomy entries for organic and fair trade?
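A minimal sketch of what that suggestion could look like, assuming the label words were available per language. The `%label_words` hash below is a hypothetical stand-in for data read from the labels taxonomy (no "used_in_ingredients" property exists in this PR), the `de` entry is illustrative only, and `remove_label_words` is an invented name, not a function from the codebase:

```perl
use strict;
use warnings;
use utf8;

# Hypothetical stand-in for words that would be read from the labels taxonomy
my %label_words = (
	en => ["organic", "fair trade"],
	fr => ["bio", "biologique", "équitable"],
	de => ["bio"],    # illustrative only
);

# Build one case-insensitive regexp per language from the word lists
my %label_regexps;
foreach my $lc (keys %label_words) {
	# Longest words first, so longer synonyms are tried before their prefixes
	my @words = sort { length($b) <=> length($a) } @{$label_words{$lc}};
	my $alternation = join("|", map { quotemeta($_) } @words);
	$label_regexps{$lc} = qr/\b(?:$alternation)\b/i;
}

# Hypothetical helper: strip label words from an ingredient name
sub remove_label_words {
	my ($lc, $ingredient) = @_;
	my $regexp = $label_regexps{$lc};
	# Unknown language: leave the ingredient untouched
	return $ingredient if not defined $regexp;
	$ingredient =~ s/$regexp//g;
	# Collapse the whitespace left behind and trim both ends
	$ingredient =~ s/\s+/ /g;
	$ingredient =~ s/^\s+|\s+$//g;
	return $ingredient;
}

print remove_label_words("en", "organic palm oil"), "\n";    # prints: palm oil
```

With this shape, supporting another language would only require adding entries to the data, not another `elsif` branch. It does not reproduce the other things the French branch in the diff strips (plural "s", whitespace, label symbols).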
🤖 I have created a release *beep* *boop*
---
## [2.51.0](v2.50.0...v2.51.0) (2024-12-10)

### Features

* Add script to remove nearly empty products with quality issues ([#11058](#11058)) ([82726d5](82726d5))
* NOVA 4 attribute and knowledge panel improvements ([#11035](#11035)) ([9048011](9048011))

### Bug Fixes

* additives table + clean HTML to remove some validation errors ([#11093](#11093)) ([474f68d](474f68d))
* avoid crash if ingredients services called without ingredients_lc ([#11055](#11055)) ([1db3e94](1db3e94))
* data quality, false positive, nutrition sum with lower symbol ([#11076](#11076)) ([d389c87](d389c87))
* data quality, false positive, nutrition sum with lower symbol for milk below the table ([#11098](#11098)) ([7febb69](7febb69))
* display of usage in scripts/import_csv_file.pl ([#11091](#11091)) ([91881f8](91881f8))
* improve parsing of 'category (type 1, type 2..)' ingredients ([#10999](#10999)) ([42618ac](42618ac))
* letter A at end of string is not a stopword ([#11095](#11095)) ([6eaeb26](6eaeb26))
* Load products in mongodb ([#11072](#11072)) ([6787ba1](6787ba1))
* new images path ([#11096](#11096)) ([8658959](8658959))
* pro platform product writes to the public platform MongoDB database ([#11065](#11065)) ([f77eb82](f77eb82))
* product image move [#11067](#11067) ([#11092](#11092)) ([30257c1](30257c1))
* remove warning in ecobalyse matching of ingredients ([#11062](#11062)) ([c29fce9](c29fce9))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
PR to better handle things like "vegetal oil (palm, rapeseed)":
Work in progress, some tests will need to be updated.
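
To illustrate the kind of input this targets, here is a rough sketch that splits a "category (type 1, type 2)" string and combines each type with the category. It is not the parser code from this PR: `split_category_types` is a hypothetical helper, and putting the type before the category is an assumption that only suits English word order.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the PR): split "category (type 1, type 2)"
# into the category name and an array ref of its types.
sub split_category_types {
	my ($text) = @_;
	# Capture the text before the parentheses and the list inside them
	if ($text =~ /^\s*([^()]+?)\s*\(\s*([^()]+?)\s*\)\s*$/) {
		my ($category, $list) = ($1, $2);
		my @types = split /\s*,\s*/, $list;
		return ($category, \@types);
	}
	# No parenthesised list: treat the whole string as a single ingredient
	return ($text, []);
}

my ($category, $types) = split_category_types("vegetal oil (palm, rapeseed)");

# Combine each type with the category (English word order assumed)
my @expanded = map { "$_ $category" } @$types;
print join(", ", @expanded), "\n";    # prints: palm vegetal oil, rapeseed vegetal oil
```

The real parser also has to cope with nested parentheses, percentages, and label symbols, which this sketch ignores.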