feat: Synonyms in taxonomized suggestions #9395

Merged 31 commits on Feb 19, 2024
Changes from 30 commits
Commits
48fd9a9
Prototype synonyms in taxonomized suggestions
Naruyoko Nov 24, 2023
a357410
Make searching work with language code
Naruyoko Nov 24, 2023
a1fd46b
API is working hard to give 25 suggestions by default. Why not displa…
Naruyoko Nov 24, 2023
2946868
Ignore `get_synonyms` option when caching
Naruyoko Nov 28, 2023
9546996
Send synonyms before normalization
Naruyoko Dec 3, 2023
815974d
Do not show twice if the synonym is canonical
Naruyoko Dec 3, 2023
7ee498d
Prefer earlier synonyms
Naruyoko Dec 4, 2023
6c6bcb3
Reword the description for `matched_synonyms`
Naruyoko Dec 4, 2023
882974e
Fix display when matched = canonical with LC
Naruyoko Dec 4, 2023
75b607d
Apply linter
Naruyoko Dec 11, 2023
c721905
Fix JS linting issues
Naruyoko Dec 11, 2023
32d7882
Add documentation for `get_synonyms` option
Naruyoko Dec 11, 2023
822bc31
Update tests
Naruyoko Dec 18, 2023
6138e9e
Fix display of the canonical name with LC
Naruyoko Dec 25, 2023
b4da777
Merge remote-tracking branch 'origin/main' into temp
Naruyoko Dec 25, 2023
8047944
Update tests
Naruyoko Dec 25, 2023
5c28d4b
Remove a commented out debug log
Naruyoko Dec 25, 2023
9f29d6b
Update tests
Naruyoko Jan 4, 2024
086a0a0
Merge remote-tracking branch 'origin/main' into Naruyoko-synonym-auto…
Naruyoko Jan 4, 2024
eaee004
Revert "Update tests"
Naruyoko Jan 23, 2024
51da77a
Revert "Update tests"
Naruyoko Jan 23, 2024
aa89208
Partially revert "Update tests" (suggest.pl)
Naruyoko Jan 23, 2024
07ae844
Preserve the behavior of suggest.pl API (new functionalities in new f…
Naruyoko Jan 24, 2024
9c73b06
Merge remote-tracking branch 'origin/main' into Naruyoko-synonym-auto…
Naruyoko Jan 24, 2024
4b75874
Run Perl linter
Naruyoko Jan 24, 2024
09daf5f
Change the format of matched synonyms to a dictionary
Naruyoko Feb 9, 2024
e86988d
Update docs with the new format
Naruyoko Feb 11, 2024
aa5444c
Add tests for get_taxonomy_suggestions_with_synonyms (by copying)
Naruyoko Feb 17, 2024
12c20c3
Run Perl linter on the new test
Naruyoko Feb 17, 2024
d1742d1
Merge remote-tracking branch 'origin/main' into Naruyoko-synonym-auto…
Naruyoko Feb 17, 2024
f779d0a
docs: minor update on suggest api
alexgarel Feb 19, 2024
20 changes: 15 additions & 5 deletions docs/api/ref/api-v3.yml
@@ -4,8 +4,8 @@ x-stoplight:
info:
title: Open Food Facts Open API V3 - under development
description: |
As a developer, the Open Food Facts API allows you to get information
and contribute to the products database. You can create great apps to
As a developer, the Open Food Facts API allows you to get information
and contribute to the products database. You can create great apps to
help people make better food choices and also provide data to enhance the database.
termsOfService: 'https://openweathermap.org/terms'
contact:
@@ -50,12 +50,12 @@ paths:
in: query
name: fields
description: |-
Comma separated list of fields requested in the response.
Comma separated list of fields requested in the response.

Special values:
Special values:
* "none": returns no fields
* "raw": returns all fields as stored internally in the database
* "all": returns all fields except generated fields that need to be explicitly requested such as "knowledge_panels".
* "all": returns all fields except generated fields that need to be explicitly requested such as "knowledge_panels".

Defaults to "all" for READ requests. The "all" value can also be combined with fields like "attribute_groups" and "knowledge_panels".'
responses:
@@ -239,6 +239,11 @@ paths:
description: Array of sorted strings suggestions in the language requested in the "lc" field.
items:
type: string
matched_synonyms:
Member

From what I understand from the tests, you changed not only api-v3 but also api-v2 (in api.yml). Shouldn't you change it there as well?

Member

This comment might not be relevant if we go for a suggestion api v2.

Contributor Author

I am not sure where I should separate my changes. Do I copy this part with a new name with the new function exposed only there? If so, do I need to copy this function as well? Or do the changes need to be separated at the level of inner functions, like from here?

I am also not sure how I should change api.yml, since it appears mostly empty.

Member

@Naruyoko in the code you show, a condition should be enough to sort out the cases.

But we could add a parameter to get_taxonomy_suggestions and have a cgi/suggest_v2.pl similar to suggest.pl

Member

@Naruyoko, do not hesitate to ping me on slack if it's not clear enough.

Contributor Author

By using a dict, do you mean like this (which is what I intended by "transpose"):

{"errors":[],"suggestions":[{"matched_synonym":"Orge","tag":"Gluten"},{"matched_synonym":"Œufs","tag":"Œufs"},{"matched_synonym":"Arachis hypogaea","tag":"Arachides"},{"matched_synonym":"Langoustine","tag":"Crustacés"},{"matched_synonym":"Fruits à coque dure","tag":"Fruits à coque"},{"matched_synonym":"Beurre concentré","tag":"Lait"},{"matched_synonym":"Pétoncles","tag":"Mollusques"},{"matched_synonym":"Graines de moutarde","tag":"Moutarde"},{"matched_synonym":"Rouget","tag":"Poisson"},{"matched_synonym":"Tofu","tag":"Soja"}],"warnings":[]}

Or like this:

{"errors":[],"suggestions":{"Gluten":"Orge","Œufs":"Œufs","Arachides":"Arachis hypogaea","Crustacés":"Langoustine","Fruits à coque":"Fruits à coque dure","Lait":"Beurre concentré","Mollusques":"Pétoncles","Moutarde":"Graines de moutarde","Poisson":"Rouget","Soja":"Tofu"},"warnings":[]}

Or like this:

{"errors":[],"matched_synonyms":{"Gluten":"Orge","Œufs":"Œufs","Arachides":"Arachis hypogaea","Crustacés":"Langoustine","Fruits à coque":"Fruits à coque dure","Lait":"Beurre concentré","Mollusques":"Pétoncles","Moutarde":"Graines de moutarde","Poisson":"Rouget","Soja":"Tofu"},"status":"success","suggestions":["Gluten","Œufs","Arachides","Crustacés","Fruits à coque","Lait","Mollusques","Moutarde","Poisson","Soja"],"warnings":[]}

Member

I like your third version, because it keeps things simple: you can just retrieve the plain suggestion list, and it's easy to look up the matched synonym.

Just one point: shouldn't it be "matched_synonyms":{"Gluten":["Orge"],…}, as there might be more than one matched synonym? (Or don't we care because we just return one? That is still really ok for me, as it covers the needs.)

Also, is it useful to put entries like "Œufs": "Œufs" inside matched_synonyms, or can the API user deduce from the fact that Œufs is not in the synonyms dict that it was directly matched?

Contributor Author

I agree with using the third version.

The matched synonym is currently the first synonym with the best match type (with "start" > "inside" > "fuzzy"). I don't see how useful it would be to return all possible matches.

I think it is better to keep "a":"a" entries for consistency.
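
For illustration, here is a minimal sketch of how an API client might consume the third format. The helper name `pairSuggestions` and the fallback to the canonical name when an entry is missing are assumptions made for this example, not part of the PR.

```javascript
// Pair each canonical suggestion with the synonym that matched the query.
// Hypothetical helper: the name and the fallback behaviour are assumptions.
function pairSuggestions(response) {
  const synonyms = response.matched_synonyms || {};
  return response.suggestions.map(function (tag) {
    const hasEntry = Object.prototype.hasOwnProperty.call(synonyms, tag);
    // Fall back to the canonical name if no entry is present.
    const matched = hasEntry ? synonyms[tag] : tag;
    return { tag: tag, matchedSynonym: matched, direct: matched === tag };
  });
}

// Abridged sample in the shape of the third format discussed above.
const sample = {
  errors: [],
  matched_synonyms: { "Gluten": "Orge", "Œufs": "Œufs", "Soja": "Tofu" },
  status: "success",
  suggestions: ["Gluten", "Œufs", "Soja"],
  warnings: []
};

const pairs = pairSuggestions(sample);
console.log(pairs);
```

Because the "a":"a" entries are kept, a client can tell direct matches from synonym matches with a simple equality check instead of testing for a missing key.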

Contributor Author

I pushed a change with the new format. I will have to update the API docs and create new tests.

Member

Yes, I read your changes, seems perfect 💪

So I wait for the other changes. Don't hesitate to ping me on slack when you are ready @Naruyoko :-)

type: object
description: Dictionary mapping each canonical name (as listed in the "suggestions" field) to the synonym that best matches the query. An entry is present for every suggestion, even when the synonym is identical to the canonical name.
alexgarel marked this conversation as resolved.
additionalProperties:
type: string
operationId: get-api-v3-taxonomy_suggestions-taxonomy
description: |-
Open Food Facts uses multilingual [taxonomies](https://wiki.openfoodfacts.org/Global_taxonomies) to normalize entries for categories, labels, ingredients, packaging shapes / materials / recycling instructions and many more fields.
@@ -282,6 +287,11 @@ paths:
in: query
name: limit
description: 'Maximum number of suggestions. Default is 25, max is 400.'
- schema:
type: string
in: query
name: get_synonyms
description: 'Whether or not to include "matched_synonyms" in the response. Set to 1 to include.'
- schema:
type: string
in: query
Expand Down
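
As a rough sketch of how a client could request the new field, the following builds a query URL with `get_synonyms=1`. The host `https://example.org` is a placeholder and `buildSuggestionsUrl` is a hypothetical helper; the path and the `tagtype`, `string`, `lc`, `limit` and `get_synonyms` parameter names are taken from this PR.

```javascript
// Build a taxonomy_suggestions URL with properly encoded query parameters.
// Hypothetical helper; the base URL passed in below is only a placeholder.
function buildSuggestionsUrl(base, params) {
  const url = new URL("/api/v3/taxonomy_suggestions", base);
  url.searchParams.set("tagtype", params.tagtype);
  url.searchParams.set("string", params.string);
  url.searchParams.set("lc", params.lc);
  if (params.limit !== undefined) {
    url.searchParams.set("limit", String(params.limit));
  }
  if (params.getSynonyms) {
    // Per the parameter description, "1" asks for matched_synonyms.
    url.searchParams.set("get_synonyms", "1");
  }
  return url.toString();
}

const exampleUrl = buildSuggestionsUrl("https://example.org", {
  tagtype: "allergens",
  string: "o",
  lc: "fr",
  getSynonyms: true
});
console.log(exampleUrl);
```

Using `URLSearchParams` rather than string concatenation means search strings such as "Café" are percent-encoded automatically.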
47 changes: 43 additions & 4 deletions html/js/product-multilingual.js
@@ -542,14 +542,22 @@ function initializeTagifyInput(el) {
autocomplete: true,
whitelist: get_recents(el.id) || [],
dropdown: {
enabled: 0
enabled: 0,
maxItems: 100
}
});

let abortController;
let debounceTimer;
const timeoutWait = 300;

function updateSuggestions() {
const value = input.state.inputText;
const lc = (/^\w\w:/).exec(value);
const term = lc ? value.substring(lc[0].length) : value;
input.dropdown.show(term);
}

input.on("input", function (event) {
const value = event.detail.value;
input.whitelist = null; // reset the whitelist
@@ -565,16 +573,47 @@ function initializeTagifyInput(el) {

abortController = new AbortController();

fetch(el.dataset.autocomplete + "&string=" + value, {
fetch(el.dataset.autocomplete + "&string=" + encodeURIComponent(value) + "&get_synonyms=1", {
signal: abortController.signal
}).
then((RES) => RES.json()).
then(function (json) {
input.whitelist = json.suggestions;
input.dropdown.show(value); // render the suggestions dropdown
const lc = (/^\w\w:/).exec(value);
let whitelist = Object.values(json.matched_synonyms);
if (lc) {
whitelist = whitelist.map(function (e) {
return {"value": lc[0] + e, "searchBy": e};
});
}
const synonymMap = Object.create(null);
// eslint-disable-next-line guard-for-in
for (const k in json.matched_synonyms) {
synonymMap[json.matched_synonyms[k]] = k;
}
input.synonymMap = synonymMap;
input.whitelist = whitelist;
updateSuggestions(); // render the suggestions dropdown
});
}, timeoutWait);
}
updateSuggestions();
});

input.on("dropdown:show", function() {
if (!input.synonymMap) {
return;
}
$(input.DOM.dropdown).find("div.tagify__dropdown__item").each(function(_,e) {
let synonymName = e.getAttribute("value");
const lc = (/^\w\w:/).exec(synonymName);
if (lc) {
synonymName = synonymName.substring(lc[0].length);
}
const canonicalName = input.synonymMap[synonymName];
if (canonicalName && canonicalName !== synonymName) {
e.innerHTML += " (&rarr; <i>" + canonicalName + "</i>)";
}
});
});

input.on("add", function (event) {
Expand Down
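
The map inversion and label decoration in this hunk can be sketched as pure functions, independent of Tagify and the DOM. The function names below are hypothetical and the label is plain text rather than the HTML the PR appends. Note that if two canonical names share the same matched synonym, the later entry overwrites the earlier one in the inverted map.

```javascript
// Invert {canonical: synonym} into {synonym: canonical}.
// A null-prototype object avoids collisions with inherited keys.
function buildSynonymMap(matchedSynonyms) {
  const map = Object.create(null);
  for (const [canonical, synonym] of Object.entries(matchedSynonyms)) {
    map[synonym] = canonical;
  }
  return map;
}

// Strip an optional two-letter language prefix such as "fr:".
function stripLanguagePrefix(value) {
  const lc = /^\w\w:/.exec(value);
  return lc ? value.substring(lc[0].length) : value;
}

// Append the canonical name only when it differs from the matched synonym.
function dropdownLabel(value, synonymMap) {
  const synonym = stripLanguagePrefix(value);
  const canonical = synonymMap[synonym];
  if (canonical && canonical !== synonym) {
    return value + " (→ " + canonical + ")";
  }
  return value;
}

const synonymMap = buildSynonymMap({ "Gluten": "Orge", "Œufs": "Œufs" });
console.log(dropdownLabel("fr:Orge", synonymMap));
console.log(dropdownLabel("Œufs", synonymMap));
```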
20 changes: 16 additions & 4 deletions lib/ProductOpener/APITaxonomySuggestions.pm
@@ -91,7 +91,10 @@ sub taxonomy_suggestions_api ($request_ref) {
};

# Options define how many suggestions should be returned, in which format etc.
my $options_ref = {limit => request_param($request_ref, 'limit')};
my $options_ref = {
limit => request_param($request_ref, 'limit'),
get_synonyms => request_param($request_ref, 'get_synonyms')
};

# Validate input parameters

@@ -122,9 +125,18 @@ sub taxonomy_suggestions_api ($request_ref) {
}
# Generate suggestions
else {

$response_ref->{suggestions}
= [get_taxonomy_suggestions($tagtype, $search_lc, $string, $context_ref, $options_ref)];
my $options_relevant = {%$options_ref};
delete $options_relevant->{get_synonyms};
my @suggestions
= get_taxonomy_suggestions_with_synonyms($tagtype, $search_lc, $string, $context_ref, $options_relevant);
$log->debug("taxonomy_suggestions_api", {suggestions => \@suggestions}) if $log->is_debug();
$response_ref->{suggestions} = [map {$_->{tag}} @suggestions];
if ($options_ref->{get_synonyms}) {
$response_ref->{matched_synonyms} = {};
foreach (@suggestions) {
$response_ref->{matched_synonyms}->{$_->{tag}} = ucfirst($_->{matched_synonym});
}
}
}

$log->debug("taxonomy_suggestions_api - stop", {request => $request_ref}) if $log->is_debug();
Expand Down
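
To make the response assembly easier to follow, here is a JavaScript transliteration of the logic in this hunk. It is only a sketch: `buildResponse` and `ucfirst` are hypothetical names, and it treats only the string "1" as enabling synonyms, whereas the Perl code accepts any true value.

```javascript
// Uppercase the first character, mirroring what Perl's ucfirst does here.
function ucfirst(text) {
  return text.length ? text.charAt(0).toUpperCase() + text.slice(1) : text;
}

// suggestions: array of {tag, matched_synonym} records, as returned by the
// *_with_synonyms functions in this PR.
function buildResponse(suggestions, options) {
  const response = { suggestions: suggestions.map((entry) => entry.tag) };
  // Assumption: only "1" turns the option on in this sketch.
  if (options.get_synonyms === "1") {
    response.matched_synonyms = {};
    for (const entry of suggestions) {
      response.matched_synonyms[entry.tag] = ucfirst(entry.matched_synonym);
    }
  }
  return response;
}

const records = [
  { tag: "Gluten", matched_synonym: "orge" },
  { tag: "Œufs", matched_synonym: "œufs" }
];
console.log(buildResponse(records, { get_synonyms: "1" }));
console.log(buildResponse(records, {}));
```

Keeping `suggestions` a plain array in both cases means existing clients are unaffected unless they opt in with `get_synonyms`.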
77 changes: 55 additions & 22 deletions lib/ProductOpener/TaxonomySuggestions.pm
@@ -36,6 +36,7 @@ use Log::Any qw($log);
BEGIN {
use vars qw(@ISA @EXPORT_OK %EXPORT_TAGS);
@EXPORT_OK = qw(
&get_taxonomy_suggestions_with_synonyms
&get_taxonomy_suggestions
); # symbols to export on request
%EXPORT_TAGS = (all => [@EXPORT_OK]);
@@ -80,9 +81,13 @@ sub load_categories_packagings_stats_for_suggestions() {
return $categories_packagings_stats_for_suggestions_ref;
}

=head2 get_taxonomy_suggestions_with_synonyms ($tagtype, $search_lc, $string, $context_ref, $options_ref )

Generate taxonomy suggestions with matched synonyms information.

=head2 get_taxonomy_suggestions ($tagtype, $search_lc, $string, $context_ref, $options_ref )

Generate taxonomy suggestions.
Generate taxonomy suggestions (without matched synonyms information).

=head3 Parameters

@@ -107,7 +112,7 @@ Restart memcached if you want fresh results (e.g. when taxonomy are category sta

=cut

sub get_taxonomy_suggestions ($tagtype, $search_lc, $string, $context_ref, $options_ref) {
sub get_taxonomy_suggestions_with_synonyms ($tagtype, $search_lc, $string, $context_ref, $options_ref) {

$log->debug(
"get_taxonomy_suggestions - start",
@@ -139,7 +144,8 @@ sub get_taxonomy_suggestions ($tagtype, $search_lc, $string, $context_ref, $opti

my @tags = generate_sorted_list_of_taxonomy_entries($tagtype, $search_lc, $context_ref);

my @filtered_tags = filter_suggestions_matching_string(\@tags, $tagtype, $search_lc, $string, $options_ref);
my @filtered_tags
= filter_suggestions_matching_string_with_synonyms(\@tags, $tagtype, $search_lc, $string, $options_ref);
$results_ref = \@filtered_tags;

$log->debug("storing suggestions in cache", {key => $key}) if $log->is_debug();
@@ -152,6 +158,12 @@ sub get_taxonomy_suggestions ($tagtype, $search_lc, $string, $context_ref, $opti
return @$results_ref;
}

sub get_taxonomy_suggestions ($tagtype, $search_lc, $string, $context_ref, $options_ref) {
return
map {$_->{tag}}
get_taxonomy_suggestions_with_synonyms($tagtype, $search_lc, $string, $context_ref, $options_ref);
}

=head2 generate_sorted_list_of_taxonomy_entries($tagtype, $search_lc, $context_ref)

Generate a sorted list of canonicalized taxonomy entries from which we will generate suggestions
@@ -312,28 +324,39 @@ sub match_stringids ($stringid, $fuzzystringid, $synonymid) {

# best_match finds the synonym that best matches the input string and reports how well it matches

sub best_match ($stringid, $fuzzystringid, $synonyms_ids_ref) {
sub best_match ($search_lc, $stringid, $fuzzystringid, $synonyms_ref) {

my $best_match = "none";
my $best_type = "none";
my $best_match = 0;

foreach my $synonymid (@$synonyms_ids_ref) {
foreach my $synonym (@$synonyms_ref) {
my $synonymid = get_string_id_for_lang($search_lc, $synonym);
my $match = match_stringids($stringid, $fuzzystringid, $synonymid);
# On ties, keep the earlier synonym, so that the canonical name wins when it has the same match type as a later synonym
next if $match eq "none" or $match eq $best_type;
if ($match eq "start") {
# Best possible match, we can stop looking at the other synonyms
return "start";
$best_type = $match;
$best_match = $synonym;
last;
}
elsif (($match eq "inside")
or (($match eq "fuzzy") and ($best_match eq "none")))
or (($match eq "fuzzy") and ($best_type eq "none")))
{
$best_match = $match;
$best_type = $match;
$best_match = $synonym;
}
}
return $best_match;
return {type => $best_type, match => $best_match};
}
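
The selection rule in the new best_match can be sketched in JavaScript as follows. `matchType` is a hypothetical stand-in for match_stringids, whose normalization rules are not shown in this diff; in particular, the fuzzy rule below (ignoring spaces and hyphens) is an assumption. The sketch also returns `null` rather than `0` when nothing matches.

```javascript
// Hypothetical stand-in for match_stringids; the real rules are not shown here.
function matchType(query, candidate) {
  const q = query.toLowerCase();
  const c = candidate.toLowerCase();
  if (c.startsWith(q)) return "start";
  if (c.includes(q)) return "inside";
  // Assumed fuzzy rule: compare with spaces and hyphens removed.
  const strip = (text) => text.replace(/[\s-]/g, "");
  if (strip(c).includes(strip(q))) return "fuzzy";
  return "none";
}

// Precedence is "start" > "inside" > "fuzzy"; on ties the earlier synonym wins.
function bestMatch(query, synonyms) {
  let bestType = "none";
  let best = null;
  for (const synonym of synonyms) {
    const type = matchType(query, synonym);
    if (type === "none" || type === bestType) continue;
    if (type === "start") {
      bestType = type;
      best = synonym;
      break; // nothing can beat a match at the start
    }
    if (type === "inside" || (type === "fuzzy" && bestType === "none")) {
      bestType = type;
      best = synonym;
    }
  }
  return { type: bestType, match: best };
}

console.log(bestMatch("or", ["Gluten", "Orge"]));
console.log(bestMatch("ait", ["Lait", "Laitage"]));
```

Returning both the match type and the synonym text lets the caller keep its existing bucketing by type while also recording which synonym to display.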

=head2 filter_suggestions_matching_string_with_synonyms ($tags_ref, $tagtype, $search_lc, $string, $options_ref)

Filter a list of potential taxonomy suggestions matching a string with matched synonyms information.

=head2 filter_suggestions_matching_string ($tags_ref, $tagtype, $search_lc, $string, $options_ref)

Filter a list of potential taxonomy suggestions matching a string.
Filter a list of potential taxonomy suggestions matching a string (without matched synonyms information).

By priority, the function returns:
- taxonomy entries that match the input string at the beginning
@@ -357,7 +380,7 @@ By priority, the function returns:

=cut

sub filter_suggestions_matching_string ($tags_ref, $tagtype, $search_lc, $string, $options_ref) {
sub filter_suggestions_matching_string_with_synonyms ($tags_ref, $tagtype, $search_lc, $string, $options_ref) {

my $original_lc = $search_lc;

@@ -424,39 +447,43 @@ sub filter_suggestions_matching_string ($tags_ref, $tagtype, $search_lc, $string
my $tag_xx = display_taxonomy_tag("xx", $tagtype, $canon_tagid);

# Build a list of normalized synonyms in the search language and the wildcard xx: language
my @synonyms_ids = map {get_string_id_for_lang($search_lc, $_)} (
my @synonyms = (
@{deep_get(\%synonyms_for, $tagtype, $search_lc, get_string_id_for_lang($search_lc, $tag)) || []},
@{deep_get(\%synonyms_for, $tagtype, "xx", get_string_id_for_lang("xx", $tag_xx)) || []}
);

# check how well the synonyms match the input string
my $best_match = best_match($stringid, $fuzzystringid, \@synonyms_ids);
my $best_match = best_match($search_lc, $stringid, $fuzzystringid, \@synonyms);

$log->debug(
"synonyms_ids for canon_tagid",
"synonyms for canon_tagid",
{
tagtype => $tagtype,
canon_tagid => $canon_tagid,
tag => $tag,
synonym_ids => \@synonyms_ids,
synonyms => \@synonyms,
best_match => $best_match
}
) if $log->is_debug();

my $to_add = {
tag => $tag,
matched_synonym => $best_match->{match}
};
# matching at start, best matches
if ($best_match eq "start") {
push @suggestions, $tag;
if ($best_match->{type} eq "start") {
push @suggestions, $to_add;
# count matches at start so that we can return only if we have enough matches
$suggestions_count++;
last if $suggestions_count >= $limit;
}
# matching inside
elsif ($best_match eq "inside") {
push @suggestions_c, $tag;
elsif ($best_match->{type} eq "inside") {
push @suggestions_c, $to_add;
}
# fuzzy match
elsif ($best_match eq "fuzzy") {
push @suggestions_f, $tag;
elsif ($best_match->{type} eq "fuzzy") {
push @suggestions_f, $to_add;
}
}
}
@@ -475,4 +502,10 @@ sub filter_suggestions_matching_string ($tags_ref, $tagtype, $search_lc, $string
return @suggestions;
}
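
The bucketing by match type can be sketched as below. The step that merges the buckets is elided from this diff, so the final concatenation and truncation here are an assumption based on the "by priority" description; `rankSuggestions` is a hypothetical name.

```javascript
// entries: array of {tag, matched_synonym, type} in taxonomy order,
// where type is "start", "inside", "fuzzy" or "none".
function rankSuggestions(entries, limit) {
  const start = [];
  const inside = [];
  const fuzzy = [];
  for (const entry of entries) {
    if (entry.type === "start") {
      start.push(entry);
      // Stop early once enough top-priority matches are collected.
      if (start.length >= limit) break;
    } else if (entry.type === "inside") {
      inside.push(entry);
    } else if (entry.type === "fuzzy") {
      fuzzy.push(entry);
    }
  }
  // Assumed merge step: priority order, truncated to the limit.
  return start.concat(inside, fuzzy).slice(0, limit);
}

const candidates = [
  { tag: "a", type: "fuzzy" },
  { tag: "b", type: "inside" },
  { tag: "c", type: "start" },
  { tag: "d", type: "none" },
  { tag: "e", type: "start" }
];
console.log(rankSuggestions(candidates, 3).map((entry) => entry.tag));
```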

sub filter_suggestions_matching_string ($tags_ref, $tagtype, $search_lc, $string, $options_ref) {
return
map {$_->{tag}}
filter_suggestions_matching_string_with_synonyms($tags_ref, $tagtype, $search_lc, $string, $options_ref);
}

1;
6 changes: 6 additions & 0 deletions tests/integration/api_v3_taxonomy_suggestions.t
@@ -77,6 +77,12 @@ my $tests_ref = [
path => '/api/v3/taxonomy_suggestions?tagtype=categories&string=Café&lc=fr',
expected_status_code => 200,
},
{
test_case => 'allergens-string-fr-o-get-synonyms',
method => 'GET',
path => '/api/v3/taxonomy_suggestions?tagtype=allergens&string=o&lc=fr&get_synonyms=1',
expected_status_code => 200,
},
# Packaging suggestions return most popular suggestions first
{
test_case => 'packaging-shapes',
Expand Down
@@ -0,0 +1,29 @@
{
"errors" : [],
"matched_synonyms" : {
"Arachides" : "Cacahouètes",
"Crustacés" : "Homard",
"Fruits à coque" : "Fruits à coque",
"Gluten" : "Orge",
"Lait" : "Lactose",
"Mollusques" : "Mollusques",
"Moutarde" : "Moutarde",
"Poisson" : "Poisson",
"Soja" : "Soja",
"Œufs" : "Œufs"
},
"status" : "success",
"suggestions" : [
"Gluten",
"Œufs",
"Arachides",
"Crustacés",
"Fruits à coque",
"Lait",
"Mollusques",
"Moutarde",
"Poisson",
"Soja"
],
"warnings" : []
}
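
A client-side consistency check for a response like the fixture above might look like this. It is a sketch only: `checkSynonymResponse` is a hypothetical helper, and the fixture below is abridged from the expected output in this PR.

```javascript
// Verify that matched_synonyms has exactly one entry per suggestion.
function checkSynonymResponse(response) {
  const problems = [];
  const synonyms = response.matched_synonyms || {};
  for (const tag of response.suggestions) {
    if (!Object.prototype.hasOwnProperty.call(synonyms, tag)) {
      problems.push("missing synonym for " + tag);
    }
  }
  for (const tag of Object.keys(synonyms)) {
    if (!response.suggestions.includes(tag)) {
      problems.push("unexpected key " + tag);
    }
  }
  return problems;
}

// Abridged from the expected test output above.
const fixture = {
  errors: [],
  matched_synonyms: { "Arachides": "Cacahouètes", "Gluten": "Orge", "Œufs": "Œufs" },
  status: "success",
  suggestions: ["Gluten", "Œufs", "Arachides"],
  warnings: []
};
console.log(checkSynonymResponse(fixture));
```

The key order of `matched_synonyms` in the fixture is alphabetical while `suggestions` keeps its ranking, so clients should take the order from `suggestions` and use the dictionary only for lookups.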