feat: Synonyms in taxonomized suggestions #9395

Merged 31 commits on Feb 19, 2024
Changes from 19 commits
Commits
48fd9a9
Prototype synonyms in taxonomized suggestions
Naruyoko Nov 24, 2023
a357410
Make searching work with language code
Naruyoko Nov 24, 2023
a1fd46b
API is working hard to give 25 suggestions by default. Why not displa…
Naruyoko Nov 24, 2023
2946868
Ignore `get_synonyms` option when caching
Naruyoko Nov 28, 2023
9546996
Send synonyms before normalization
Naruyoko Dec 3, 2023
815974d
Do not show twice if the synonym is canonical
Naruyoko Dec 3, 2023
7ee498d
Prefer earlier synonyms
Naruyoko Dec 4, 2023
6c6bcb3
Reword the description for `matched_synonyms`
Naruyoko Dec 4, 2023
882974e
Fix display when matched = canonical with LC
Naruyoko Dec 4, 2023
75b607d
Apply linter
Naruyoko Dec 11, 2023
c721905
Fix JS linting issues
Naruyoko Dec 11, 2023
32d7882
Add documentation for `get_synonyms` option
Naruyoko Dec 11, 2023
822bc31
Update tests
Naruyoko Dec 18, 2023
6138e9e
Fix display of the canonical name with LC
Naruyoko Dec 25, 2023
b4da777
Merge remote-tracking branch 'origin/main' into temp
Naruyoko Dec 25, 2023
8047944
Update tests
Naruyoko Dec 25, 2023
5c28d4b
Remove a commented out debug log
Naruyoko Dec 25, 2023
9f29d6b
Update tests
Naruyoko Jan 4, 2024
086a0a0
Merge remote-tracking branch 'origin/main' into Naruyoko-synonym-auto…
Naruyoko Jan 4, 2024
eaee004
Revert "Update tests"
Naruyoko Jan 23, 2024
51da77a
Revert "Update tests"
Naruyoko Jan 23, 2024
aa89208
Partially revert "Update tests" (suggest.pl)
Naruyoko Jan 23, 2024
07ae844
Preserve the behavior of suggest.pl API (new functionalities in new f…
Naruyoko Jan 24, 2024
9c73b06
Merge remote-tracking branch 'origin/main' into Naruyoko-synonym-auto…
Naruyoko Jan 24, 2024
4b75874
Run Perl linter
Naruyoko Jan 24, 2024
09daf5f
Change the format of matched synonyms to a dictionary
Naruyoko Feb 9, 2024
e86988d
Update docs with the new format
Naruyoko Feb 11, 2024
aa5444c
Add tests for get_taxonomy_suggestions_with_synonyms (by copying)
Naruyoko Feb 17, 2024
12c20c3
Run Perl linter on the new test
Naruyoko Feb 17, 2024
d1742d1
Merge remote-tracking branch 'origin/main' into Naruyoko-synonym-auto…
Naruyoko Feb 17, 2024
f779d0a
docs: minor update on suggest api
alexgarel Feb 19, 2024
20 changes: 15 additions & 5 deletions docs/api/ref/api-v3.yml
@@ -4,8 +4,8 @@ x-stoplight:
info:
title: Open Food Facts Open API V3 - under development
description: |
As a developer, the Open Food Facts API allows you to get information
and contribute to the products database. You can create great apps to
As a developer, the Open Food Facts API allows you to get information
and contribute to the products database. You can create great apps to
help people make better food choices and also provide data to enhance the database.
termsOfService: 'https://openweathermap.org/terms'
contact:
@@ -58,12 +58,12 @@ paths:
in: query
name: fields
description: |-
Comma separated list of fields requested in the response.
Comma separated list of fields requested in the response.

Special values:
Special values:
* "none": returns no fields
* "raw": returns all fields as stored internally in the database
* "all": returns all fields except generated fields that need to be explicitly requested such as "knowledge_panels".
* "all": returns all fields except generated fields that need to be explicitly requested such as "knowledge_panels".

Defaults to "all" for READ requests. The "all" value can also be combined with fields like "attribute_groups" and "knowledge_panels".'
responses:
@@ -247,6 +247,11 @@ paths:
description: Array of sorted strings suggestions in the language requested in the "lc" field.
items:
type: string
matched_synonyms:
Member

From what I understand from the tests, you changed not only api-v3 but also api-v2 (in api.yml). Shouldn't you change it there as well?

Member

This comment might not be relevant if we go for a suggestion api v2.

Contributor Author

I am not sure where I should separate my changes. Do I copy this part with a new name with the new function exposed only there? If so, do I need to copy this function as well? Or do the changes need to be separated at the level of inner functions, like from here?

I am also not sure how I should change api.yml, since it appears mostly empty.

Member

@Naruyoko in the code you show, a condition should be enough to sort out the cases.

But we could add a parameter to get_taxonomy_suggestions and have a cgi/suggest_v2.pl similar to suggest.pl.

Member

@Naruyoko, do not hesitate to ping me on slack if it's not clear enough.

Contributor Author

By using a dict, do you mean like this (which is what I intended by "transpose"):

{"errors":[],"suggestions":[{"matched_synonym":"Orge","tag":"Gluten"},{"matched_synonym":"Œufs","tag":"Œufs"},{"matched_synonym":"Arachis hypogaea","tag":"Arachides"},{"matched_synonym":"Langoustine","tag":"Crustacés"},{"matched_synonym":"Fruits à coque dure","tag":"Fruits à coque"},{"matched_synonym":"Beurre concentré","tag":"Lait"},{"matched_synonym":"Pétoncles","tag":"Mollusques"},{"matched_synonym":"Graines de moutarde","tag":"Moutarde"},{"matched_synonym":"Rouget","tag":"Poisson"},{"matched_synonym":"Tofu","tag":"Soja"}],"warnings":[]}

Or like this:

{"errors":[],"suggestions":{"Gluten":"Orge","Œufs":"Œufs","Arachides":"Arachis hypogaea","Crustacés":"Langoustine","Fruits à coque":"Fruits à coque dure","Lait":"Beurre concentré","Mollusques":"Pétoncles","Moutarde":"Graines de moutarde","Poisson":"Rouget","Soja":"Tofu"},"warnings":[]}

Or like this:

{"errors":[],"matched_synonyms":{"Gluten":"Orge","Œufs":"Œufs","Arachides":"Arachis hypogaea","Crustacés":"Langoustine","Fruits à coque":"Fruits à coque dure","Lait":"Beurre concentré","Mollusques":"Pétoncles","Moutarde":"Graines de moutarde","Poisson":"Rouget","Soja":"Tofu"},"status":"success","suggestions":["Gluten","Œufs","Arachides","Crustacés","Fruits à coque","Lait","Mollusques","Moutarde","Poisson","Soja"],"warnings":[]}

Member

I like your third version, because it keeps things simple: you can just retrieve the plain suggestion list, and it's easy to look up the matched synonym.

Just one point: shouldn't it be "matched_synonyms":{"Gluten":["Orge"],…}, as there might be more than one matched synonym? (Or don't we care because we just return one? That is still really OK for me, as it covers the needs.)

Also, is it useful to put entries like "Œufs": "Œufs" inside matched_synonyms, or can the API user deduce from the fact that Œufs is not in the synonyms dict that it was matched directly?
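To make the trade-off concrete, here is a minimal JavaScript sketch of how a client might consume the third (dictionary) format. The helper name `pairSuggestions` is hypothetical, and the fallback to the tag itself only matters if direct matches were ever omitted from `matched_synonyms`, which is the open question above.

```javascript
// Pair each suggested tag with the synonym that matched the search string.
// Assumes the dictionary-shaped "matched_synonyms" from the third proposal;
// pairSuggestions is a hypothetical helper, not part of the PR.
function pairSuggestions(response) {
  const matched = response.matched_synonyms || {};
  return response.suggestions.map(function (tag) {
    // Fall back to the tag itself when the dictionary has no entry for it.
    const hasEntry = Object.prototype.hasOwnProperty.call(matched, tag);
    const synonym = hasEntry ? matched[tag] : tag;
    return { tag: tag, matchedSynonym: synonym, direct: synonym === tag };
  });
}

// Sample data taken from the third JSON example in this thread (shortened).
const sample = {
  suggestions: ["Gluten", "Œufs", "Arachides"],
  matched_synonyms: { "Gluten": "Orge", "Œufs": "Œufs", "Arachides": "Arachis hypogaea" }
};
const pairs = pairSuggestions(sample);
console.log(pairs);
```

The plain `suggestions` array stays usable on its own, and the ordering is carried by that array rather than by the dictionary.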

Contributor Author

I agree with using the third version.

The matched synonym is currently the first synonym with the best match type (with "start" > "inside" > "fuzzy"). I don't see how useful it would be to return all possible matches.

I think it is better to keep "a":"a" entries for consistency.
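The selection rule described in this comment can be sketched as follows. This is an illustration in JavaScript, not the Perl implementation: `matchType` is a simplified, hypothetical stand-in for `match_stringids` (which works on normalized string ids), and its "fuzzy" rule is an assumption made only so the example runs.

```javascript
// Rank of each match type: higher is better.
const MATCH_RANK = { none: 0, fuzzy: 1, inside: 2, start: 3 };

// Hypothetical, simplified matcher used only for this sketch.
function matchType(query, synonym) {
  const q = query.toLowerCase();
  const s = synonym.toLowerCase();
  if (s.startsWith(q)) return "start";
  if (s.includes(q)) return "inside";
  // Crude "fuzzy" rule (assumption): compare with non-alphanumerics removed.
  const strip = (x) => x.replace(/[^\p{L}\p{N}]/gu, "");
  if (strip(s).includes(strip(q))) return "fuzzy";
  return "none";
}

// Return the best match type and the first synonym that achieved it.
function bestMatch(query, synonyms, matcher = matchType) {
  let best = { type: "none", match: null };
  for (const synonym of synonyms) {
    const type = matcher(query, synonym);
    // Only a strictly better type replaces the current best,
    // so earlier synonyms win ties.
    if (MATCH_RANK[type] <= MATCH_RANK[best.type]) continue;
    best = { type: type, match: synonym };
    if (type === "start") break; // cannot be improved further
  }
  return best;
}

console.log(bestMatch("an", ["Langoustine", "Crustacés", "Anchois"]));
console.log(bestMatch("r", ["Beurre", "Crème"]));
```

Because ties keep the earlier entry, a canonical name listed first is reported in preference to a later synonym with the same match type.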

Contributor Author

I pushed a change with the new format. I will have to update the API docs and create new tests.

Member

Yes, I read your changes, seems perfect 💪

So I wait for the other changes. Don't hesitate to ping me on slack when you are ready @Naruyoko :-)

type: array
description: Array of strings, which are the synonyms the search best matched to, with indices corresponding with the canonical names in the suggestions field.
items:
type: string
operationId: get-api-v3-taxonomy_suggestions-taxonomy
description: |-
Open Food Facts uses multilingual [taxonomies](https://wiki.openfoodfacts.org/Global_taxonomies) to normalize entries for categories, labels, ingredients, packaging shapes / materials / recycling instructions and many more fields.
@@ -299,6 +304,11 @@ paths:
in: query
name: limit
description: 'Maximum number of suggestions. Default is 25, max is 400.'
- schema:
type: string
in: query
name: get_synonyms
description: 'Whether or not to include "matched_synonyms" in the response. Set to 1 to include.'
- schema:
type: string
in: query
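As a rough illustration of the `get_synonyms` parameter documented in this hunk, a client could build the request URL as below. The path and parameter names come from this PR's integration test; the host `https://example.org` and the helper name `buildSuggestionsUrl` are placeholders, not part of the API.

```javascript
// Build a taxonomy_suggestions URL, adding get_synonyms=1 only when requested.
// buildSuggestionsUrl is a hypothetical helper; baseUrl is whatever server
// hosts the API (a placeholder host is used in the example below).
function buildSuggestionsUrl(baseUrl, params) {
  const url = new URL("/api/v3/taxonomy_suggestions", baseUrl);
  url.searchParams.set("tagtype", params.tagtype);
  url.searchParams.set("string", params.string);
  if (params.lc) url.searchParams.set("lc", params.lc);
  if (params.limit) url.searchParams.set("limit", String(params.limit));
  if (params.getSynonyms) url.searchParams.set("get_synonyms", "1");
  return url.toString();
}

const exampleUrl = buildSuggestionsUrl("https://example.org", {
  tagtype: "allergens",
  string: "o",
  lc: "fr",
  getSynonyms: true
});
console.log(exampleUrl);
```

Using `URLSearchParams` also percent-encodes search strings such as "Café", which plain string concatenation would not do.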
48 changes: 44 additions & 4 deletions html/js/product-multilingual.js
@@ -542,14 +542,22 @@ function initializeTagifyInput(el) {
autocomplete: true,
whitelist: get_recents(el.id) || [],
dropdown: {
enabled: 0
enabled: 0,
maxItems: 100
}
});

let abortController;
let debounceTimer;
const timeoutWait = 300;

function updateSuggestions() {
const value = input.state.inputText;
const lc = (/^\w\w:/).exec(value);
const term = lc ? value.substring(lc[0].length) : value;
input.dropdown.show(term);
}

input.on("input", function (event) {
const value = event.detail.value;
input.whitelist = null; // reset the whitelist
@@ -565,16 +573,48 @@ function initializeTagifyInput(el) {

abortController = new AbortController();

fetch(el.dataset.autocomplete + "&string=" + value, {
fetch(el.dataset.autocomplete + "&string=" + value + "&get_synonyms=1", {
signal: abortController.signal
}).
then((RES) => RES.json()).
then(function (json) {
input.whitelist = json.suggestions;
input.dropdown.show(value); // render the suggestions dropdown
const lc = (/^\w\w:/).exec(value);
let whitelist;
if (lc) {
whitelist = json.matched_synonyms.map(function (e) {
return {"value": lc + e, "searchBy": e};
});
} else {
whitelist = json.matched_synonyms;
}
const synonymMap = Object.create(null);
for (let i = 0; i < json.suggestions.length; i++) {
synonymMap[json.matched_synonyms[i]] = json.suggestions[i];
}
input.synonymMap = synonymMap;
input.whitelist = whitelist;
updateSuggestions(); // render the suggestions dropdown
});
}, timeoutWait);
}
updateSuggestions();
});

input.on("dropdown:show", function() {
if (!input.synonymMap) {
return;
}
$(input.DOM.dropdown).find("div.tagify__dropdown__item").each(function(_,e) {
let synonymName = e.getAttribute("value");
const lc = (/^\w\w:/).exec(synonymName);
if (lc) {
synonymName = synonymName.substring(3);
}
const canonicalName = input.synonymMap[synonymName];
if (canonicalName && canonicalName !== synonymName) {
e.innerHTML += " (&rarr; <i>" + canonicalName + "</i>)";
}
});
});
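The lookup performed by the two handlers above can be sketched as pure functions, which makes the language-prefix handling easier to follow. This assumes the array-shaped response used at this point in the PR, where `matched_synonyms[i]` corresponds to `suggestions[i]`. The names `splitLanguagePrefix`, `buildSynonymMap` and `describeItem` are hypothetical, and the sketch returns plain text instead of touching the DOM.

```javascript
// Split an optional two-letter language prefix such as "fr:" from a value.
function splitLanguagePrefix(value) {
  const m = /^\w\w:/.exec(value);
  return m ? { lc: m[0], term: value.substring(m[0].length) } : { lc: "", term: value };
}

// Map each matched synonym to its canonical name (index-aligned arrays).
function buildSynonymMap(suggestions, matchedSynonyms) {
  const map = Object.create(null);
  for (let i = 0; i < suggestions.length; i++) {
    map[matchedSynonyms[i]] = suggestions[i];
  }
  return map;
}

// Text for one dropdown entry: the synonym, plus the canonical name
// when it differs from the synonym.
function describeItem(value, synonymMap) {
  const { lc, term } = splitLanguagePrefix(value);
  const canonical = synonymMap[term];
  if (canonical && canonical !== term) {
    return lc + term + " (→ " + canonical + ")";
  }
  return lc + term;
}

const synonymMap = buildSynonymMap(["Gluten", "Œufs"], ["Orge", "Œufs"]);
console.log(describeItem("fr:Orge", synonymMap));
console.log(describeItem("Œufs", synonymMap));
```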

input.on("add", function (event) {
12 changes: 9 additions & 3 deletions lib/ProductOpener/APITaxonomySuggestions.pm
@@ -91,7 +91,10 @@ sub taxonomy_suggestions_api ($request_ref) {
};

# Options define how many suggestions should be returned, in which format etc.
my $options_ref = {limit => request_param($request_ref, 'limit')};
my $options_ref = {
limit => request_param($request_ref, 'limit'),
get_synonyms => request_param($request_ref, 'get_synonyms')
};

# Validate input parameters

@@ -123,8 +126,11 @@ sub taxonomy_suggestions_api ($request_ref) {
# Generate suggestions
else {

$response_ref->{suggestions}
= [get_taxonomy_suggestions($tagtype, $search_lc, $string, $context_ref, $options_ref)];
my @suggestions = get_taxonomy_suggestions($tagtype, $search_lc, $string, $context_ref, $options_ref);
$log->debug("taxonomy_suggestions_api", @suggestions) if $log->is_debug();
$response_ref->{suggestions} = [map {$_->{tag}} @suggestions];
$response_ref->{matched_synonyms} = [map {ucfirst($_->{matched_synonym})} @suggestions]
if $options_ref->{get_synonyms};
}
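For readers less familiar with Perl, the response mapping added in this hunk can be sketched in JavaScript. This is an illustrative translation, not code from the PR: `buildResponse` and `ucfirst` are hypothetical helpers, and the sample input is made up.

```javascript
// Uppercase the first character, like Perl's ucfirst.
function ucfirst(s) {
  return s.charAt(0).toUpperCase() + s.slice(1);
}

// Split a list of { tag, matched_synonym } records into the two
// index-aligned arrays returned by the API at this stage of the PR.
function buildResponse(suggestions, options) {
  const response = { suggestions: suggestions.map((s) => s.tag) };
  // Mirror Perl truthiness: undefined, "", 0 and "0" all disable the option.
  const wantSynonyms = Boolean(options.get_synonyms) && options.get_synonyms !== "0";
  if (wantSynonyms) {
    response.matched_synonyms = suggestions.map((s) => ucfirst(s.matched_synonym));
  }
  return response;
}

// Made-up sample input for illustration.
const records = [
  { tag: "Gluten", matched_synonym: "orge" },
  { tag: "Œufs", matched_synonym: "œufs" }
];
console.log(buildResponse(records, { get_synonyms: "1" }));
console.log(buildResponse(records, {}));
```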

$log->debug("taxonomy_suggestions_api - stop", {request => $request_ref}) if $log->is_debug();
54 changes: 35 additions & 19 deletions lib/ProductOpener/TaxonomySuggestions.pm
@@ -121,14 +121,16 @@ sub get_taxonomy_suggestions ($tagtype, $search_lc, $string, $context_ref, $opti
) if $log->is_debug();

# Check if we have cached suggestions
my $options_relavant = {%$options_ref};
delete $options_relavant->{get_synonyms};
my $key = generate_cache_key(
"get_taxonomy_suggestions",
{
tagtype => $tagtype,
search_lc => $search_lc,
string => $string,
context_ref => $context_ref,
options_ref => $options_ref
options_ref => $options_relavant
}
);
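The idea in this hunk is that `get_synonyms` only changes how results are presented, so it is dropped from the options before the cache key is computed. A minimal JavaScript sketch of that idea follows; `cacheKey` is a stand-in built on `JSON.stringify`, not the project's `generate_cache_key`, and `suggestionsCacheKey` is a hypothetical name.

```javascript
// Stand-in for a cache key generator (assumption: any stable serialization).
function cacheKey(args) {
  return JSON.stringify(args);
}

// Compute a key that ignores the get_synonyms option, so requests with and
// without synonyms share the same cached suggestions.
function suggestionsCacheKey(tagtype, searchLc, string, context, options) {
  const relevant = Object.assign({}, options); // shallow copy, leave caller's object intact
  delete relevant.get_synonyms;
  return cacheKey({
    tagtype: tagtype,
    search_lc: searchLc,
    string: string,
    context_ref: context,
    options_ref: relevant
  });
}

const keyA = suggestionsCacheKey("allergens", "fr", "o", {}, { limit: 25, get_synonyms: 1 });
const keyB = suggestionsCacheKey("allergens", "fr", "o", {}, { limit: 25 });
console.log(keyA === keyB);
```

Copying the options first matters: deleting the key from the caller's object would also disable synonyms for the rest of the request.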

@@ -312,23 +314,30 @@ sub match_stringids ($stringid, $fuzzystringid, $synonymid) {

# best_match is used to see how well matches the best matching synonym

sub best_match ($stringid, $fuzzystringid, $synonyms_ids_ref) {
sub best_match ($search_lc, $stringid, $fuzzystringid, $synonyms_ref) {

my $best_match = "none";
my $best_type = "none";
my $best_match = 0;

foreach my $synonymid (@$synonyms_ids_ref) {
foreach my $synonym (@$synonyms_ref) {
my $synonymid = get_string_id_for_lang($search_lc, $synonym);
my $match = match_stringids($stringid, $fuzzystringid, $synonymid);
# Prefer to use the earlier ones from the list for when the canonical name has the same match type as a synonym
next if $match eq "none" or $match eq $best_type;
if ($match eq "start") {
# Best match, we can return without looking at the other synonyms
return "start";
$best_type = $match;
$best_match = $synonym;
last;
}
elsif (($match eq "inside")
or (($match eq "fuzzy") and ($best_match eq "none")))
or (($match eq "fuzzy") and ($best_type eq "none")))
{
$best_match = $match;
$best_type = $match;
$best_match = $synonym;
}
}
return $best_match;
return {type => $best_type, match => $best_match};
}

=head2 filter_suggestions_matching_string ($tags_ref, $tagtype, $search_lc, $string, $options_ref)
@@ -365,6 +374,8 @@ sub filter_suggestions_matching_string ($tags_ref, $tagtype, $search_lc, $string
my $limit = $options_ref->{limit} || 25;
# Set a hard limit of 400
$limit = min(int($limit), 400);
# Whether or not to get synonyms
my $get_synonyms = $options_ref->{get_synonyms} || 0;

$log->debug(
"filter_suggestions_matching_string",
@@ -374,7 +385,8 @@ sub filter_suggestions_matching_string ($tags_ref, $tagtype, $search_lc, $string
search_lc => $search_lc,
string => $string,
options_ref => $options_ref,
limit => $limit
limit => $limit,
get_synonyms => $get_synonyms
}
) if $log->is_debug();

@@ -424,39 +436,43 @@ sub filter_suggestions_matching_string ($tags_ref, $tagtype, $search_lc, $string
my $tag_xx = display_taxonomy_tag("xx", $tagtype, $canon_tagid);

# Build a list of normalized synonyms in the search language and the wildcard xx: language
my @synonyms_ids = map {get_string_id_for_lang($search_lc, $_)} (
my @synonyms = (
@{deep_get(\%synonyms_for, $tagtype, $search_lc, get_string_id_for_lang($search_lc, $tag)) || []},
@{deep_get(\%synonyms_for, $tagtype, "xx", get_string_id_for_lang("xx", $tag_xx)) || []}
);

# check how well the synonyms match the input string
my $best_match = best_match($stringid, $fuzzystringid, \@synonyms_ids);
my $best_match = best_match($search_lc, $stringid, $fuzzystringid, \@synonyms);

$log->debug(
"synonyms_ids for canon_tagid",
"synonyms for canon_tagid",
{
tagtype => $tagtype,
canon_tagid => $canon_tagid,
tag => $tag,
synonym_ids => \@synonyms_ids,
synonyms => \@synonyms,
best_match => $best_match
}
) if $log->is_debug();

my $to_add = {
tag => $tag,
matched_synonym => $best_match->{match}
};
# matching at start, best matches
if ($best_match eq "start") {
push @suggestions, $tag;
if ($best_match->{type} eq "start") {
push @suggestions, $to_add;
# count matches at start so that we can return only if we have enough matches
$suggestions_count++;
last if $suggestions_count >= $limit;
}
# matching inside
elsif ($best_match eq "inside") {
push @suggestions_c, $tag;
elsif ($best_match->{type} eq "inside") {
push @suggestions_c, $to_add;
}
# fuzzy match
elsif ($best_match eq "fuzzy") {
push @suggestions_f, $tag;
elsif ($best_match->{type} eq "fuzzy") {
push @suggestions_f, $to_add;
}
}
}
6 changes: 6 additions & 0 deletions tests/integration/api_v3_taxonomy_suggestions.t
@@ -77,6 +77,12 @@ my $tests_ref = [
path => '/api/v3/taxonomy_suggestions?tagtype=categories&string=Café&lc=fr',
expected_status_code => 200,
},
{
test_case => 'allergens-string-fr-o-get-synonyms',
method => 'GET',
path => '/api/v3/taxonomy_suggestions?tagtype=allergens&string=o&lc=fr&get_synonyms=1',
expected_status_code => 200,
},
# Packaging suggestions return most popular suggestions first
{
test_case => 'packaging-shapes',