feat: Synonyms in taxonomized suggestions #9395

Merged: 31 commits, Feb 19, 2024

Conversation

Naruyoko
Contributor

Make sure you've done all the following (You can delete the checklist before submitting)

  • PR title is prefixed by one of the following: feat, fix, docs, style, refactor, test, build, ci, chore, revert, l10n, taxonomy
  • Code is well documented
  • Include unit tests for new functionality
  • Code passes GitHub workflow checks in your branch
  • If you have multiple commits please combine them into one commit by squashing them.
  • Read and understood the contribution guidelines

What

When creating suggestions for taxonomized fields like categories, the API returns results where one of the synonyms matches. However, those results are not displayed when they don't include the typed string. This PR tries to fix that by returning which synonym the input matched, so it can be used to build the suggestions list.

Related issue(s) and discussion

The values, both underlying and displayed, are currently broken
@github-actions bot added the 📚 Documentation, JavaScript, API v3, and multilingual products labels Nov 24, 2023
@Naruyoko changed the title from "feat; Synonyms in taxonomized suggestions" to "feat: Synonyms in taxonomized suggestions" Nov 24, 2023
@Naruyoko
Contributor Author

This is still a prototype, and the values shown are still broken. The synonyms should be sent in display format rather than as IDs.

@codecov-commenter

codecov-commenter commented Nov 24, 2023

Codecov Report

Attention: 11 lines in your changes are missing coverage. Please review.

Comparison is base (dc04d18) 49.54% compared to head (d1742d1) 49.55%.
Report is 41 commits behind head on main.

| Files | Patch % | Lines |
| --- | --- | --- |
| lib/ProductOpener/APITaxonomySuggestions.pm | 0.00% | 10 Missing ⚠️ |
| lib/ProductOpener/TaxonomySuggestions.pm | 96.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@           Coverage Diff            @@
##             main    #9395    +/-   ##
========================================
  Coverage   49.54%   49.55%            
========================================
  Files          67       67            
  Lines       20650    20769   +119     
  Branches     4980     5001    +21     
========================================
+ Hits        10231    10292    +61     
- Misses       9131     9187    +56     
- Partials     1288     1290     +2     


@Naruyoko
Contributor Author

Here is a demo showing the current progress. I haven't yet figured out how to return the synonyms' names properly. That should be done on the server side which has the full taxonomy information.

Synonym.Suggestions.Prototype.Showcase.mp4

Here is the current result of http://fr.openfoodfacts.localhost/api/v3/taxonomy_suggestions?tagtype=allergens&string=o&get_synonyms=1:

{"errors":[],"matched_synonyms":["orge","oeufs","arachis-hypogaea","langoustine","fruits-a-coque-dure","beurre-concentre","petoncles","graines-de-moutarde","rouget","tofu"],"status":"success","suggestions":["Gluten","Œufs","Arachides","Crustacés","Fruits à coque","Lait","Mollusques","Moutarde","Poisson","Soja"],"warnings":[]}
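For reference, a minimal sketch of how a client might build that request URL. The path and the `tagtype`, `string`, and `get_synonyms` parameters are taken from the URL above; the helper name `buildSuggestionsUrl` and its argument order are hypothetical.

```javascript
// Hypothetical helper: builds the taxonomy suggestions URL shown above.
// Only the path and the three query parameters come from this thread.
function buildSuggestionsUrl(base, tagtype, string, getSynonyms) {
  const url = new URL("/api/v3/taxonomy_suggestions", base);
  url.searchParams.set("tagtype", tagtype);
  url.searchParams.set("string", string);
  // Only ask for synonyms when the caller opts in.
  if (getSynonyms) {
    url.searchParams.set("get_synonyms", "1");
  }
  return url.toString();
}

const exampleUrl = buildSuggestionsUrl(
  "http://fr.openfoodfacts.localhost",
  "allergens",
  "o",
  true
);
```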

@Naruyoko
Contributor Author

Naruyoko commented Dec 3, 2023

Suggestions are now shown in proper names.
[Image: Proper names of synonyms]

@Naruyoko
Contributor Author

Naruyoko commented Dec 4, 2023

The response from http://fr.openfoodfacts.localhost/api/v3/taxonomy_suggestions?tagtype=allergens&string=o&get_synonyms=1 now looks like this:

{"errors":[],"matched_synonyms":["Orge","Œufs","Arachis hypogaea","Langoustine","Fruits à coque dure","Beurre concentré","Pétoncles","Graines de moutarde","Rouget","Tofu"],"status":"success","suggestions":["Gluten","Œufs","Arachides","Crustacés","Fruits à coque","Lait","Mollusques","Moutarde","Poisson","Soja"],"warnings":[]}

Which can be transformed like this:

| suggestions | matched_synonyms |
| --- | --- |
| Gluten | Orge |
| Œufs | Œufs |
| Arachides | Arachis hypogaea |
| Crustacés | Langoustine |
| Fruits à coque | Fruits à coque dure |
| Lait | Beurre concentré |
| Mollusques | Pétoncles |
| Moutarde | Graines de moutarde |
| Poisson | Rouget |
| Soja | Tofu |
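A rough sketch of that transformation on the client side, assuming the two arrays are always the same length and aligned by index, as in the response above. The function name `pairSuggestions` is made up for illustration.

```javascript
// Pairs each suggestion with the synonym that matched, by index.
// Assumes both arrays are aligned, as in the response shown in this comment.
function pairSuggestions(response) {
  const { suggestions, matched_synonyms: matched } = response;
  return suggestions.map((tag, i) => ({ tag, matched_synonym: matched[i] }));
}

// Shortened sample taken from the response above.
const pairedSample = pairSuggestions({
  suggestions: ["Gluten", "Œufs", "Arachides"],
  matched_synonyms: ["Orge", "Œufs", "Arachis hypogaea"],
});
```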

@github-actions bot added the 🧪 tests, ✏️ Editing - Auto Suggest, API, 🧬 Taxonomies, 🧪 additives, and 🥜 Allergens labels Dec 18, 2023
@Naruyoko force-pushed the Naruyoko-synonym-autocompletion branch from f5d8eb5 to 8047944 December 25, 2023 20:23
Member

@alexgarel left a comment

@Naruyoko: thanks a lot for this PR.

I have a blocker though. There are other people relying on this suggestion API. So we must not change the response format.

To play better with OpenAPI, I propose that we create a suggestion2.cgi API with your new format (@stephanegigandet, do you have a better idea?).

@@ -247,6 +247,11 @@ paths:
  description: Array of sorted strings suggestions in the language requested in the "lc" field.
  items:
    type: string
matched_synonyms:
Member

From what I understand from the tests, you changed not only api-v3 but also api-v2 (in api.yml). Shouldn't you change it there as well?

Member

This comment might not be relevant if we go for a suggestion api v2.

Contributor Author

I am not sure where I should separate my changes. Do I copy this part with a new name with the new function exposed only there? If so, do I need to copy this function as well? Or do the changes need to be separated at the level of inner functions, like from here?

I am also not sure how I should change api.yml, since it appears mostly empty.

Member

@Naruyoko in the code you show, a condition should be enough to sort out the cases.

But we could add a parameter to get_taxonomy_suggestions and have a cgi/suggest_v2.pl similar to suggest.pl
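To illustrate the backward-compatibility idea, here is a small sketch of an opt-in flag: callers that do not pass it get the original response shape, and the extra field only appears on request. It is written in JavaScript for illustration only (the server code is Perl); `buildResponse`, `getSynonyms`, and the `matchedSynonym` property are hypothetical names, and the dictionary shape is one possible choice.

```javascript
// Hypothetical response builder: existing clients keep the original shape,
// and matched_synonyms is only included when explicitly requested.
function buildResponse(matches, options = {}) {
  const response = {
    errors: [],
    status: "success",
    suggestions: matches.map((m) => m.tag),
    warnings: [],
  };
  if (options.getSynonyms) {
    // Map each suggestion to the synonym that matched the input.
    response.matched_synonyms = Object.fromEntries(
      matches.map((m) => [m.tag, m.matchedSynonym])
    );
  }
  return response;
}

const sampleMatches = [
  { tag: "Gluten", matchedSynonym: "Orge" },
  { tag: "Soja", matchedSynonym: "Tofu" },
];
const legacyResponse = buildResponse(sampleMatches);
const extendedResponse = buildResponse(sampleMatches, { getSynonyms: true });
```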

Member

@Naruyoko, do not hesitate to ping me on slack if it's not clear enough.

Contributor Author

By using dict, do you mean like this (which was what I intended by "transpose"):

{"errors":[],"suggestions":[{"matched_synonym":"Orge","tag":"Gluten"},{"matched_synonym":"Œufs","tag":"Œufs"},{"matched_synonym":"Arachis hypogaea","tag":"Arachides"},{"matched_synonym":"Langoustine","tag":"Crustacés"},{"matched_synonym":"Fruits à coque dure","tag":"Fruits à coque"},{"matched_synonym":"Beurre concentré","tag":"Lait"},{"matched_synonym":"Pétoncles","tag":"Mollusques"},{"matched_synonym":"Graines de moutarde","tag":"Moutarde"},{"matched_synonym":"Rouget","tag":"Poisson"},{"matched_synonym":"Tofu","tag":"Soja"}],"warnings":[]}

Or like this:

{"errors":[],"suggestions":{"Gluten":"Orge","Œufs":"Œufs","Arachides":"Arachis hypogaea","Crustacés":"Langoustine","Fruits à coque":"Fruits à coque dure","Lait":"Beurre concentré","Mollusques":"Pétoncles","Moutarde":"Graines de moutarde","Poisson":"Rouget","Soja":"Tofu"},"warnings":[]}

Or like this:

{"errors":[],"matched_synonyms":{"Gluten":"Orge","Œufs":"Œufs","Arachides":"Arachis hypogaea","Crustacés":"Langoustine","Fruits à coque":"Fruits à coque dure","Lait":"Beurre concentré","Mollusques":"Pétoncles","Moutarde":"Graines de moutarde","Poisson":"Rouget","Soja":"Tofu"},"status":"success","suggestions":["Gluten","Œufs","Arachides","Crustacés","Fruits à coque","Lait","Mollusques","Moutarde","Poisson","Soja"],"warnings":[]}
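As a sketch of how a client could consume this third shape: the `suggestions` array is used as before, and `matched_synonyms` is only a lookup table. The label format (`synonym (suggestion)`) and the name `toDisplayLabels` are assumptions for illustration, not what the frontend actually renders.

```javascript
// Builds display labels from the dictionary-style response.
// When the matched synonym differs from the suggestion, show both;
// otherwise (or when no dictionary is present) show the suggestion alone.
function toDisplayLabels(response) {
  const synonyms = response.matched_synonyms || {};
  return response.suggestions.map((tag) => {
    const synonym = synonyms[tag];
    return synonym && synonym !== tag ? `${synonym} (${tag})` : tag;
  });
}

const labelSample = toDisplayLabels({
  suggestions: ["Gluten", "Œufs"],
  matched_synonyms: { Gluten: "Orge", "Œufs": "Œufs" },
});
```

Clients that ignore `matched_synonyms` keep working unchanged, since `suggestions` stays a plain array of strings.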

Member

I like your third version, because it keeps things simple: you can still just retrieve the plain suggestion list, and it's easy to look up the matched synonym.

Just one point: shouldn't it be "matched_synonyms":{"Gluten":["Orge"],…}, as there might be more than one matched synonym? Or don't we care because we just return one? That is still really OK for me, as it covers the needs.

Also, is it useful to put entries like "Œufs": "Œufs" inside matched_synonyms, or can the API user deduce from the fact that Œufs is not in the synonyms dict that it was matched directly?

Contributor Author

I agree with using the third version.

The matched synonym is currently the first matched one of the best match type (with "start" > "inside" > "fuzzy"). I don't see how useful it would be to return all possible matches.

I think it is better to keep "a":"a" entries for consistency.
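The selection rule described above can be sketched as follows. Only the ordering "start" > "inside" > "fuzzy" and the "first match wins" rule come from this comment; the `matchType` classifier is a stand-in (in particular, its "fuzzy" rule simply ignores spaces and hyphens), and the real matching logic lives on the server.

```javascript
// Lower rank means better match, following "start" > "inside" > "fuzzy".
const MATCH_RANK = { start: 0, inside: 1, fuzzy: 2 };

// Stand-in classifier (assumption): not the server's actual rules.
function matchType(synonym, input) {
  const s = synonym.toLowerCase();
  const q = input.toLowerCase();
  if (s.startsWith(q)) return "start";
  if (s.includes(q)) return "inside";
  // Illustrative "fuzzy" rule: compare after dropping spaces and hyphens.
  const strip = (text) => text.replace(/[\s-]/g, "");
  if (strip(s).includes(strip(q))) return "fuzzy";
  return null;
}

// Returns the first synonym with the best match type, or null if none match.
function bestMatchedSynonym(synonyms, input) {
  let best = null;
  let bestRank = Infinity;
  for (const synonym of synonyms) {
    const type = matchType(synonym, input);
    // Strict comparison keeps the earliest synonym among equal ranks.
    if (type !== null && MATCH_RANK[type] < bestRank) {
      best = synonym;
      bestRank = MATCH_RANK[type];
    }
  }
  return best;
}
```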

Contributor Author

I pushed a change with the new format. I will have to update the API docs and create new tests.

Member

Yes, I read your changes, seems perfect 💪

So I wait for the other changes. Don't hesitate to ping me on slack when you are ready @Naruyoko :-)

@github-actions bot removed the ✏️ Editing - Auto Suggest label Jan 24, 2024
Member

@alexgarel left a comment

Very good @Naruyoko, thanks for this great contribution!

I added a very small change to the documentation.

docs/api/ref/api-v3.yml (outdated, resolved)
@alexgarel enabled auto-merge (squash) February 19, 2024 12:34

Quality Gate passed

Issues
1 New issue

Measures
0 Security Hotspots
No data about Coverage
0.1% Duplication on New Code

See analysis details on SonarCloud

@alexgarel merged commit 908603a into openfoodfacts:main Feb 19, 2024
13 checks passed
john-gom pushed a commit that referenced this pull request May 24, 2024
When creating suggestions for taxonomized fields like categories, the API returns results where one of the synonyms matches. However, those results are not displayed when they don't include the typed string. This PR tries to fix that by returning which synonym the input matched, so it can be used to build the suggestions list.
Labels
🧪 additives, API v3, categories, config, 🧽 Data quality, dependencies, 📚 Documentation, 🎁 donations, export, ingredients analysis, JavaScript, 📖 Knowledge Panels, labels, multilingual products, Product Page, 🧪 tests, 🧪 unit tests
Development

Successfully merging this pull request may close these issues.

Allergens autocompletion should use synonyms
3 participants