Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Don't rely on removeDiacritics to highlight text #4636

Merged
merged 9 commits into from
Dec 16, 2023

Conversation

g123k
Copy link
Collaborator

@g123k g123k commented Sep 15, 2023

Hi everyone,

When we highlight suggestions, we use the removeDiacriticsmethod.
However, it replaces some characters (eg: œ) by 2 or more letters (eg: oe)

I use instead a simpler algorithm to only remove accents.

@g123k g123k added ✏️ Editing Many products are incomplete and don't have Nutri-Score, Eco-Score…so editing is important for users ✏️ Editing - Text fields labels Sep 15, 2023
@g123k g123k requested a review from a team as a code owner September 15, 2023 09:29
@g123k g123k self-assigned this Sep 15, 2023
@g123k g123k linked an issue Sep 15, 2023 that may be closed by this pull request
@codecov-commenter
Copy link

codecov-commenter commented Sep 15, 2023

Codecov Report

Attention: 46 lines in your changes are missing coverage. Please review.

Comparison is base (36fb948) 9.68% compared to head (b963ca2) 9.69%.
Report is 3 commits behind head on develop.

Files Patch % Lines
packages/smooth_app/lib/widgets/smooth_text.dart 12.50% 28 Missing ⚠️
...b/pages/product/nutrition_add_nutrient_button.dart 0.00% 6 Missing ⚠️
...app/lib/generic_lib/widgets/language_selector.dart 0.00% 5 Missing ⚠️
...oth_app/lib/pages/onboarding/country_selector.dart 0.00% 5 Missing ⚠️
...ages/preferences/user_preferences_search_page.dart 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           develop   #4636      +/-   ##
==========================================
+ Coverage     9.68%   9.69%   +0.01%     
==========================================
  Files          318     318              
  Lines        16124   16146      +22     
==========================================
+ Hits          1561    1565       +4     
- Misses       14563   14581      +18     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@monsieurtanuki monsieurtanuki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @g123k!

What happens then with œ vs. oe?

  • it's detected as similar (with removeDiacritics) but not highlighted (with _removeAccents)?
  • it's never considered similar (as not present in _removeAccents)?
  • should we still use removeDiacritics then, and couldn't we get rid of that package?

@g123k
Copy link
Collaborator Author

g123k commented Sep 18, 2023

Hi @g123k!

What happens then with œ vs. oe?

  • it's detected as similar (with removeDiacritics) but not highlighted (with _removeAccents)?
  • it's never considered similar (as not present in _removeAccents)?
  • should we still use removeDiacritics then, and couldn't we get rid of that package?

The initial string with œ will count this as 1 character, vs 2 with removeDiacritics
The goal of _removeAccents is to ignore that kind of case, which will still be the œ character.

removeDiacritics is used elsewhere in the app, so I don't think we can't remove it :(

@monsieurtanuki
Copy link
Contributor

The initial string with œ will count this as 1 character, vs 2 with removeDiacritics The goal of _removeAccents is to ignore that kind of case, which will still be the œ character.

removeDiacritics is used elsewhere in the app, so I don't think we can't remove it :(

My question is: will the search match for œ/oe and it's just the highlighting that will not look very good, OR will the search fail to detect the œ/oe similarity?

@github-actions github-actions bot added 🥫 Product page 🤗 Onboarding We need to onboard users on how the app works, but also on content like Nutri-Score, Eco-Score… 🧪 Tests labels Dec 9, 2023
@g123k
Copy link
Collaborator Author

g123k commented Dec 9, 2023

The algorithm was not fully working, by being tricked if the diacritic was in the text and/or in the filter.
Not it works both ways:

Screen.Recording.2023-12-09.at.02.44.52.mov

I've also added an Extension on String to give a global accent to removeAccent / removeDiacritic

@monsieurtanuki
Copy link
Contributor

@g123k Ready for review?

@g123k
Copy link
Collaborator Author

g123k commented Dec 10, 2023

Yes, that's OK now

Copy link
Contributor

@monsieurtanuki monsieurtanuki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @g123k!

In the PR there are many myString.toLowerCase().removeDiacritics() (and sometimes myString.removeDiacritics().toLowerCase(), which is a bit confusing).
We may be better off with a simpler method like String getComparisonSafeString(String) that would return a combo of removeDiacritics and toLowerCase always in the same order.

But my main question is: does your fix work for the initial issue with œ and oe, when filtering and when displaying? Probably worth a test.

Comment on lines 59 to 63
.removeDiacritics()
.toLowerCase()
.contains(
query!.removeDiacritics().toLowerCase().trim(),
),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't that be first toLowerCase and the removeDiacritics, like on the rest of the code?

/// An extension on [String]
extension StringExtension on String {
/// Simple algorithm to only remove accents AND not diacritics.
String removeAccents() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That method is never used, right? We'd be better off without it, then.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hesitate on this, as removeDiactrictics requires a lot of memory + CPU.
The idea of this method is to have a quicker way to remove accents, not all, but 90%

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then please get rid of package:diacritic/diacritic.dart instead.
We can't have two methods that do the same thing and one being never used ("your" removeAccents).
If you're confident enough in your faster code, let's use it!

@g123k
Copy link
Collaborator Author

g123k commented Dec 11, 2023

Hi @g123k!

In the PR there are many myString.toLowerCase().removeDiacritics() (and sometimes myString.removeDiacritics().toLowerCase(), which is a bit confusing). We may be better off with a simpler method like String getComparisonSafeString(String) that would return a combo of removeDiacritics and toLowerCase always in the same order.

But my main question is: does your fix work for the initial issue with œ and oe, when filtering and when displaying? Probably worth a test.

Ok for the suggestion, but for your question, my video answers it, isn't-it?
Or is there something I'm missing?

Copy link
Contributor

@monsieurtanuki monsieurtanuki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @g123k!
Fair enough, the video is convincing!
Thank you for taking into account my comments.
Still, it's problematic to have two methods doing the same thing, one that you coded, that goes fast but is never called, and one from an external package that is more demanding and that is the only one called.
Please get rid of one of them: both methods look OK, but we can't have both.

.toLowerCase())
_languages
.getNameInEnglish(item)
.getComparisonSafeString()
.contains(query!.toLowerCase()) ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.contains(query!.toLowerCase()) ||
.contains(query!) ||

.toLowerCase())
_languages
.getNameInLanguage(item)
.getComparisonSafeString()
.contains(query.toLowerCase()) ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.contains(query.toLowerCase()) ||
.contains(query!) ||

/// An extension on [String]
extension StringExtension on String {
/// Simple algorithm to only remove accents AND not diacritics.
String removeAccents() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then please get rid of package:diacritic/diacritic.dart instead.
We can't have two methods that do the same thing and one being never used ("your" removeAccents).
If you're confident enough in your faster code, let's use it!

@g123k
Copy link
Collaborator Author

g123k commented Dec 11, 2023

Hi @g123k! Fair enough, the video is convincing! Thank you for taking into account my comments. Still, it's problematic to have two methods doing the same thing, one that you coded, that goes fast but is never called, and one from an external package that is more demanding and that is the only one called. Please get rid of one of them: both methods look OK, but we can't have both.

Actually I use the diacritics package, it's just that I want to hide the import of the lib in all the app.
RemoveAccents is only custom code, but without the case of the œ

@monsieurtanuki
Copy link
Contributor

Actually I use the diacritics package, it's just that I want to hide the import of the lib in all the app.

Understood.

RemoveAccents is only custom code, but without the case of the œ

It is dead code. We don't need dead code, do we?

@g123k
Copy link
Collaborator Author

g123k commented Dec 12, 2023

Ok, I've removed the method + added a test for the new method.
These tests are not very useful (in terms of complexity), but they ensure no regression.

Copy link
Contributor

@monsieurtanuki monsieurtanuki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @g123k for this PR!

@teolemon
Copy link
Member

Thanks @g123k and @monsieurtanuki for the review

@teolemon teolemon merged commit e904b05 into openfoodfacts:develop Dec 16, 2023
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
✏️ Editing - Text fields ✏️ Editing Many products are incomplete and don't have Nutri-Score, Eco-Score…so editing is important for users 🤗 Onboarding We need to onboard users on how the app works, but also on content like Nutri-Score, Eco-Score… 🥫 Product page 🧪 Tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Suggestions: the app crashes with the "oeuf" keyword
4 participants