Matching of non-accented letters to accented letters #17

Open
jonathanheron opened this issue Nov 21, 2011 · 5 comments

Comments

@jonathanheron

Some countries' alternative spellings (Éire, Österreich, etc.) include accented characters that are not matched by their non-accented equivalents.

For example, entering 'eire' for Ireland does not result in a match for Ireland.

Rather than amending the data-alternative-spellings to account for accent variations, would it be possible to have broad string matching that works both with and without accents?

@jamieholst
Owner

Theoretically, yes. For performance reasons this would probably have to be done on initialization. We'd also need a good list of mappings.

Have you seen a similar implementation in another front-end framework? Maybe there's some experience and some approaches we can draw inspiration from.

@timcooper
Contributor

I've implemented this, the only difference being my options are French wines instead of countries, so I only cover French diacritics.

Basically, I run the original label/alternative_spellings through my convert function and then add the result to the 'matches' string on initialization. I'm not sure I do the converting in the best way; it's just a bunch of chained .replace()s.

I can open a pull request if you would like, but I imagine you would want to add an option for it and come up with a better convert function that covers a broader range of accents.

The way it works is if you type eg. "Le" it will match all the "le"s and "lé"s, however if you type "Lé" it will only match the "lé"s.
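A minimal sketch of the approach described above, not the actual code from this comment. The names `stripFrenchDiacritics`, `buildMatches` and `matches` are hypothetical, the character list is an assumption covering common French diacritics only, and plain substring matching stands in for whatever matching the plugin really does:

```javascript
// Hypothetical convert function: lowercases, then strips French diacritics
// with chained .replace() calls, as the comment describes.
function stripFrenchDiacritics(text) {
  return text
    .toLowerCase()
    .replace(/[àâ]/g, 'a')
    .replace(/ç/g, 'c')
    .replace(/[éèêë]/g, 'e')
    .replace(/[îï]/g, 'i')
    .replace(/ô/g, 'o')
    .replace(/[ùûü]/g, 'u')
    .replace(/ÿ/g, 'y')
    .replace(/œ/g, 'oe')
    .replace(/æ/g, 'ae');
}

// Build the searchable string once, at initialization: the original text
// plus its unaccented form (only when the two differ).
function buildMatches(label, alternativeSpellings) {
  const original = (label + ' ' + alternativeSpellings).toLowerCase().trim();
  const stripped = stripFrenchDiacritics(original);
  return stripped === original ? original : original + ' ' + stripped;
}

// Simplified matcher (assumption): case-insensitive substring test.
function matches(term, matchString) {
  return matchString.includes(term.toLowerCase());
}
```

Because only the option text is converted and the typed term is left alone, an unaccented term finds both forms while an accented term finds only the accented one, which is the asymmetry described above.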

@jamieholst
Owner

I'd like this feature to be some sort of map so the user can control the behavior. The user could pass in an 'accented-letters' option, which would be an array of character mappings. Each entry in this array could be either an array or a hash. An array would be a bi-directional mapping, so any character in the sub-array would map to any of the other characters in that sub-array. A hash would be a one-directional mapping, so the key would match all the values but the values would not match the key. Example:

'accented-letters': [
  ['ss', 'ß'],
  { 'e': ['é', 'è'] }
]

In this scenario, ss and ß would be interchangeable; it doesn't matter which you use when searching. Furthermore, when searching for Le you would get results for Le, Lé and Lè, whereas if you searched for Lé you would only get results for Lé (that is, neither Le nor Lè would match, because it is a one-directional mapping).

Finally, if the 'accented-letters' has a value of false then the feature should be completely disabled for performance gains.
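A rough sketch of how such an option could be expanded into a search pattern, assuming the semantics described above. Nothing here exists in the plugin: `escapeRegExp`, `buildLookup` and `termToRegExp` are hypothetical names, and building a regular expression per search term is just one possible design:

```javascript
// Escape regex metacharacters in a literal token.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Expand the option into a lookup: typed token -> set of tokens it matches.
function buildLookup(mappings) {
  const lookup = new Map();
  const add = (from, to) => {
    if (!lookup.has(from)) lookup.set(from, new Set([from]));
    lookup.get(from).add(to);
  };
  for (const entry of mappings) {
    if (Array.isArray(entry)) {
      // Array: bi-directional, every member matches every other member.
      for (const a of entry) for (const b of entry) add(a, b);
    } else {
      // Hash: one-directional, the key matches its values but not vice versa.
      for (const [key, values] of Object.entries(entry)) {
        for (const value of values) add(key, value);
      }
    }
  }
  return lookup;
}

// Turn a typed term into a case-insensitive RegExp honouring the mappings.
// Passing `false` disables the feature entirely.
function termToRegExp(term, mappings) {
  if (mappings === false) return new RegExp(escapeRegExp(term), 'i');
  const lookup = buildLookup(mappings);
  // Try longer tokens first so 'ss' is recognised before a single 's'.
  const tokens = [...lookup.keys()].sort((a, b) => b.length - a.length);
  const lower = term.toLowerCase();
  let source = '';
  let i = 0;
  while (i < lower.length) {
    const token = tokens.find((t) => lower.startsWith(t, i));
    if (token) {
      const alternatives = [...lookup.get(token)].map(escapeRegExp);
      source += '(?:' + alternatives.join('|') + ')';
      i += token.length;
    } else {
      source += escapeRegExp(lower[i]);
      i += 1;
    }
  }
  return new RegExp(source, 'i');
}
```

With the example option above, `termToRegExp('Le', mappings)` would match Le, Lé and Lè, while `termToRegExp('Lé', mappings)` would match only Lé. In a real implementation the lookup would presumably be built once on initialization rather than on every keystroke.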

@Sjord

Sjord commented Feb 21, 2015

The correct way to do this in JavaScript is to use Intl.Collator with a sensitivity of base. This provides a compare function that compares two strings while ignoring any diacritics.

E.g.
new Intl.Collator('de', { sensitivity: 'base' }).compare('Österreich', 'Osterreich') === 0

Caniuse.com has info on browser support for this.

The compare function can only compare two whole strings, and in selectToAutocomplete we want to match parts of country names, so it is not easy to replace the current regexes with this function. However, I think letting a user specify a locale and sensitivity is a better way than using a hardcoded character map to convert accented characters into ASCII.
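One way to get partial matching out of a whole-string compare is to slide a window over the option text. This is only a sketch under stated assumptions, not the code from the proof of concept mentioned in this thread: `collatorIncludes` is a hypothetical name, and the fixed-width window assumes the typed term and the matched text have the same length after NFC normalization, so it will not equate ß with ss.

```javascript
// Returns true if some substring of `haystack` compares equal to `needle`
// under the given collator. Assumption: equal-length windows only.
function collatorIncludes(haystack, needle, collator) {
  const h = haystack.normalize('NFC');
  const n = needle.normalize('NFC');
  if (n.length === 0) return true;
  for (let start = 0; start + n.length <= h.length; start += 1) {
    const window = h.slice(start, start + n.length);
    if (collator.compare(window, n) === 0) return true;
  }
  return false;
}

// The locale and sensitivity would be user-supplied options.
const baseCollator = new Intl.Collator('de', { sensitivity: 'base' });
const found = collatorIncludes('Österreich', 'oster', baseCollator);
```

Note that a sensitivity of base is symmetric: a typed Ö also matches a plain O, and case is ignored as well. The window scan is also slower than a precomputed match string, so it would need measuring on long option lists.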

@Sjord

Sjord commented Feb 21, 2015

I made a proof of concept with PR #84.

brandoncarl added a commit to brandoncarl/country-selector that referenced this issue Apr 7, 2015
4 participants