Smarter punycode warnings #61

Simon-Laux · 2024-02-23T20:46:01Z

Currently, every link whose hostname/domain contains Punycode triggers the warning/confirmation dialog.

The Problem

While this works well for English-speaking regions, it works badly for regions that use a different script or alphabet.
For them it produces many false positives on perfectly valid, ordinary URLs.

Non-exhaustive list of examples:

To a high degree:
- Japanese: kanji, katakana, hiragana
- Russian: Cyrillic
- Other languages with their own scripts: Hebrew, Chinese, Khmer and so on
- Emoji URLs (though only some registrars and top-level domains allow them)

To a lesser degree:
- German: because of the umlauts, but as they are rarely used in domains it's mostly fine

I don't know how big the problem really is. Internationalised domain names are still relatively new; before them only ASCII was possible, so many websites and companies still stick to ASCII domains.

Update: the .рф top-level domain (https://en.wikipedia.org/wiki/.рф) is apparently used a lot.

Proposed solution

For each language we support, specify a list of allowed Unicode ranges.

For each detected Punycode link, check whether it fits into the allowed ranges of any one language; if not, warn the user.

For example:

- German: ASCII + umlauts
- Japanese: ASCII + kanji + katakana + hiragana
- Languages whose letters look similar to ASCII letters would include only the ASCII special characters allowed in URLs, like "-", instead of all of ASCII
  - Russian: Cyrillic + ASCII special characters
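
A minimal sketch of how this check could look, in Python for illustration only. The names `ALLOWED_RANGES`, `fits` and `needs_warning` are hypothetical, the code point ranges are rough assumptions rather than a vetted policy, and checking each label separately (so an ASCII TLD can follow a Cyrillic name) is one possible reading of the proposal, not a decided design. Decoding uses Python's built-in `idna` codec, which implements the older IDNA 2003 rules.

```python
# Hypothetical allow-lists as (first, last) code point pairs.
# The ranges are illustrative assumptions, not a vetted policy.
DIGITS_AND_HYPHEN = [(0x30, 0x39), (0x2D, 0x2D)]
ASCII = [(0x61, 0x7A)] + DIGITS_AND_HYPHEN  # a-z, 0-9, "-"

ALLOWED_RANGES = {
    "de": ASCII + [(0xE4, 0xE4), (0xF6, 0xF6), (0xFC, 0xFC)],  # ä ö ü
    "ja": ASCII + [(0x3040, 0x309F), (0x30A0, 0x30FF), (0x4E00, 0x9FFF)],
    "ru": DIGITS_AND_HYPHEN + [(0x0400, 0x04FF)],  # no ASCII letters
}


def fits(label, ranges):
    """True if every character of the label lies in one of the ranges."""
    return all(any(lo <= ord(ch) <= hi for lo, hi in ranges) for ch in label)


def needs_warning(hostname):
    """True if some label fits no single language's allow-list."""
    try:
        # Round-trip through the stdlib "idna" codec so that both
        # "xn--..." and already-decoded hostnames end up as Unicode.
        decoded = hostname.lower().encode("idna").decode("idna")
    except UnicodeError:
        return True  # undecodable input: warn to be safe
    return any(
        not any(fits(label, ranges) for ranges in ALLOWED_RANGES.values())
        for label in decoded.split(".")
    )
```

With these assumed ranges, `www.xn--mnchen-3ya.de` and `кремль.рф` pass without a warning, while a label that mixes a Cyrillic "а" into "apple" fits no single language and still warns.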

Alternatives Considered

Check for look-alike characters and somehow warn only on those
- Sounds like a lot of manual work to find those look-alike characters first, and even then I don't know exactly how to do it.
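
For comparison, a rough sketch of what this alternative might look like, again in Python for illustration. The mapping `CYRILLIC_TO_LATIN` is a tiny hand-written sample and the function names are hypothetical; a real list would have to be much larger. Unicode does publish confusables data as part of UTS #39, which might reduce the manual work, but this sketch does not use it.

```python
# Tiny hand-written sample of Cyrillic letters that resemble ASCII letters.
# Assumption for illustration only; far from complete.
CYRILLIC_TO_LATIN = {
    "\u0430": "a",  # а
    "\u0435": "e",  # е
    "\u043e": "o",  # о
    "\u0440": "p",  # р
    "\u0441": "c",  # с
    "\u0443": "y",  # у
    "\u0445": "x",  # х
}


def skeleton(label):
    """Replace each known look-alike with the ASCII letter it resembles."""
    return "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in label)


def imitates_ascii(label):
    """True if a non-ASCII label turns into pure ASCII after replacement."""
    has_non_ascii = any(ord(ch) > 0x7F for ch in label)
    return has_non_ascii and skeleton(label).isascii()
```

A label is flagged only when it contains non-ASCII characters and every one of them is a known look-alike, so a word like "кремль" is left alone, while "аpple" with a Cyrillic "а" is flagged.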

Testcases

https://www.münchen.de

To do: collect more, while checking what each domain means, so that we don't add problematic domains because we forgot to check.

Anyway, the first step is to collect test cases.

farooqkz · 2024-03-27T21:13:55Z

In Iran this is not important for hostnames, but it is for the path part of URLs.

Simon-Laux mentioned this issue May 10, 2024

Quiet is vulnerable to IDN Homograph Attacks TryQuiet/quiet#1807
