IDNA proposal #2874

Open
wants to merge 7 commits into
base: main

tlimoncelli
Contributor

No description provided.

@tlimoncelli
Contributor Author

Mentioning a number of people that I suspect have IDNA domains or knowledge about how IDNA works (based on skimming bugs and PRs)

@adamus1red
@dkim1970
@flz
@juliusrickert
@killerbees19
@Kusado
@louis-lau
@masterzen
@mderriey
@pmoroney
@tresni
@Yannik

I'd love to receive feedback about this proposal!

@Yannik
Contributor

Yannik commented Mar 21, 2024

@tlimoncelli

I think your example for the ascii==unicode case might be wrong:

#7: + CREATE foo.example.com MX 10 xn--p1ai.com (рф.com) (ttl=14400)

Was this what you intended?

@Yannik
Contributor

Yannik commented Mar 21, 2024

I noticed you created both CREATE and MODIFY examples for ascii (unicode) and unicode (ascii), but only MODIFY examples for ascii and unicode. How is that to be understood?

@tlimoncelli
Contributor Author

@tlimoncelli

I think your example for the ascii==unicode case might be wrong:

#7: + CREATE foo.example.com MX 10 xn--p1ai.com (рф.com) (ttl=14400)

Was this what you intended?

Ah, good point!

The first one is where the label is ascii==unicode but the target is ascii!=unicode. I'll update the comment.

Thanks for finding that!

Tom
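
The distinction between the label and the target can be illustrated with a small sketch. It uses Python's built-in `idna` codec (IDNA2003) purely for illustration; DNSControl itself is written in Go, and the helper names `both_forms` and `show` are hypothetical, not part of the proposal.

```python
from typing import Tuple


def both_forms(name: str) -> Tuple[str, str]:
    """Return the (ascii, unicode) forms of a name via Python's IDNA2003 codec."""
    ascii_form = name.encode("idna").decode("ascii")
    unicode_form = ascii_form.encode("ascii").decode("idna")
    return ascii_form, unicode_form


def show(name: str) -> str:
    """Render 'ascii (unicode)' only when the two forms differ."""
    ascii_form, unicode_form = both_forms(name)
    if ascii_form == unicode_form:
        return ascii_form  # plain ASCII name: nothing extra to show
    return f"{ascii_form} ({unicode_form})"


def create_mx_line(label: str, preference: int, target: str, ttl: int) -> str:
    """Build a CREATE line where label and target are rendered independently."""
    return f"+ CREATE {show(label)} MX {preference} {show(target)} (ttl={ttl})"
```

With the label `foo.example.com` (ascii==unicode) and the target `xn--p1ai.com` (ascii!=unicode), `create_mx_line` reproduces the line quoted above, apart from the `#7:` prefix.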

@tlimoncelli
Contributor Author

I noticed you created both CREATE and MODIFY examples for ascii (unicode) and unicode (ascii), but only MODIFY examples for ascii and unicode. How is that to be understood?

I've added more examples. I don't think I've covered every combination, but my goal is to show typical examples, not every possible example.

I've also added examples where we use {} and ⟬⟭ and ❮❯. I think using unicode chars to highlight unicode domains would be cool (maybe too clever?).

@adamus1red

adamus1red commented Mar 21, 2024

While I'm not against having the ASCII and UTF on the output lines, I do worry it might make the output too busy.
Wouldn't simply using the .Name value be better, since it should then pretty much match what is in the dnscontrol configuration?

@pmoroney
Copy link
Contributor

LGTM

@tlimoncelli
Contributor Author

@adamus1red wrote:

While I'm not against having the ASCII and UTF on the output lines, I do worry it might make the output too busy.
Wouldn't simply using the .Name value be better, since it should then match what is in the dnscontrol configuration?

That's an interesting point! I guess my thought is that showing both versions helps with debugging.

@adamus1red

adamus1red commented Mar 21, 2024

That's an interesting point! I guess my thought is that showing both versions helps with debugging.

@tlimoncelli maybe a compromise would be if the output was the same as what the DNS provider or Registrar used.

I know I've had issues where the DNS provider uses UTF but the registrar uses ASCII. For example, Namecheap uses ASCII, so registrar output for Namecheap would use ASCII punycode, while the DNS is on Cloudflare, which uses UTF, so that output would use UTF.

@louis-lau

The only IDNA domain I have is for fun, so I don't have a strong preference. I'll give my input nonetheless :). If you want to show both, I think I like B better, as it feels more consistent to me. Anything not in brackets will always be ASCII that way.

I'd probably go with showing what the original user input was, with a flag to only show ASCII if needed. It's less information to parse, and the user should be familiar with it as that's the way it's listed in their config. I could see points being made for showing both, but I've always liked things more distraction free and less dense.

I think the Unicode brackets are a little too clever, perhaps even a little confusing ;)

@Yannik
Contributor

Yannik commented Mar 23, 2024

First of all, improving IDNA handling would be a great improvement to dnscontrol.

Regarding output, the one thing I definitely do not like is having the ascii output come first, because it is the one least likely to be understood/mentally associated with the relevant domain.

I think simply using the original user input has merit, pairing that with a toggle to additionally show ascii seems fine to me.
However, I also wouldn't mind the unicode (ascii) output.

@dkim1970
Contributor

First of all, improving IDNA handling would be a great improvement to dnscontrol.

Regarding output, the one thing I definitely do not like is having the ascii output come first, because it is the one least likely to be understood/mentally associated with the relevant domain.

I think simply using the original user input has merit, pairing that with a toggle to additionally show ascii seems fine to me. However, I also wouldn't mind the unicode (ascii) output.

I'm seconding this suggestion. By displaying the "human readable" format, I think the barrier to using IDNs with dnscontrol is lowered, because the IDNA format is not human readable, especially when it comes to non-latinized languages.

@tlimoncelli
Contributor Author

This is excellent feedback! It's getting me excited!

Question: In what situations would people want to see something besides the .Name (the user input) version?

@adamus1red

Question: In what situations would people want to see something besides the .Name (the user input) version?

What about if the registrar or dns provider use something different than the .Name value, then include the version they are using in brackets?

@louis-lau

Personally, I think whatever the dns provider does isn't relevant to the cli output. Behind the scenes at every provider, it's all punycode anyway.

@Yannik
Contributor

Yannik commented Mar 23, 2024

Personally, I think whatever the dns provider does isn't relevant to the cli output. Behind the scenes at every provider, it's all punycode anyway.

Agreed, having the display format handled outside of the provider is to be preferred IMO.

@masterzen
Collaborator

I don't have experience with IDNA at all.
My $0.02: I do agree that showing the ascii version is useful only for debugging; what users want to see is whether their unicode domain (or however it was entered in dnsconfig.js) is being processed, and how.

@tlimoncelli
Contributor Author

Hi folks!

2 ideas:

Support multiple formats?

There's been a lot of discussion about ascii (unicode) vs unicode (ascii). It might be possible to add a command line flag that selects the format. No promises, but it might be possible. In that case, I'd recommend the default be userinput and add a flag for debugging that shows unicode (ascii) or userinput (ascii).

I'll know more about whether this is possible when I start coding.
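
As a rough sketch of how such a format selector might behave: the mode names `userinput`, `unicode-ascii` and `userinput-ascii` and the function `format_name` are made up for illustration (no such flag exists in DNSControl as far as this thread shows), and the conversion uses Python's built-in IDNA2003 codec rather than anything DNSControl would use.

```python
MODES = ("userinput", "unicode-ascii", "userinput-ascii")  # hypothetical flag values


def format_name(user_input: str, mode: str = "userinput") -> str:
    """Format a domain name for CLI output according to a display mode."""
    if mode not in MODES:
        raise ValueError(f"unknown display mode: {mode!r}")

    ascii_form = user_input.encode("idna").decode("ascii")
    unicode_form = ascii_form.encode("ascii").decode("idna")

    # Default mode, or a plain ASCII domain: show exactly what the user wrote.
    if mode == "userinput" or ascii_form == unicode_form:
        return user_input

    if mode == "unicode-ascii":
        return f"{unicode_form} ({ascii_form})"

    # mode == "userinput-ascii": avoid printing the same string twice
    # when the user already wrote the punycode form.
    if user_input == ascii_form:
        return user_input
    return f"{user_input} ({ascii_form})"
```

For example, `format_name("рф.com")` returns the input unchanged, while `format_name("xn--p1ai.com", "unicode-ascii")` returns `рф.com (xn--p1ai.com)`.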

An idea that would break less existing code

Existing code expects .Name to be ASCII (the current code runs dc.Punycode() for all providers, which rewrites .Name to be ASCII). Rather than require every use of .Name to change to .NameASCII, maybe the names should be: .Name (ASCII, to be compatible with old code), .NameORIG (how the user input the string), .NameUNICODE, and .NameDisplay.
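
To make the four-field idea concrete, here is a minimal sketch in Python. The class `DomainNames` and the function `build_names` are hypothetical stand-ins for the Go `models.DomainConfig` fields, the "ascii (unicode)" display rule is taken from the proposal text quoted in the review comments, and the conversion relies on Python's built-in IDNA2003 codec as an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainNames:
    name: str          # ASCII (punycode) form, what existing provider code expects
    name_orig: str     # exactly how the user wrote it in D()
    name_unicode: str  # Unicode form
    name_display: str  # "ascii (unicode)" if the two differ, else just the name


def build_names(user_input: str) -> DomainNames:
    """Derive all four name variants from the user's input."""
    lowered = user_input.lower()
    ascii_form = lowered.encode("idna").decode("ascii")
    unicode_form = ascii_form.encode("ascii").decode("idna")
    if ascii_form == unicode_form:
        display = ascii_form
    else:
        display = f"{ascii_form} ({unicode_form})"
    return DomainNames(
        name=ascii_form,
        name_orig=user_input,
        name_unicode=unicode_form,
        name_display=display,
    )
```

Because `name` always holds the ASCII form, code that already reads `.Name` would keep working in this sketch regardless of how the user typed the domain.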

```
models.DomainConfig:

.Name: the name from D() after downcased via unicode.ToLower()
Contributor

I'd prefer to always put the punycode variant in here (which is identical to the non-punycode if it's an ASCII-only domain).
This should require the least amount of changes in all providers and "do the right thing" out of the box 99% of the time, without breaking pure-ASCII domains.

It's also least surprising for users - how they write it shouldn't affect how the provider should treat it.
Which encoding the user provided in the D call is nothing individual providers need to worry about.

models.DomainConfig:

.Name: the name from D() after downcased via unicode.ToLower()
.NameASCII: The name stored after calling ToASCII() (with ACE prefix if any Unicode chars are present)
Contributor

drop


.Name: the name from D() after downcased via unicode.ToLower()
.NameASCII: The name stored after calling ToASCII() (with ACE prefix if any Unicode chars are present)
.NameUnicode: The name stored after calling ToUnicode()
Contributor

IDN seems more fitting (Internationalized Domain Name)

.Name: the name from D() after downcased via unicode.ToLower()
.NameASCII: The name stored after calling ToASCII() (with ACE prefix if any Unicode chars are present)
.NameUnicode: The name stored after calling ToUnicode()
.NameDisplay: if .NameASCII != .NameUnicode, store as "ascii (unicode)"
Contributor

This is confusing. Shouldn't this be the bigger slice that Name and IDN would be subslices of?


Here are some example outputs:

NOTE: Feedback needed! Do you prefer "a" or "b"? Is there an even better format I should consider? Should we use `{}` instead of `()`?
Contributor

I prefer b and (). We should also specify that NameDisplay is a string with both variants only if the domain is non-ASCII.


Here are some example outputs:

NOTE: Feedback needed! Do you prefer "a" or "b"? Is there an even better format I should consider? Should we use `{}` instead of `()`?
Collaborator

b and ()

models.DomainConfig:

.Name: the name from D() after downcased via unicode.ToLower()
.NameASCII: The name stored after calling ToASCII() (with ACE prefix if any Unicode chars are present)
Collaborator

yes, drop!

```
models.DomainConfig:

.Name: the name from D() after downcased via unicode.ToLower()
Collaborator

punycode variant in here ftw.

.Name: the name from D() after downcased via unicode.ToLower()
.NameASCII: The name stored after calling ToASCII() (with ACE prefix if any Unicode chars are present)
.NameUnicode: The name stored after calling ToUnicode()
.NameDisplay: if .NameASCII != .NameUnicode, store as "ascii (unicode)"
Collaborator

So, it will be either

  • mydomain.com if it is not about an IDN and
  • xn--p1ai.com (рф.com) otherwise.

Sounds good to me.


.Name: the name from D() after downcased via unicode.ToLower()
.NameASCII: The name stored after calling ToASCII() (with ACE prefix if any Unicode chars are present)
.NameUnicode: The name stored after calling ToUnicode()
Collaborator

I enjoy short names that make things clear, so IDN would be perfect. From a developer's perspective, though, having the property also start with NameXXX probably makes it easier to find in an IDE, as the Name properties appear right after each other.

  • .NameIDN isn't suitable, as it spells out "name" twice
  • .NameUnicode is ok though
  • .NameInternationalized possibly as well

My favorites are therefore .IDN or .NameUnicode, with .NameUnicode still my personal pick. The Unicode and ASCII keywords are often used in IDN translation libraries, while the IDN and Punycode keywords are usually used on the domain/DNS provider side. That's at least what I've noticed over the years. Still, that shouldn't sway the decision.

@KaiSchwarz-cnic
Collaborator

KaiSchwarz-cnic commented May 12, 2025

Also, an IDN isn't the same IDN if we compare .de and .com. Some TLD providers support different IDNA standards (IDNA2003 vs. IDNA2008, UTS46). Translating an IDN might therefore result in a different punycode variant.

Let me provide an example here from the HEXONET provider's ConvertIDN API command:

[COMMAND]
COMMAND = ConvertIDN
DOMAIN0 = ärzte.com
DOMAIN1 = ärzte.de
EOF

[RESPONSE]
CODE = 200
DESCRIPTION = Command completed successfully
PROPERTY[ACE][0] = xn--rzte-koa.com
PROPERTY[ACE][1] = xn--rzte-koa.de
PROPERTY[IDN][0] = ärzte.com
PROPERTY[IDN][1] = ärzte.de
EOF

No big difference here. But let us pick one with German special characters:

[COMMAND]
COMMAND = ConvertIDN
DOMAIN0 = fußball.com
DOMAIN1 = fußball.de
EOF

[RESPONSE]
CODE = 200
DESCRIPTION = Command completed successfully
PROPERTY[ACE][0] = fussball.com
PROPERTY[ACE][1] = xn--fuball-cta.de
PROPERTY[IDN][0] = fussball.com
PROPERTY[IDN][1] = fußball.de
EOF

Let us ignore that .com handles this differently, and use the punycode variant returned for .de as a .com domain name: xn--fuball-cta.com. While the IDN translation is technically correct, it won't work with the TLD provider, because that provider supports a different IDNA standard. I therefore strongly think this needs to be considered in an IDNA proposal. The API command above maps the response to a working variant.
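
The same divergence can be reproduced locally. The sketch below contrasts Python's built-in `idna` codec, which implements IDNA2003 and maps `ß` to `ss`, with a direct Punycode encoding of each label that keeps `ß`, as the .de result above does. The function names are hypothetical, and `ace_unmapped` is a deliberate simplification, not a full IDNA2008 or UTS46 implementation.

```python
def ace_idna2003(name: str) -> str:
    """ASCII form via Python's built-in IDNA2003 codec (nameprep maps ß to ss)."""
    return name.encode("idna").decode("ascii")


def ace_unmapped(name: str) -> str:
    """ASCII form by Punycode-encoding each non-ASCII label without any mapping.

    Simplified on purpose: no normalization or validity checks are applied.
    """
    labels = []
    for label in name.split("."):
        if label.isascii():
            labels.append(label)
        else:
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)
```

For `ärzte.de` both functions return `xn--rzte-koa.de`; for `fußball.de` the first returns `fussball.de` and the second `xn--fuball-cta.de`, matching the two variants in the API responses above.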

Mhmm... Then again, DNSControl runs on "existing" data configured by the user. The input should therefore be considered "correct" (anything else would be very stupid), so this special discussion is probably superfluous...
Or should DNSControl exit with an error if a potential IDN precheck fails?
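
A precheck of that kind could look roughly like this. `idn_precheck` is a hypothetical helper, and it validates against Python's IDNA2003 codec only, which is an assumption: a given TLD may apply a different standard, so passing this check would not guarantee that the provider accepts the name.

```python
from typing import Optional


def idn_precheck(name: str) -> Optional[str]:
    """Return None if the name converts cleanly, else a short error message."""
    try:
        name.encode("idna")
    except UnicodeError as exc:  # e.g. an empty label or one longer than 63 bytes
        return f"invalid domain name {name!r}: {exc}"
    return None
```

A caller could print the returned message and exit with a non-zero status before contacting any provider.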

Mhmm 2 ... The DNS/domain provider should ultimately be capable of handling that on its own (returning an error message stating that the provided domain/DNS zone name is invalid), so you don't have to worry about all that. Sorry that I bumped this up :-)


9 participants