UTF-8 and declaration of the encoding #520

Open
heidivanparys opened this issue Nov 26, 2024 · 1 comment

Comments

@heidivanparys
Member

I recently came across W3C's Encoding Standard. In 4.2. Names and labels, it specifies:

Authors must use the UTF-8 encoding and must use its (ASCII case-insensitive) "utf-8" label to identify it.

New protocols and formats, as well as existing formats deployed in new contexts, must use the UTF-8 encoding exclusively. If these protocols and formats need to expose the encoding’s name or label, they must expose it as "utf-8".

That subclause is referenced from e.g. the HTML specification, see 4.2.5.4 Specifying the document's character encoding:

The Encoding standard requires use of the UTF-8 character encoding and requires use of the "utf-8" encoding label to identify it. Those requirements necessitate that the document's character encoding declaration, if it exists, specifies an encoding label using an ASCII case-insensitive match for "utf-8". Regardless of whether a character encoding declaration is present or not, the actual character encoding used to encode the document must be UTF-8. [ENCODING]

So the requirement from the Encoding Standard actually overrules the recommendation from the XML standard, 4.3.3 Character Encoding in Entities, which specifies that:

In an encoding declaration, the values "UTF-8", "UTF-16", "ISO-10646-UCS-2", and "ISO-10646-UCS-4" SHOULD be used for the various encodings and transformations of Unicode / ISO/IEC 10646, [...]

How does this impact TC 211's standards and resources? I guess it would mainly impact the XMG resources (the encoding declaration would have to be <?xml version="1.0" encoding="utf-8"?> instead of <?xml version="1.0" encoding="UTF-8"?>). The impacted standards probably originate mainly from OGC.
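
To get a feel for how many resources would be affected, something like the following Python sketch could be used. It is only an illustration: the function names `declared_encoding` and `classify` are made up for this example, and it assumes the file starts with the XML declaration (no byte order mark) in an ASCII-compatible encoding.

```python
import re
from typing import Optional

# Matches an XML declaration at the very start of the data and captures the
# value of the optional encoding pseudo-attribute (group 3).
_XML_DECL = re.compile(
    rb'^<\?xml\s+version\s*=\s*(["\'])[^"\']*\1'
    rb'(?:\s+encoding\s*=\s*(["\'])([A-Za-z][A-Za-z0-9._-]*)\2)?'
)


def declared_encoding(xml_bytes: bytes) -> Optional[str]:
    """Return the label from the encoding declaration, or None if absent."""
    match = _XML_DECL.match(xml_bytes)
    if match is None or match.group(3) is None:
        return None
    return match.group(3).decode("ascii")


def classify(xml_bytes: bytes) -> str:
    """Classify the declared label relative to the literal string "utf-8"."""
    label = declared_encoding(xml_bytes)
    if label is None:
        return "no declaration"
    if label == "utf-8":
        return "exact"
    # The label is pure ASCII here (enforced by the regex), so str.lower()
    # behaves as an ASCII case-insensitive comparison.
    if label.lower() == "utf-8":
        return "case variant"
    return "other"


print(classify(b'<?xml version="1.0" encoding="UTF-8"?><a/>'))
print(classify(b'<?xml version="1.0" encoding="utf-8"?><a/>'))
```

A "case variant" result marks a file whose declaration differs from "utf-8" only in letter case.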

@PeterParslow
Contributor

Heidi,
given that both the W3C sources you cite are explicit that the label is matched ASCII case-insensitively, I see no reason to change from UTF-8 to utf-8 or vice versa.
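
As a quick illustration of the case-insensitivity point, here is a minimal Python sketch using only the standard library. It is not taken from either specification, and the sample document is invented for the example. It shows that Python's codec registry resolves both spellings to the same codec, and that its XML parser reads documents declared either way identically.

```python
import codecs
import xml.etree.ElementTree as ET

# Both spellings resolve to the same codec in Python's registry.
same_codec = codecs.lookup("UTF-8").name == codecs.lookup("utf-8").name

# Two documents that differ only in the case of the encoding label.
# The bytes C3 B8 are the UTF-8 encoding of the letter "ø".
doc_upper = b'<?xml version="1.0" encoding="UTF-8"?><name>K\xc3\xb8benhavn</name>'
doc_lower = b'<?xml version="1.0" encoding="utf-8"?><name>K\xc3\xb8benhavn</name>'

text_upper = ET.fromstring(doc_upper).text
text_lower = ET.fromstring(doc_lower).text

print(same_codec)
print(text_upper == text_lower)
```

This only demonstrates the behaviour of one parser; other XML tooling would need to be checked separately.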
