UTF-8 and declaration of the encoding #520

heidivanparys · 2024-11-26T15:57:50Z

I recently came across W3C's Encoding Standard. Section 4.2, "Names and labels", specifies:

Authors must use the UTF-8 encoding and must use its (ASCII case-insensitive) "utf-8" label to identify it.

New protocols and formats, as well as existing formats deployed in new contexts, must use the UTF-8 encoding exclusively. If these protocols and formats need to expose the encoding’s name or label, they must expose it as "utf-8".

That subclause is referenced from, e.g., the HTML specification; see 4.2.5.4, "Specifying the document's character encoding":

The Encoding standard requires use of the UTF-8 character encoding and requires use of the "utf-8" encoding label to identify it. Those requirements necessitate that the document's character encoding declaration, if it exists, specifies an encoding label using an ASCII case-insensitive match for "utf-8". Regardless of whether a character encoding declaration is present or not, the actual character encoding used to encode the document must be UTF-8. [ENCODING]

So the requirement in the Encoding Standard actually overrules the recommendation in the XML standard, section 4.3.3, "Character Encoding in Entities", which specifies that:

In an encoding declaration, the values "UTF-8", "UTF-16", "ISO-10646-UCS-2", and "ISO-10646-UCS-4" SHOULD be used for the various encodings and transformations of Unicode / ISO/IEC 10646, [...]
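The "ASCII case-insensitive" matching that the Encoding Standard and HTML excerpts above rely on can be sketched roughly as follows. This is a minimal illustration, not the standard's algorithm verbatim: it assumes the label is first stripped of leading and trailing ASCII whitespace and then lowercased in the ASCII range only, and it includes only the UTF-8 row of the standard's label table. The function names `ascii_lowercase` and `get_encoding` are hypothetical, chosen for this example.

```python
from typing import Optional

# ASCII whitespace: TAB, LF, FF, CR and SPACE.
ASCII_WHITESPACE = "\t\n\f\r "

# Labels that map to the UTF-8 encoding. Only this one row of the
# Encoding Standard's label table is reproduced in this sketch.
UTF8_LABELS = {
    "unicode-1-1-utf-8",
    "unicode11utf8",
    "unicode20utf8",
    "utf-8",
    "utf8",
    "x-unicode20utf8",
}

# Translation table that maps only A-Z to a-z.
_ASCII_LOWER = str.maketrans(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz"
)


def ascii_lowercase(s: str) -> str:
    """Lowercase A-Z only; non-ASCII characters are left untouched."""
    return s.translate(_ASCII_LOWER)


def get_encoding(label: str) -> Optional[str]:
    """Return the encoding name for a label, or None if it is not recognised."""
    normalized = ascii_lowercase(label.strip(ASCII_WHITESPACE))
    if normalized in UTF8_LABELS:
        return "UTF-8"  # the encoding's name, as opposed to its labels
    return None  # unknown here (this sketch only knows the UTF-8 labels)


for label in ("utf-8", "UTF-8", " Utf8 ", "not-an-encoding"):
    print(repr(label), "->", get_encoding(label))
```

Under this reading, "utf-8" and "UTF-8" resolve to the same encoding; the lowercase spelling matters only where a specification requires authors to *write* the label that way.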

How does this impact TC 211's standards and resources? I guess it would mainly impact the XMG resources (the encoding declaration would have to be <?xml version="1.0" encoding="utf-8"?> instead of <?xml version="1.0" encoding="UTF-8"?>). The affected standards probably originate mainly from OGC.


PeterParslow · 2024-11-29T08:54:15Z

Heidi,
given that both W3C sources you cite are explicit that the label is case-insensitive, I see no reason to change from UTF-8 to utf-8 or vice versa.
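As a quick practical check of that point, the sketch below parses the same small document under both declarations with Python's standard `xml.etree.ElementTree`, which uses the expat parser. It only shows how one common parser behaves; it says nothing about every tool used with the XMG resources. XML 1.0 section 4.3.3 itself also says that XML processors SHOULD match encoding names case-insensitively. The `<name>Zürich</name>` body and the helper `parse_with_declaration` are hypothetical, made up for this example.

```python
import xml.etree.ElementTree as ET

DECLARATIONS = (
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<?xml version="1.0" encoding="utf-8"?>',
)

# A non-ASCII character, so a wrongly detected encoding would be visible.
BODY = "<name>Zürich</name>"


def parse_with_declaration(decl: str, body: str) -> ET.Element:
    """Encode declaration + body as UTF-8 bytes and parse them."""
    data = (decl + body).encode("utf-8")  # the bytes really are UTF-8
    return ET.fromstring(data)  # parsing bytes lets expat read the declaration


for decl in DECLARATIONS:
    root = parse_with_declaration(decl, BODY)
    print(decl, "->", root.tag, root.text)
```

If both spellings produce the same element and text, the choice between them is a matter of convention for the authoring side, not of parser compatibility, at least for this parser.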
