XML and XMLName should account for U+061C (ALM) #1903

Open
aphillips opened this issue Sep 12, 2024 · 0 comments
Labels
pending Issue not yet sent to WG, or raised by tracker tool & needing labels. t:char_string 4.9 Defining 'string'


Proposed comment

https://www.w3.org/TR/REC-xml-names/#ns-qualnames
https://www.w3.org/TR/xml/#sec-common-syn

[4] NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
[4a] NameChar ::= NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]
[5] Name ::= NameStartChar (NameChar)*
[6] Names ::= Name (#x20 Name)*
[7] Nmtoken ::= (NameChar)+
[8] Nmtokens ::= Nmtoken (#x20 Nmtoken)*

XML 1.0 (Fifth Edition) and Namespaces in XML 1.0 both use the NameStartChar construct shown above. The characters permitted by NameStartChar were deliberately limited to exclude characters known to be problematic at the time of adoption. Notable among the characters excluded from names are invisible formatting controls.

The character U+061C ARABIC LETTER MARK (ALM) was added to Unicode in version 6.3 (in 2013). This character is similar to U+200F RIGHT-TO-LEFT MARK, which is not a NameStartChar. U+061C, however, falls inside the range [#x37F-#x1FFF] and therefore is a NameStartChar. It is unusual for an invisible, non-spacing formatting character like this to be added to Unicode. As a result, an XML name that consists solely of this single invisible formatting control is valid, which seems like a bug, not a feature.
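To make the observation concrete, here is a minimal Python sketch that transcribes productions [4], [4a], and [5] as code point ranges and checks the two characters discussed above. The helper names (`is_name_start_char`, `is_name_char`, `is_name`) are made up for this illustration and are not taken from any XML library; a real parser may implement the check differently.

```python
# Ranges transcribed from production [4] (NameStartChar).
NAME_START_RANGES = [
    (0x3A, 0x3A),        # ":"
    (0x41, 0x5A),        # [A-Z]
    (0x5F, 0x5F),        # "_"
    (0x61, 0x7A),        # [a-z]
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D),
    (0x37F, 0x1FFF),     # this range contains U+061C
    (0x200C, 0x200D),    # ZWNJ and ZWJ only; U+200F is outside
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD),
    (0x10000, 0xEFFFF),
]

# Additional characters that production [4a] (NameChar) allows.
NAME_EXTRA_RANGES = [
    (0x2D, 0x2E),        # "-" and "."
    (0x30, 0x39),        # [0-9]
    (0xB7, 0xB7),
    (0x0300, 0x036F),
    (0x203F, 0x2040),
]


def _in_ranges(ch, ranges):
    """Return True if the single character ch falls in any (lo, hi) range."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_name_start_char(ch):
    """Hypothetical helper: does ch match production [4]?"""
    return _in_ranges(ch, NAME_START_RANGES)


def is_name_char(ch):
    """Hypothetical helper: does ch match production [4a]?"""
    return is_name_start_char(ch) or _in_ranges(ch, NAME_EXTRA_RANGES)


def is_name(s):
    """Hypothetical helper: does s match production [5] (Name)?"""
    return (
        len(s) > 0
        and is_name_start_char(s[0])
        and all(is_name_char(c) for c in s[1:])
    )


ALM = "\u061C"  # ARABIC LETTER MARK
RLM = "\u200F"  # RIGHT-TO-LEFT MARK

print(is_name(ALM))  # True: a Name made of one invisible character
print(is_name(RLM))  # False: RLM is excluded from names
```

Under this transcription, a string containing only U+061C passes as a Name, while the otherwise similar U+200F is rejected.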

(This issue was encountered while creating the MessageFormat 2.0 standard at Unicode, where we are attempting to use NCName and Name to define valid identifiers.)

The downside of changing NameStartChar, of course, is that a great many existing implementations would not be aware of the change.


This is a tracker issue. Only discuss things here if they are i18n WG internal meta-discussions about the issue. Contribute to the actual discussion at the following link:

§ url_for_the_issue_raised

@aphillips aphillips added pending Issue not yet sent to WG, or raised by tracker tool & needing labels. t:char_string 4.9 Defining 'string' labels Sep 12, 2024