This repository has been archived by the owner on Jul 31, 2023. It is now read-only.
escape-utils needs an option to escape only the characters that have predefined entities in XML, specifically:
&lt;
&gt;
&amp;
&apos;
&quot;
https://www.w3.org/TR/xml/#sec-predefined-ent
A little background: the full spectrum of traditional HTML entities (including e.g. &eacute;) is available to an HTML parser for a non-XML HTML document. But if I create an XHTML document (which is essentially a normal HTML document that adheres to the stricter rules of XML), only the XML entities above are recognized, unless the XHTML document actually defines the other entities (or pulls in a DTD that defines them).
So if I have the following test:
<p>touché</p>
escape-utils will HTML-encode that to:
&lt;p&gt;touch&eacute;&lt;/p&gt;
That works for a plain HTML document, but will break an XML parser, as &eacute; would be undefined. An "XML Encode" option would give me this:
&lt;p&gt;touché&lt;/p&gt;
Even in plain HTML, I may not want the é encoded; after all, UTF-8 handles accents just fine now. All those HTML entities for Latin characters were added when we were using plain ASCII to create HTML files and editors didn't support UTF-8 and Unicode.
So please add an "XML Encode" option. It would work exactly the same as the "HTML Encode" option, except the list of entities would be restricted to the predefined XML entities listed at the beginning of this issue. (Obviously we would need a corresponding "XML Encode Maintain Lines" option as well.) Thanks.
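To make the request concrete, here is a minimal sketch of the behavior being asked for. It is written in Python purely for illustration; it is not escape-utils code. The names `xml_encode`, `html_encode`, and `parses_as_xml` are hypothetical, and `html_encode` is only a stand-in for a named-entity HTML encoder built on Python's HTML 4 entity table, not a copy of what escape-utils actually does.

```python
import xml.etree.ElementTree as ET
from html.entities import codepoint2name

# The five entities predefined by XML 1.0 (section 4.6 of the spec).
XML_PREDEFINED = {
    "<": "&lt;",
    ">": "&gt;",
    "&": "&amp;",
    "'": "&apos;",
    '"': "&quot;",
}


def xml_encode(text: str) -> str:
    """Hypothetical "XML Encode": escape only the five predefined XML entities."""
    # Replace character by character, so an "&" produced by one
    # substitution is never escaped a second time.
    return "".join(XML_PREDEFINED.get(ch, ch) for ch in text)


def html_encode(text: str) -> str:
    """Stand-in HTML encoder: use a named entity wherever HTML 4 defines one."""
    return "".join(
        f"&{codepoint2name[ord(ch)]};" if ord(ch) in codepoint2name else ch
        for ch in text
    )


def parses_as_xml(fragment: str) -> bool:
    """Return True if an XML parser accepts the fragment as element content."""
    try:
        ET.fromstring(f"<root>{fragment}</root>")
        return True
    except ET.ParseError:
        return False
```

With the test string from this issue, `xml_encode` leaves the é alone and escapes only the angle brackets, so `parses_as_xml` accepts the result. The output of `html_encode` contains `&eacute;`, which an XML parser without a DTD rejects as an undefined entity.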