Need an "XML Encode" option. #10

garretwilson · 2016-08-10T18:20:37Z

escape-utils needs an option to escape only those characters that have predefined entities in XML; specifically:

<
>
&
'
"

https://www.w3.org/TR/xml/#sec-predefined-ent

A little background: the full spectrum of traditional HTML entities (including e.g. &eacute; for é) is available to an HTML parser for non-XML HTML documents. But if I create an XHTML document (which is essentially a normal HTML document that adheres to the stricter rules of XML), only the XML entities above are recognized, unless the XHTML document actually defines those entities (or pulls in a DTD that defines them).

So if I have the following test:

<p>touché</p>

escape-utils will HTML-encode that to:

&lt;p&gt;touch&eacute;&lt;/p&gt;

That works for a plain HTML document, but will break an XML parser, as the &eacute; entity would be undefined. An "XML Encode" option would give me this:

&lt;p&gt;touché&lt;/p&gt;
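As a rough illustration of the difference, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree` parser. The helper name `parses_as_xml` and the `<pre>` wrapper element are assumptions made up for this example; they have nothing to do with escape-utils itself.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(fragment: str) -> bool:
    """Return True if the fragment parses as XML content inside a wrapper element."""
    try:
        # Hypothetical wrapper element so the fragment forms a complete document.
        ET.fromstring(f"<pre>{fragment}</pre>")
        return True
    except ET.ParseError:
        return False


html_encoded = "&lt;p&gt;touch&eacute;&lt;/p&gt;"  # current "HTML Encode" output
xml_encoded = "&lt;p&gt;touché&lt;/p&gt;"          # requested "XML Encode" output

print(parses_as_xml(html_encoded))  # False: &eacute; is not defined without a DTD
print(parses_as_xml(xml_encoded))   # True: only predefined entities are used
```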

Even in plain HTML, I may not want the é encoded; after all, UTF-8 can handle accents just fine now. All those HTML entities for Latin characters were added when we were using plain ASCII to create HTML files and editors didn't support UTF-8 and Unicode.

So please add an "XML Encode" option. It would work exactly the same as the "HTML Encode" option, except the list of entities would be restricted to those predefined XML entities listed at the beginning of this issue. (Obviously we would need a corresponding "XML Encode Maintain Lines" option as well.) Thanks.
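To make the request concrete, the behavior I have in mind could be sketched roughly as follows. This is Python purely for illustration, not the package's own code; `XML_ENTITIES` and `xml_encode` are hypothetical names, and the "Maintain Lines" variant is left out.

```python
# The five characters with predefined entities in XML 1.0 (section 4.6).
XML_ENTITIES = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    "'": "&apos;",
    '"': "&quot;",
}


def xml_encode(text: str) -> str:
    """Escape only the five predefined XML characters; leave everything else as-is."""
    # Mapping character by character avoids re-escaping the "&" of an
    # entity that was just produced.
    return "".join(XML_ENTITIES.get(ch, ch) for ch in text)


print(xml_encode("<p>touché</p>"))  # &lt;p&gt;touché&lt;/p&gt;
```

Non-ASCII characters such as é pass through unchanged, which assumes the resulting document is saved in a Unicode encoding such as UTF-8.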


PiotrCzapla · 2016-12-14T15:16:03Z

Hi, that makes sense. I don't think I will find time to add this anytime soon, but feel free to provide a patch.
