Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added psShouldSkipAttributeReferenceParsing to ParseSettings #178

Conversation

jaunruh
Copy link

@jaunruh jaunruh commented Jul 3, 2022

Referencing: #177
Instead of replacing the removed characters in the render method using the RenderSettings I adapted the ParseSettings to allow skipping of reference rendering in attributes.

@k0ral
Copy link
Collaborator

k0ral commented Jul 3, 2022

Is there a reason why you chose to work around #177 at parser-level, rather than at renderer-level as suggested in that issue ?

Indeed, my understanding is that the present change violates the standard (XML references aren't resolved in attribute values), and I'm under the impression that a workaround at renderer-level would be equally simple yet still standard compliant.

Besides, it seems to me that this change, as is, will parse newline references into ContentText "
", which then will get rendered into 
, which isn't even what you're looking for, or am I mistaken ?

@jaunruh
Copy link
Author

jaunruh commented Jul 4, 2022

Honestly didn't think about that. But indeed, you are correct:

Text.XML.renderLBS def $ Text.XML.parseLBS_ customParse $ Data.ByteString.Lazy.Char8.pack "<someElement id=\"1\" name=\"Picture 1\" descr=\"A picture containing window, indoor&#xA;&#xA;Description automatically generated\"/>"
"<?xml version=\"1.0\" encoding=\"UTF-8\"?><someElement descr=\"A picture containing window, indoor&amp;#xA;&amp;#xA;Description automatically generated\" id=\"1\" name=\"Picture 1\"/>"

My reasoning was that changing the parsing-level instead render-level would make sure all references are retained in the standard case. With rsEncodeCharacters :: Set Char in RenderSettings like you suggested in #177 one would always have to supply the Set Char in order to receive the original input. So basically "why break something to repair it again".

Could you maybe refer me to the XML reference that you are referring to, please?

So I guess I am going to implement the render-level suggestion from #177. You can either close this pull request and let me open a new one or we rename this one.

@k0ral
Copy link
Collaborator

k0ral commented Jul 4, 2022

Could you maybe refer me to the XML reference that you are referring to, please?

The relevant parts of the standard I was referring to are:

My understanding is that this essentially means &#xA must be interpreted as a character reference for the newline character.

You can either close this pull request and let me open a new one or we rename this one.

I'd rather close this pull-request, you're welcome to open a new one for an implementation in the rendering logic :) .

@k0ral k0ral closed this Jul 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants