Unresolved char entity causes STDOUT output to stop #9

Open
sgmlguru opened this issue Nov 26, 2022 · 1 comment

Comments

@sgmlguru

Hi Andrew,

I have an XML file that is not well-formed - it may or may not have a DOCTYPE, but the real problem is that the XML contains a character entity that is not resolved. This currently causes the doctype tool to stop outputting the file contents to STDOUT. In other words, if there is an error, the tool will only pass the content to STDOUT until that point.

The larger issue is that I have a bunch of XML files with character entities but mostly without DOCTYPE declarations that are being linked to a main XML (with DOCTYPE) via XInclude by a system that doesn't appear to think that this is a problem. In order to process the XInclude targets outside this system, I need to first inject them with a DOCTYPE declaration so the character entities can be resolved in a later step.

Many thanks.
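
The DOCTYPE injection described above can be sketched at the text level, since the targets cannot be parsed until their entities are declared. This is only an illustration, not how the doctype tool works: the function name `inject_doctype` and the system identifier `entities.dtd` are hypothetical, and the root element is guessed from the first start tag, so a comment or processing instruction that contains something tag-like before the root would fool it.

```python
import re

# Optional BOM followed by an XML declaration at the very start of the file.
XML_DECL = re.compile(r"\A\ufeff?<\?xml\b.*?\?>", re.DOTALL)
# First thing that looks like a start tag; assumed to be the root element.
ROOT_TAG = re.compile(r"<([A-Za-z_][\w.\-:]*)")
HAS_DOCTYPE = re.compile(r"<!DOCTYPE\b")


def inject_doctype(text: str, system_id: str) -> str:
    """Insert a DOCTYPE declaration unless the text already has one."""
    if HAS_DOCTYPE.search(text):
        return text  # leave files that already declare a DOCTYPE alone
    root = ROOT_TAG.search(text)
    if root is None:
        raise ValueError("no start tag found")
    doctype = f'<!DOCTYPE {root.group(1)} SYSTEM "{system_id}">'
    decl = XML_DECL.match(text)
    if decl:
        # The DOCTYPE must come after the XML declaration, if there is one.
        return text[:decl.end()] + "\n" + doctype + text[decl.end():]
    return doctype + "\n" + text
```

The referenced DTD (or an internal subset) would still have to declare the character entities for a later parse to resolve them.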

@AndrewSales
Owner

AndrewSales commented Nov 29, 2022

Hi Ari,
Thanks for reporting this.
I believe the immediate cause is that the wrapper around the parser I am using (expat) understandably abandons the parse after encountering a well-formedness error -- so output ceases as soon as an undeclared entity appears.
I have delved into the lower-level callbacks expat provides as part of investigating some of the other feature requests for this tool, but have not yet discovered a way to preserve the value of the undeclared entity.
It does however provide location information, so a hack might be to simply append the rest of the document as unparsed text from that point. But like I say, a hack...
Andrew
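
A minimal sketch of that hack, using Python's `xml.parsers.expat` binding rather than the wrapper the tool actually uses (which may expose location information differently). The function name `split_at_first_error` is hypothetical. The default handler just echoes each token's original text, standing in for whatever the tool would really write, and `ErrorByteIndex` gives the byte offset at which to switch to unparsed output.

```python
import xml.parsers.expat


def split_at_first_error(data: bytes):
    """Return (parsed_text, raw_tail) for a possibly ill-formed document."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    # With no other handlers set, expat hands every token to the default
    # handler as its original text, so chunks mirrors the input so far.
    parser.DefaultHandler = chunks.append
    try:
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError:
        # Byte offset of the token that caused the error.
        offset = parser.ErrorByteIndex
        # Everything from here on is passed through without parsing.
        tail = data[offset:].decode("utf-8", errors="replace")
        return "".join(chunks), tail
    return "".join(chunks), ""
```

Writing `parsed_text` followed by `raw_tail` to STDOUT would then reproduce the whole input, with the undeclared entity reference left verbatim at the start of the tail. Nothing after the error is checked for well-formedness, which is why this remains a hack.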
