Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option for recovering parser #192

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

diegodiv
Copy link

@diegodiv diegodiv commented Nov 3, 2023

We're sometimes confronted with managing malformed XML data that we can't fix because we didn't generate it and we don't have the possibilities to fix it upstream: these XML files come from somewhere outside of our control. We should still need to be able to parse them at least partially, if possible. So I added a recovery mode to xml-conduit, similar to what exists in lxml with the option recover. The target of this recovering parser is mainly unescaped characters.

@k0ral
Copy link
Collaborator

k0ral commented Nov 5, 2023

Disclaimer: I'm just a backup maintainer for xml-conduit and the following is just my 2 cents, @snoyberg feel free to overrule 🙂 .

Although this change seems perfectly functional, I do not feel comfortable merging it, and here is why.

Recovering from invalid XML documents in general sounds like a complex feature, one that xml-conduit doesn't claim to provide as far as I know. This is not to say that we shouldn't implement it ever, but if we are to go down the rabbit hole, I believe the topic should be treated as a primary concern in the design of the library, and not as an afterthought. Tackling each edge case in the current state of the library, sounds like a slippery slope to a proliferation of if/else in the codebase that, I believe, will damage its maintainability.

I hope I don't come off as negative/judgmental on the proposed change here, I understand that it may be beneficial to users in the short term, but I think it is detrimental to maintainers (and thus indirectly to the users as well) in the long run.

@diegodiv
Copy link
Author

Thanks for the concise reply.

Recovering from invalid XML documents in general sounds like a complex feature, one that xml-conduit doesn't claim to provide as far as I know.

Albeit less complex, xml-conduit already provides targeted recovery features, in the form of psDecodeIllegalCharacters.

Tackling each edge case in the current state of the library, sounds like a slippery slope to a proliferation of if/else in the codebase that, I believe, will damage its maintainability.

At the moment, xml-conduit doesn't provide an interface that makes recovering from errors possible (in many situations): we patched the library in order to extend its behaviour. Nevertheless, the interface provides enough flexibility to recover in other settings, for example mismatched tags, without resorting to patch the library. If the library were not to implement directly more recovery features than it already does, a nice solution could be to provide such an interface.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants