You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When parsing an XML feed like the below, with an unbalanced close tag, an error gets thrown at the following line which can be caught, and parsing is able to continue:
If, however, an XML feed with an unbalanced close tag that has no preceding white space is encountered, a second exception is thrown which cannot be recovered from.
What's happening is that if such an unbalanced close tag error is caught, and then reader.next() is called subsequently, the stream reader's current event type continues to be set to whatever the previous event type was. If an unbalanced close tag was preceded by some whitespace characters, the current event type is set to characters i.e. mCurrToken == XMLEvent.CHARACTERS. Calling reader.next() reads the unbalanced close tag as if they were characters with no problem, and then parsing can continue from there.
If the unbalanced close tag was preceded by another close tag, it treats the unbalanced close tag as a close tag and ends up hitting the below line:
if (!mElementStack.pop()) { // false if root closed
This throws an IllegalStateException from the line below. If you try to call reader.next() after catching this error, it continues trying to parse the unbalanced close tag as a close tag, and continuously throws this error.
thrownewIllegalStateException("Popping from empty stack");
Our use case is unusual, as we are trying to parse XML documents in fragments and so encounter this case quite regularly. Right now, our workaround is to extend from the Woodstox stream reader and manually set the event type to characters if we encounter this error. That doesn't sound like a solution that would work for all use cases, but it works for us. However, shouldn't it be possible to skip over a close tag if no corresponding opening tag has been seen by the parser, with the default Woodstox parser? If anyone has an idea on how to solve this, I'd be happy to contribute a PR.
try {
returnnext();
} catch (IllegalStateExceptionex) {
if (!ex.getMessage().equals("Popping from empty stack")) {
throwex;
}
mCurrToken = XMLEvent.CHARACTERS;
returnmCurrToken;
}
The text was updated successfully, but these errors were encountered:
One quick comment just to make sure I understand this correctly: XML parsers are required to verify well-formedness, so quietly skipping "unmatched" close tags is a no-no and should not be the default behavior. But it is fine to have ways to configure things to allow that for use cases where it is desired (either via new property or, probably better, through error handler).
So... It would be nice of course to make it easier to recover from issues like this, although in general it is not something that can be solved for every case (and cost of trying to keep state recoverable in all cases is pretty steep)
But in this particular case perhaps there could be something as simple as reordering state update or something -- and if so, I would accept a patch. And I think that first figuring out where handling would work makes sense, after which specific configuration mechanism can be decided on.
I don't have time or interest to work on this issue, but if someone wants to try come up with a change to allow better handling of non-well-formed content, would be happy to help get such feature merged.
When parsing an XML feed like the below, with an unbalanced close tag, an error gets thrown at the following line which can be caught, and parsing is able to continue:
woodstox/src/main/java/com/ctc/wstx/sr/BasicStreamReader.java
Line 3337 in 489149b
If, however, an XML feed with an unbalanced close tag that has no preceding white space is encountered, a second exception is thrown which cannot be recovered from.
What's happening is that if such an unbalanced close tag error is caught, and then
reader.next()
is called subsequently, the stream reader's current event type continues to be set to whatever the previous event type was. If an unbalanced close tag was preceded by some whitespace characters, the current event type is set to characters i.e.mCurrToken == XMLEvent.CHARACTERS
. Callingreader.next()
reads the unbalanced close tag as if they were characters with no problem, and then parsing can continue from there.If the unbalanced close tag was preceded by another close tag, it treats the unbalanced close tag as a close tag and ends up hitting the below line:
woodstox/src/main/java/com/ctc/wstx/sr/BasicStreamReader.java
Line 2737 in 489149b
This throws an IllegalStateException from the line below. If you try to call
reader.next()
after catching this error, it continues trying to parse the unbalanced close tag as a close tag, and continuously throws this error.woodstox/src/main/java/com/ctc/wstx/sr/InputElementStack.java
Line 372 in 489149b
Our use case is unusual, as we are trying to parse XML documents in fragments and so encounter this case quite regularly. Right now, our workaround is to extend from the Woodstox stream reader and manually set the event type to characters if we encounter this error. That doesn't sound like a solution that would work for all use cases, but it works for us. However, shouldn't it be possible to skip over a close tag if no corresponding opening tag has been seen by the parser, with the default Woodstox parser? If anyone has an idea on how to solve this, I'd be happy to contribute a PR.
The text was updated successfully, but these errors were encountered: