-
Notifications
You must be signed in to change notification settings - Fork 245
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TEXT-215: Prevent decimal numeric entities from wrongly including hexadecimal characters #310
base: master
Are you sure you want to change the base?
Conversation
…adecimal characters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @rbunel35 ! Thank you for your contribution. Could you include a unit test as well, please?
Hi @kinow ! |
Great! I've allowed the workflow to run. Now just need to find time to review issue & code (though that's rather a small change). Thanks for the pull request @rbunel35 ! |
Thank you very much ! |
Hello ! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay @rbunel35 . Had some spare time now, so re-read the JIRA issue with calm, and then looked at the code & test. Everything looks good IMO. Not sure whether we will need a CVE for this or not. For future issues related to security, it is better to use security instructions from the Commons site 👍
Will leave it open to see if anyone else wants to take a look. If it takes too long to get any activity, feel free to bump me and I'll check in the mailing list/slack/etc to see if anyone wants to review and will check on the CVE.
Thanks
Bruno
https://www.w3.org/TR/REC-xml/#dt-charref Why are illegal entities allowed in the first place? Am I reading the specification incorrectly? The ';' character should be required. IMO this feature creep on our end feels improper and should not be allowed or at the very least deprecated. |
Good point. I haven't checked any specification yet, but this:
Or this:
Both trigger an alert (tested with |
My personal opinion is that we should stick to a specific version of a specification, in this case W3C XML. If we also want to emulate what a browser does or what another language does, then that could be the job for a subclass. So maybe we should be refactoring the code? Needs other opinions... |
Hi, Indeed I understand that semicolon-less character entities do not form part of the specification, however as pointed by @kinow, virtually all modern browsers read them as valid entities anyway, and as a user of Commons Text, I expected it to work the same way. Also, please note that the acceptance of numeric character entities without semi-colon precedes my contribution (via the "semiColonOptional" value in the enum). My fix only makes it work correctly with hexadecimal entities. |
Hey @kinow, I hope I'm not too much impatient (sorry if that's the case ^^) but do you have news about this PR ? Thanks in advance ! |
Hi @rbunel35 Thanks for the remainder. I'm starting a new job in a few months, and will have about one month with extra spare time, when I expect to be able to release Text if nobody else beats me to it. I think we will have to move this question to a wider audience, @rbunel35 . We have two options here, I can send the email to the Commons Dev mailing list explaining what the PR is fixing, and trying to explain on the difference between browsers & specs, or if you are subscribed you can send the email over there. The only difference, I think, is that I may forget the issue again if there are no replies over there 😄 in which case you'd have to ping me again. Up to you 👍 Thanks!!! |
Thanks for the quick answer ! |
Brilliant! Every component is discussed there, that's the only downside. Use the following prefix for your subject, please: "[text] Enter your email subject here". I have a rule in GMail to move it to another folder so that I can take a look when I'm not busy. You can ignore emails for components you are not interested (I'm about to write one email for Commons Configuration, it goes to the same mailing list, with prefix [configuration]...). Hopefully we will find a solution for this issue and fix & release it soon. Thanks!!! |
Hello,
This a quick bugfix on the NumericEntityUnescaper. The bug allows decimal characters entities without semi-colon and followed by a letter from A to F to be ignored by the translator.
A full description of the problem is found in the ticket: https://issues.apache.org/jira/browse/TEXT-215