Bug in xml.dom.xmlbuilder.DOMBuilder.parse()
#128302
Comments
I'm going to tack on to this issue that I can fix that one too. I should probably write some tests for these at this point(?)
Currently:
cpython/Lib/xml/dom/xmlbuilder.py
Lines 248 to 253 in aeb9b65
A version that works for Python 3 is:

    def _guess_media_encoding(self, source):
        info = source.byteStream.info()
        if "Content-Type" in info:
            for param in info.get_params([]):
                if param[0] == 'charset':
                    return param[1].lower()
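To make the loop above concrete: in Python 3, info() on a urllib response returns an email.message.Message (or a subclass), and get_params([]) yields (name, value) pairs parsed from the Content-Type header. The sketch below is a standalone illustration only; guess_charset is a hypothetical free-function version of the method, and the headers object is built by hand rather than fetched from a real byte stream.

```python
from email.message import Message


def guess_charset(info):
    """Return the lowercased charset parameter of Content-Type, or None."""
    if "Content-Type" in info:
        # get_params([]) returns (name, value) pairs; the first pair is the
        # media type itself, e.g. ('text/xml', '').
        for name, value in info.get_params([]):
            if name == "charset":
                return value.lower()
    return None


# Hand-built headers standing in for source.byteStream.info() (assumption).
headers = Message()
headers["Content-Type"] = "text/xml; charset=UTF-8"
print(guess_charset(headers))
```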
For the record, I'm finding these obscure issues because I'm working on the type stubs for the xml module, and these popped out when I tried to run a type check on the source.
Thank you very much for this thorough investigation! I'll try to have a look at your PR today or next week.
Bug report
Bug description:
I believe there is a bug in xml.dom.xmlbuilder.DOMBuilder.parse().
xml.dom.xmlbuilder is not documented by Python, we don't have any tests for them, and I can't find evidence on GitHub of these classes being in use. That's
okay, because in this case the classes are implementations of a W3C standard
which was at the time a draft, and which is heavily documented.
First, some digital archeology. The first commit of Lib/xml/dom/xmlbuilder.py
is from 2003-01-25, with the message "Import from PyXML 1.10."
The issue already exists in that commit.
The earlier remains of the PyXML project are on SourceForge
in a CVS repo. Looking at the log there, I find that the first commit for xmlbuilder.py
is from 2002-02-21 with this message:
The bug is already present in this version as well.
The file sees a few more commits, four of which reference updating to track
the new version of the draft standard, the latest of which is from 2002-08-02.
None of the commits since being imported into CPython reference the standard.
This means that in looking at the code, our best reference is the 2002-07-25 draft
version: https://www.w3.org/TR/2002/WD-DOM-Level-3-LS-20020725/.
What's called DOMBuilder in that draft and in our implementation would be renamed
in later drafts to DOMParser and then LSParser, but the basic class remains
in the standard.
Going to the standard now:
The primary interface in the standard is DOMImplementationLS, a mixin for the
DOMImplementation interface. The relevant methods for us are:
Described as:
The documentation for DOMInputSource says:
Finally, documentation for the DOMBuilder interface:
And its parse method:
So an application using this interface is intended to:
1. Obtain a DOMBuilder and a DOMInputSource; this is accomplished via xml.dom.getDOMImplementation().createDOMBuilder() and createDOMInputSource() to get those two respective objects.
2. Configure the DOMBuilder with canSetFeature, getFeature, and setFeature.
3. Configure the DOMInputSource. The DOMInputSource will first prefer its characterStream attribute. If that attribute is null, it will prefer the byteStream attribute. If both of these are null, then the URI from its systemId attribute is used.
4. Pass the DOMInputSource to DOMBuilder.parse() to receive a Document object.
In Python, this looks like this:
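A minimal sketch of those steps, reading from an in-memory byteStream so that it does not touch the systemId code path discussed further down. The sample document, the choice of the comments feature, and the variable names are illustrative assumptions, not taken from the original report.

```python
import io
import xml.dom

# Step 1: get the builder and the input source from the DOM implementation.
impl = xml.dom.getDOMImplementation()
builder = impl.createDOMBuilder(impl.MODE_SYNCHRONOUS, None)
source = impl.createDOMInputSource()

# Step 2: configure the builder; here comments are dropped from the tree.
if builder.canSetFeature("comments", False):
    builder.setFeature("comments", False)

# Step 3: configure the input source with a byte stream.
source.byteStream = io.BytesIO(b"<root><!-- note --><child>hi</child></root>")

# Step 4: hand the input source to parse() and receive a Document.
document = builder.parse(source)
print(document.documentElement.tagName)
```

Setting byteStream is the only input path that avoids the broken branch; leaving it unset and relying on systemId alone is what exercises the bug.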
For a use case like this, with no additional customization of the source object,
the spec also has a DOMBuilder.parseURI() method which accepts a URI directly and
constructs the necessary DOMInputSource internally.
With all that as background, here's our current implementation of DOMBuilder.parse():
cpython/Lib/xml/dom/xmlbuilder.py
Lines 187 to 195 in aeb9b65
The problem is on line 192. systemId is not an attribute that exists on options here.
options is an instance of the xml.dom.xmlbuilder.Options class and represents the
features that can be configured on DOMBuilder. Those options are documented in
the specification, and are set using DOMBuilder.setFeature(). Our implementation
has a dictionary DOMBuilder._settings which contains the options from the
spec that we support, which starts here:
cpython/Lib/xml/dom/xmlbuilder.py
Lines 101 to 103 in aeb9b65
The important thing is that systemId is not a configuration setting in either
the specification or in our implementation, so as written we can never reach
the if branch in that method.
I believe that the relevant lines should be:
(The implementation is also non-conformant because it fails to consider
input.characterStream before using input.byteStream, but that's a different
problem. I'm not looking to add a missing feature right now, just fix the already-existing-but-broken one.)
CPython versions tested on:
3.13
Operating systems tested on:
No response
Linked PRs
xml.dom.xmlbuilder.DOMBuilder.parse()
#128284