Normalize DOCTYPE? #858
Comments
Yes, let's normalize the DOCTYPE. Let's do it at the place where it is simplest. Officially, we do not support getting DOMDocuments from other sources anyway, so we might do it at any place.
Simplest, I imagine, would be to lowercase the
Ensure that the DOCTYPE declaration consists of uppercase `DOCTYPE` and lowercase root element name (`html`). This is done when the `DOMDocument` is created from an HTML source. Once the `DOMDocument` has been created, the `DOMDocumentType` cannot be changed, so the document type declaration must be manipulated (if necessary) in the HTML beforehand.

(Since only HTML documents are supported, the declaration is only normalized when the root element name is HTML, in whatever case. The precise specification for any element name involves lists of various Unicode character ranges, which it would be superfluous to allow for and try to match. PHP's `DOMDocument`/`libxml` itself will output the `DOCTYPE` keyword in uppercase in any case.)

This normalization is consistent with the relevant part of the [polyglot markup specification](https://dev.w3.org/html5/html-polyglot/html-polyglot.html#doctype). While polyglot markup is primarily intended for serialization of HTML as XML (we don't actually support outputting as XHTML), it is also recommended for maximum interoperability and robustness when rendering HTML.

This also makes the output consistent with that of `Masterminds/html5-php` and would eliminate the need to change associated tests specifically for #831.

Closes #858.
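The normalization described in this commit message could be sketched roughly as follows. This is an illustration only, not the project's actual code: the function name `normalizeDocumentType` and the regular expression are assumptions made for the example. It rewrites the first DOCTYPE declaration whose root element name is `html` (in any case) and leaves everything else, including any public or system identifiers, untouched.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (name and pattern are assumptions, not the project's API).
 *
 * Rewrites the first document type declaration whose root element name is
 * "html", in whatever case, so that it starts with "<!DOCTYPE html".
 * Declarations with any other root element name are left unchanged.
 */
function normalizeDocumentType(string $html): string
{
    // Case-insensitively match "<!DOCTYPE", whitespace, then "html" followed by
    // whitespace or ">" (checked by lookahead, so the rest is not consumed).
    // The limit of 1 restricts the rewrite to the first match.
    $result = \preg_replace('/<!DOCTYPE\s++html(?=[\s>])/i', '<!DOCTYPE html', $html, 1);

    // preg_replace returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}
```

The result would then be passed to `DOMDocument::loadHTML()`, so that the name is already lowercase by the time it is read into the `DOMDocumentType`. Note that this sketch does not check where the declaration occurs, so a matching string inside a comment or script that precedes the real declaration would be rewritten instead.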
Noted in #831 is that `masterminds/html5` will always output the HTML5 DOCTYPE as `<!DOCTYPE html>`, i.e. uppercase `DOCTYPE` and lowercase `html`.

This is consistent with the Polyglot Markup recommendation for producers to maximize support.

For #831, we would, at least, need to change a test to expect the above DOCTYPE form in the output where the input is `<!DOCTYPE HTML>` (with uppercase `HTML`).

But should we, in any case, normalize the DOCTYPE as above? I.e. always output `DOCTYPE` in uppercase (which we do anyway), and `html` in lowercase? If so, should this be done when serializing (rendering) the DOM, parsing the HTML (when the DTD name is read into the `DOMDocumentType::$name` property), or both (or even possibly in `fromDomDocument` and/or `getDomDocument`)?