-
Notifications
You must be signed in to change notification settings - Fork 106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize the loading of images using stored URL metrics #884
Changes from 43 commits
4758013
5fb6fda
bd1153d
a990982
b18a0c7
25ea308
c88f358
c20f967
08635fa
ec55c9a
6e0f809
e429983
8974ac8
acde9b4
024719d
38cb821
234e2fe
1585a78
292d1c4
fca7f8d
3cda887
f00addc
ce1db2f
d74420d
f0ee6b2
1e495db
8065801
131a9a0
ee83ba0
3d99128
d42601c
4db5161
2c3cf3a
3d7b5fa
957d2c4
e8a36b4
0cca4dd
83d0362
605fb02
74145a4
2758c0d
9d82615
9bfcdc9
185ebf6
1d259b5
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,305 @@ | ||
<?php | ||
/** | ||
* Image Loading Optimization: ILO_HTML_Tag_Processor class | ||
* | ||
* @package performance-lab | ||
* @since n.e.x.t | ||
*/ | ||
|
||
/** | ||
* Processor leveraging WP_HTML_Tag_Processor which gathers breadcrumbs which can be queried while iterating the open_tags() generator. | ||
* | ||
* Eventually this class should be made largely obsolete once `WP_HTML_Processor` is fully implemented to support all HTML tags. | ||
* | ||
* @since n.e.x.t | ||
* @access private | ||
*/ | ||
final class ILO_HTML_Tag_Processor { | ||
|
||
/** | ||
* HTML void tags (i.e. those which are self-closing). | ||
* | ||
* @link https://html.spec.whatwg.org/multipage/syntax.html#void-elements | ||
* @see WP_HTML_Processor::is_void() | ||
* @todo Reuse `WP_HTML_Processor::is_void()` once WordPress 6.4 is the minimum-supported version. | ||
* | ||
* @var string[] | ||
*/ | ||
const VOID_TAGS = array( | ||
'AREA', | ||
'BASE', | ||
'BASEFONT', // Obsolete. | ||
'BGSOUND', // Obsolete. | ||
'BR', | ||
'COL', | ||
'EMBED', | ||
'FRAME', // Deprecated. | ||
'HR', | ||
'IMG', | ||
'INPUT', | ||
'KEYGEN', // Obsolete. | ||
'LINK', | ||
'META', | ||
'PARAM', // Deprecated. | ||
'SOURCE', | ||
'TRACK', | ||
'WBR', | ||
); | ||
|
||
/** | ||
* The set of HTML tags whose presence will implicitly close a <p> element. | ||
* For example '<p>foo<h1>bar</h1>' should parse the same as '<p>foo</p><h1>bar</h1>'. | ||
* | ||
* @link https://html.spec.whatwg.org/multipage/grouping-content.html#the-p-element | ||
* | ||
* @var string[] | ||
*/ | ||
const P_CLOSING_TAGS = array( | ||
'ADDRESS', | ||
'ARTICLE', | ||
'ASIDE', | ||
'BLOCKQUOTE', | ||
'DETAILS', | ||
'DIV', | ||
'DL', | ||
'FIELDSET', | ||
'FIGCAPTION', | ||
'FIGURE', | ||
'FOOTER', | ||
'FORM', | ||
'H1', | ||
'H2', | ||
'H3', | ||
'H4', | ||
'H5', | ||
'H6', | ||
'HEADER', | ||
'HGROUP', | ||
'HR', | ||
'MAIN', | ||
'MENU', | ||
'NAV', | ||
'OL', | ||
'P', | ||
'PRE', | ||
'SEARCH', | ||
'SECTION', | ||
'TABLE', | ||
'UL', | ||
westonruter marked this conversation as resolved.
Show resolved
Hide resolved
|
||
); | ||
|
||
/** | ||
* Open stack tags. | ||
* | ||
* @var string[] | ||
*/ | ||
private $open_stack_tags = array(); | ||
|
||
/** | ||
* Open stag indices. | ||
* | ||
* @var int[] | ||
*/ | ||
private $open_stack_indices = array(); | ||
|
||
/** | ||
* Processor. | ||
* | ||
* @var WP_HTML_Tag_Processor | ||
*/ | ||
private $processor; | ||
|
||
/** | ||
* Constructor. | ||
* | ||
* @param string $html HTML to process. | ||
*/ | ||
public function __construct( string $html ) { | ||
$this->processor = new WP_HTML_Tag_Processor( $html ); | ||
} | ||
|
||
/** | ||
* Gets all open tags in the document. | ||
* | ||
* A generator is used so that when iterating at a specific tag, additional information about the tag at that point | ||
* can be queried from the class. Similarly, mutations may be performed when iterating at an open tag. | ||
* | ||
* @return Generator<string> Tag name of current open tag. | ||
*/ | ||
public function open_tags(): Generator { | ||
$p = $this->processor; | ||
|
||
/* | ||
* The keys for the following two arrays correspond to each other. Given the following document: | ||
* | ||
* <html> | ||
* <head> | ||
* </head> | ||
* <body> | ||
* <p>Hello!</p> | ||
* <img src="lcp.png"> | ||
* </body> | ||
* </html> | ||
* | ||
* Upon processing the IMG element, the two arrays should be equal to the following: | ||
* | ||
* $open_stack_tags = array( 'HTML', 'BODY', 'IMG' ); | ||
* $open_stack_indices = array( 0, 1, 1 ); | ||
*/ | ||
$this->open_stack_tags = array(); | ||
$this->open_stack_indices = array(); | ||
while ( $p->next_tag( array( 'tag_closers' => 'visit' ) ) ) { | ||
$tag_name = $p->get_tag(); | ||
if ( ! $p->is_tag_closer() ) { | ||
|
||
// Close an open P tag when a P-closing tag is encountered. | ||
// TODO: There are quite a few more cases of optional closing tags: https://html.spec.whatwg.org/multipage/syntax.html#optional-tags | ||
// Nevertheless, given WordPress's legacy of XHTML compatibility, the lack of closing tags may not be common enough to warrant worrying about any of them. | ||
if ( in_array( $tag_name, self::P_CLOSING_TAGS, true ) ) { | ||
$i = array_search( 'P', $this->open_stack_tags, true ); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this is probably fine enough, but be aware that there are certain markers/boundaries in the stack that effectively isolate regions of HTML. for example, if we're currently inside a TEMPLATE element and encounter one of these P-closing elements, they don't close the P from the outside of the template. <p>
This template is isolated.
<template><section><p>Hidden</p></section></template>
The outer P remains.
</p> |
||
if ( false !== $i ) { | ||
array_splice( $this->open_stack_tags, $i ); | ||
array_splice( $this->open_stack_indices, count( $this->open_stack_tags ) ); | ||
} | ||
} | ||
|
||
$level = count( $this->open_stack_tags ); | ||
$this->open_stack_tags[] = $tag_name; | ||
|
||
if ( ! isset( $this->open_stack_indices[ $level ] ) ) { | ||
$this->open_stack_indices[ $level ] = 0; | ||
} else { | ||
++$this->open_stack_indices[ $level ]; | ||
} | ||
|
||
// Only increment the tag index at this level only if it isn't the admin bar, since the presence of the | ||
// admin bar can throw off the indices. | ||
if ( 'DIV' === $tag_name && $p->get_attribute( 'id' ) === 'wpadminbar' ) { | ||
--$this->open_stack_indices[ $level ]; | ||
} | ||
|
||
// Now that the breadcrumbs are constructed, yield the tag name so that they can be queried if desired. | ||
// Other mutations may be performed to the open tag's attributes by the callee at this point as well. | ||
yield $tag_name; | ||
|
||
// Immediately pop off self-closing tags. | ||
if ( in_array( $tag_name, self::VOID_TAGS, true ) ) { | ||
array_pop( $this->open_stack_tags ); | ||
} | ||
} else { | ||
// If the closing tag is for self-closing tag, we ignore it since it was already handled above. | ||
if ( in_array( $tag_name, self::VOID_TAGS, true ) ) { | ||
continue; | ||
} | ||
|
||
// Since SVG and MathML can have a lot more self-closing/empty tags, potentially pop off the stack until getting to the open tag. | ||
$did_splice = false; | ||
if ( 'SVG' === $tag_name || 'MATH' === $tag_name ) { | ||
$i = array_search( $tag_name, $this->open_stack_tags, true ); | ||
if ( false !== $i ) { | ||
array_splice( $this->open_stack_tags, $i ); | ||
$did_splice = true; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. if encountering an element which isn't an HTML Void element, then regardless of the tag, if the self-closing flag is present, then it closes the tag. all self-closing flags on HTML elements are invalid and ignored, but all on HTML foreign elements are authoritative and obeyed. <p>There are no empty <div/> tags. This is inside the DIV</p>.
<wp-group>Is a custom element, and so <wp-group /> is self-closing while others aren't.</wp-group> if (
WP_HTML_Processor::is_void( $tag_name ) ||
(
! WP_HTML_Processor::is_html_element( $tag_name ) &&
$p->has_self_closing_flag
)
) {
// this tag immediately closes as soon as we jump to the next tag.
} this requires having a list of all HTML elements, which the HTML Processor currently doesn't do because it doesn't support any, but we'll have to add it. this can lead to common parsing failures because the invalid self-closing flag has become more popular post-React where it's valid and normative in JSX. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Very interesting. I wasn't aware of the Nevertheless, maybe having a list of all HTML elements isn't needed here because: (1) if Nevertheless, I just checked your example and it doesn't seem the second example with a custom element is actually true. I adapted your example to use a <p>There are no empty <span/> tags. This is inside the SPAN</p>.
<wp-group>Is a custom element, and so <wp-group /> is self-closing while others aren't.</wp-group> And Chrome renders this as: So perhaps such self-closing foreign elements are limited to MathML and SVG contexts? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. by golly, you're right. I have been misinterpreting "foreign element" for a long time now! thankfully I haven't gotten to the place in the HTML Processor where that matters. thank you very much for pointing this out. |
||
} | ||
} | ||
|
||
if ( ! $did_splice ) { | ||
$popped_tag_name = array_pop( $this->open_stack_tags ); | ||
if ( $popped_tag_name !== $tag_name && function_exists( 'wp_trigger_error' ) ) { | ||
wp_trigger_error( | ||
__METHOD__, | ||
esc_html( | ||
sprintf( | ||
/* translators: 1: Popped tag name, 2: Closing tag name */ | ||
__( 'Expected popped tag stack element %1$s to match the currently visited closing tag %2$s.', 'performance-lab' ), | ||
$popped_tag_name, | ||
$tag_name | ||
) | ||
) | ||
); | ||
} | ||
} | ||
array_splice( $this->open_stack_indices, count( $this->open_stack_tags ) + 1 ); | ||
} | ||
} | ||
} | ||
|
||
/** | ||
* Gets breadcrumbs for the current open tag. | ||
* | ||
* Breadcrumbs are constructed to match the format from detect.js. | ||
* | ||
* TODO: Consider rather each breadcrumb being a (tag,index) tuple. | ||
* | ||
* @return array<array{tag: string, index: int}> Breadcrumbs. | ||
*/ | ||
public function get_breadcrumbs(): array { | ||
$breadcrumbs = array(); | ||
foreach ( $this->open_stack_tags as $i => $breadcrumb_tag_name ) { | ||
$breadcrumbs[] = array( | ||
'tag' => $breadcrumb_tag_name, | ||
'index' => $this->open_stack_indices[ $i ], | ||
); | ||
} | ||
return $breadcrumbs; | ||
} | ||
|
||
/** | ||
* Returns the value of a requested attribute from a matched tag opener if that attribute exists. | ||
* | ||
* This is a wrapper around the underlying HTML_Tag_Processor method of the same name since only a limited number of | ||
* methods can be exposed to prevent moving the pointer in such a way as the breadcrumb calculation is invalidated. | ||
* | ||
* @see WP_HTML_Tag_Processor::get_attribute() | ||
* | ||
* @param string $name Name of attribute whose value is requested. | ||
* @return string|true|null Value of attribute or `null` if not available. Boolean attributes return `true`. | ||
*/ | ||
public function get_attribute( string $name ) { | ||
return $this->processor->get_attribute( $name ); | ||
} | ||
|
||
/** | ||
* Updates or creates a new attribute on the currently matched tag with the passed value. | ||
* | ||
* This is a wrapper around the underlying HTML_Tag_Processor method of the same name since only a limited number of | ||
* methods can be exposed to prevent moving the pointer in such a way as the breadcrumb calculation is invalidated. | ||
* | ||
* @see WP_HTML_Tag_Processor::set_attribute() | ||
* | ||
* @param string $name The attribute name to target. | ||
* @param string|bool $value The new attribute value. | ||
* @return bool Whether an attribute value was set. | ||
*/ | ||
public function set_attribute( string $name, $value ): bool { | ||
return $this->processor->set_attribute( $name, $value ); | ||
} | ||
|
||
/** | ||
* Removes an attribute from the currently-matched tag. | ||
* | ||
* This is a wrapper around the underlying HTML_Tag_Processor method of the same name since only a limited number of | ||
* methods can be exposed to prevent moving the pointer in such a way as the breadcrumb calculation is invalidated. | ||
* | ||
* @see WP_HTML_Tag_Processor::remove_attribute() | ||
* | ||
* @param string $name The attribute name to remove. | ||
* @return bool Whether an attribute was removed. | ||
*/ | ||
public function remove_attribute( string $name ): bool { | ||
return $this->processor->remove_attribute( $name ); | ||
} | ||
|
||
/** | ||
* Returns the string representation of the HTML Tag Processor. | ||
* | ||
* This is a wrapper around the underlying HTML_Tag_Processor method of the same name since only a limited number of | ||
* methods can be exposed to prevent moving the pointer in such a way as the breadcrumb calculation is invalidated. | ||
* | ||
* @see WP_HTML_Tag_Processor::get_updated_html() | ||
* | ||
* @return string The processed HTML. | ||
*/ | ||
public function get_updated_html(): string { | ||
return $this->processor->get_updated_html(); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need these lists here? What specifically in regards to void and p closing tags is needed here, and why? Is there a bug in the
WP_HTML_Tag_Processor
, or crucial limitation?It would be great if you could clarify the need for all this in relation to this module. Right now I'm wondering whether this module should need all those functionalities - like why are those special tags relevant for image optimization?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are needed to be able to iterate over tags in the document and be able to correctly keep track of the open tag stack. The
WP_HTML_Tag_Processor
is intentionally very limited in what processing it supports. Namely, it does not have any sense of how tags are related to each other (e.g. thathtml
is a parent ofbody
). It is a streaming parser that iterates over open tags and close tags. In order to keep track of when one tag is a child of another tag, we have to know the semantics of these tags, whether they are self-closing (void) or how they relate to tags that have optional closers (e.g. herep
).The superset of functionality from
WP_HTML_Tag_Processor
is inWP_HTML_Processor
which does have capabilities regarding document structure. However, it can't be used yet because it doesn't support all HTML tags yet. Eventually this class shouldn't be needed asWP_HTML_Processor
. I've added a comment to reflect this in 0cca4dd.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And yeah, these lists of tags are all related to image optimization in that we need to know them in order to accurately construct breadcrumbs for any given tag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that makes sense. I wonder how this version of the tag processor is for performance.
For the plugin we don't need to worry about this too much, but for a potential core proposal we would need to spend more time on this - particularly as I'm not sure we would want two separate versions of the tag processor in core. And we will only be able to use this one throughout if it doesn't have notable negative performance implications.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should have about the same performance as
WP_HTML_Tag_Processor
as it is just iterating over tags and is doing much less thanWP_HTML_Processor
, which is less predictable for performance.