Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize the loading of images using stored URL metrics #884

Conversation

westonruter
Copy link
Member

@westonruter westonruter commented Nov 16, 2023

Summary

Fixes #872 as part of #869.

This implements the optimization of image loading based on the client-side detection (#876) that has been stored (#878).

This is fully functional and able to be tested in a WP install.

Relevant technical choices

When a response is eligible for being optimized, an output buffer filter is added to apply optimizations. This involves looking up the ilo_url_metrics post containing the stored URL metrics, and when present groupong the URL metrics into the breakpoints (480px, 600px, 782px). If all breakpoints have URL metrics collected and all share the same image as LCP element, then fetchpriority=high is set on this image while being removed from all other images. If all breakpoints don't have URL metrics collected yet, then the fetchpriority attribute added by server-side heuristics is not removed. Certain data attributes are also added to elements so we can track what operations are being performed:

Attribute Condition
data-ilo-added-fetchpriority When the fetchpriority attribute is added to an element.
data-ilo-removed-fetchpriority When the fetchpriority attribute is removed from an element.
data-ilo-fetchpriority-already-added When WordPress's existing server-side heuristics have already set fetchpriority=high on the LCP image, then the data-ilo-removed-fetchpriority attribute is added.
data-ilo-added-tag Present on the preload links which are injected at the end of the head.

When the fetchpriority attribute is added, a data-ilo-added-fetchpriority attribute is also added so we can keep track of what operations it is doing.

Whenever there are URL metrics available, preload links are also added for the LCP image in each breakpoint. This is particularly important when different breakpoints have different LCP elements (even none at all), as in such cases the fetchpriority attribute must be removed from each breakpoint-specific LCP image element. This is also useful for when breakpoints don't have URL metrics collected yet, as otherwise fetchpriority=high could be erroneously added to the wrong image for the current viewport.

In order to apply the fetchpriority optimizations to the page and to collect the attributes from IMG elements for use in preload links, the WP_HTML_Tag_Processor class is leveraged, although it is not used directly. An ILO_HTML_Tag_Processor class is used which incorporates WP_HTML_Tag_Processor (by composition instead of extension). This processor is similar to WP_HTML_Tag_Processor except it is more constrained. It provides a generator method called open_tags(); while iterating over this generator, the breadcrumbs for the current open tag can be obtained via the get_breadcrumbs() method. Otherwise, the following methods from the underlying WP_HTML_Tag_Processor are exposed:

  • get_attribute()
  • set_attribute()
  • remove_attribute()
  • get_updated_html()

Importantly, the next_tag() and seek() methods are not exposed since in order to correctly compute breadcrumbs the processor must proceed forward through the entire document visiting all open and closing tags.

Tip: The following command deletes all ilo_url_metrics posts:

npm run wp-env run cli -- post delete --force $( npm run --silent wp-env run cli -- post list --post_type=ilo_url_metrics --format=ids 2> /dev/null )

Other changes

  • Besides the existing 480px breakpoint, additional breakpoints of 600px and 782px have been added. These three breakpoints correspond to the mobile, small, and medium breakpoints that are defined in Gutenberg's _breakpoints.css.
  • Detection and optimization by default do not run now in the Customizer preview or in response to a non-GET request (e.g. a POST request).
  • Breadcrumb calculation has been updated to disregard the presence of the Admin Bar. Similarly, client-side detection logic has been updated to account for JS injection of the .skip-link.screen-reader-text element, which throws off generation of breadcrumbs on the client to match breadcrumbs generated on the server. In reality, the generation of breadcrumbs client-side is fragile and in a subsequent PR I intend to rely on server-side breadcrumb generation exclusively.
  • Some additional preflab_ prefixes have been replaced with ilo_.
  • The tagName key has been replaced with tag in breadcrumbs.

Checklist

  • PR has either [Focus] or Infrastructure label.
  • PR has a [Type] label.
  • PR has a milestone or the no milestone label.

@westonruter westonruter added [Type] Feature A new feature within an existing module [Focus] Images no milestone PRs that do not have a defined milestone for release labels Nov 16, 2023
@westonruter westonruter added the [Plugin] Optimization Detective Issues for the Optimization Detective plugin label Nov 17, 2023
@westonruter
Copy link
Member Author

westonruter commented Nov 18, 2023

This is now working!

  • When there is only one LCP image shared for all breakpoints, it adds fetchpriority=high to that element and removes it from all others.
  • When breakpoints have different LCP images (or some have none), fetchpriority is removed from all images and preload links are added to target the breakpoints.

Elements that have been muted get annotated with data-ilo-* attributes to easily track what it is doing.

Items to further work on:

  • Create a subclass of WP_HTML_Tag_Processor instead of using the ilo_walk_document() function.
  • Disable detection when in the Customizer preview due to the injection of inline editing controls.
  • Disable detection and optimization when $_SERVER['REQUEST_METHOD'] !== 'GET'.

For a future PR:

  • Improve handling of breadcrumb generation when there the document structure varies either due to admin bar being present or due to client-side logic. Currently breadcrumbs have to explicitly skip over the admin bar element to ignore it. Similarly, the client-side breadcrumb logic has to skip over .skip-link.screen-reader-text which is injected by wp_enqueue_block_template_skip_link(). This issue of client-side mutated elements is particularly problematic, which may require server-side tagging of elements with their breadcrumbs which would eliminate needing to also do that calculation on the client. But this would mean extraneous data- attributes on all images. When detection isn't needed, the tag processor could strip those attributes out before sending back the buffer. We may need to monitor this to see how much of a problem it is, such as by checking various themes to see what degree they inject elements to mess up breadcrumbs. As another alternative, we may consider relying on IDs and class names for breadcrumbs rather than element indices, although JS here too can cause problems by modifying the class names.
  • Go ahead and inject a preload link even when there is only one LCP image shared by all breakpoints?
  • What about LCP images that are injected with JavaScript, e.g. a dynamically-constructed carousel? The element will not be in the DOM for adding fetchpriority=high to. However detection could capture the image attributes (e.g. src and srcset) to then add a preload link. The challenge here is that there's no way to guard against preloading an image that JS no longer is showing. This isn't a problem for server-side processing because we don't store the src but rather just the breadcrumbs, so we validate the LCP element is still in the page.
  • Should detection be disabled when the user is logged-in (and the admin bar is shown)?

}
if (
children.includes(
document.querySelector( '.skip-link.screen-reader-text' )
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be cached.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to leave this as-is for this PR. In the next PR I intend to eliminate client-side breadcrumb construction since it is too easy for JavaScript DOM mutations to cause client-side generated breadcrumbs from being able to apply on the server.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once you switch to generate breadcrumbs on the server, maybe a shorter index or hash passed to the front end?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly. Currently I'm thinking that when detection is needed that the server would add breadcrumbs via a data attribute, for example:

<img data-ilo-breadcrumbs="html,0 body,1 main,2 figure,10 img,0" ...>

When detection is not needed, such data attributes would not be present.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a TODO comment? :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is already one:

* @todo Eliminate this in favor of doing all breadcrumb generation exclusively on the server.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in the next PR: #892

Copy link
Member

@dmsnell dmsnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@westonruter there's quite a bit in this PR! I can't say that I fully understanding it, but as far as the HTML Processor goes I'm glad you're experimenting with the lower-level tooling for your own needs. plugins are the right place to push ahead with riskier code.

it hasn't been my intention to expose which nth-child a node is compared to the parent, but I was going to build a CSS selector that queries that when prompted. maybe that's a good thing to build in from the start.

my outlook has been positive on the performance side for the eventual HTML Processor, but it's still opaque until it supports everything it needs to 🤷‍♂️

// TODO: There are quite a few more cases of optional closing tags: https://html.spec.whatwg.org/multipage/syntax.html#optional-tags
// Nevertheless, given WordPress's legacy of XHTML compatibility, the lack of closing tags may not be common enough to warrant worrying about any of them.
if ( in_array( $tag_name, self::P_CLOSING_TAGS, true ) ) {
$i = array_search( 'P', $this->open_stack_tags, true );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is probably fine enough, but be aware that there are certain markers/boundaries in the stack that effectively isolate regions of HTML. for example, if we're currently inside a TEMPLATE element and encounter one of these P-closing elements, they don't close the P from the outside of the template.

<p>
	This template is isolated.
	<template><section><p>Hidden</p></section></template>
	The outer P remains.
</p>

$i = array_search( $tag_name, $this->open_stack_tags, true );
if ( false !== $i ) {
array_splice( $this->open_stack_tags, $i );
$did_splice = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if encountering an element which isn't an HTML Void element, then regardless of the tag, if the self-closing flag is present, then it closes the tag. all self-closing flags on HTML elements are invalid and ignored, but all on HTML foreign elements are authoritative and obeyed.

<p>There are no empty <div/> tags. This is inside the DIV</p>.
<wp-group>Is a custom element, and so <wp-group /> is self-closing while others aren't.</wp-group>
if (
	WP_HTML_Processor::is_void( $tag_name ) || 
	(
		! WP_HTML_Processor::is_html_element( $tag_name ) &&
		$p->has_self_closing_flag
	)
) {
	// this tag immediately closes as soon as we jump to the next tag.
}

this requires having a list of all HTML elements, which the HTML Processor currently doesn't do because it doesn't support any, but we'll have to add it. this can lead to common parsing failures because the invalid self-closing flag has become more popular post-React where it's valid and normative in JSX.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very interesting. I wasn't aware of the WP_HTML_Processor::has_self_closing_flag() method.

Nevertheless, maybe having a list of all HTML elements isn't needed here because: (1) if math or svg are ancestors, we can assume that all tags with self-closing flags will close the tag, and (2) custom elements always have hyphens in them, so if present we can also honor the self-closing tag.

Nevertheless, I just checked your example and it doesn't seem the second example with a custom element is actually true. I adapted your example to use a span instead of div (since a div closes an open p):

<p>There are no empty <span/> tags. This is inside the SPAN</p>.
<wp-group>Is a custom element, and so <wp-group /> is self-closing while others aren't.</wp-group>

And Chrome renders this as:

image

So perhaps such self-closing foreign elements are limited to MathML and SVG contexts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example:

<svg>Is a custom element, and so <g /> is self-closing while others aren't.</svg>

image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by golly, you're right. I have been misinterpreting "foreign element" for a long time now! thankfully I haven't gotten to the place in the HTML Processor where that matters.

thank you very much for pointing this out.


$link_tag .= sprintf( ' %s="%s"', $name, esc_attr( $value ) );
}
$link_tag .= ">\n";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is also a reasonable place for the Tag Processor, though obviously not essential. it simply removes the need to remember to escape, and joins together the split tag/html/php syntax.

$link_tag = new WP_HTML_Tag_Processor( '<link data-ilo-added-tag rel="preload" fetchpriority="high" as="image">' );
foreach ( array_filter( $img_attributes ) as $name => $value ) {
	switch ( $name ) {
		case 'srcset':
		case 'sizes':
			$name = "image{$name}";
			break;

		case 'src':
			$name = 'href';
			break;	
	}

	$link_tag->set_attribute( $name, $value );
}
$link_html = $link_tag->get_updated_html();

I'm hoping to have basic templating in place for 6.5, which should turn this into something potentially nicer still.

$attributes = [];
foreach ( array_filter( $img_attributes ) as $name => value ) {
	switch ( $name ) {
		case 'srcset':
		case 'sizes':
			$attributes[ "image{$name}" ] = $value;
			continue;

		case 'src':
			$attributes['href'] = $value;
			continue;
	}

	$attributes[ $name ] = $value;
}

$link_tag = WP_HTML::render(
	'<link data-ilo-added-tag rel="preload" fetchpriority="high" as="image" ...attrs>',
	array( 'attrs' => $attributes )
);

Copy link
Member

@felixarntz felixarntz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@westonruter This looks great so far. I left a few comments, mostly for minor optimization. My biggest concern/question is in regards to the custom tag processor class, I'm not sure why that needs to be included here.

Comment on lines +26 to +58
const VOID_TAGS = array(
'AREA',
'BASE',
'BASEFONT', // Obsolete.
'BGSOUND', // Obsolete.
'BR',
'COL',
'EMBED',
'FRAME', // Deprecated.
'HR',
'IMG',
'INPUT',
'KEYGEN', // Obsolete.
'LINK',
'META',
'PARAM', // Deprecated.
'SOURCE',
'TRACK',
'WBR',
);

/**
* The set of HTML tags whose presence will implicitly close a <p> element.
* For example '<p>foo<h1>bar</h1>' should parse the same as '<p>foo</p><h1>bar</h1>'.
*
* @link https://html.spec.whatwg.org/multipage/grouping-content.html#the-p-element
*
* @var string[]
*/
const P_CLOSING_TAGS = array(
'ADDRESS',
'ARTICLE',
'ASIDE',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need these lists here? What specifically in regards to void and p closing tags is needed here, and why? Is there a bug in the WP_HTML_Tag_Processor, or crucial limitation?

It would be great if you could clarify the need for all this in relation to this module. Right now I'm wondering whether this module should need all those functionalities - like why are those special tags relevant for image optimization?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are needed to be able to iterate over tags in the document and be able to correctly keep track of the open tag stack. The WP_HTML_Tag_Processor is intentionally very limited in what processing it supports. Namely, it does not have any sense of how tags are related to each other (e.g. that html is a parent of body). It is a streaming parser that iterates over open tags and close tags. In order to keep track of when one tag is a child of another tag, we have to know the semantics of these tags, whether they are self-closing (void) or how they relate to tags that have optional closers (e.g. here p).

The superset of functionality from WP_HTML_Tag_Processor is in WP_HTML_Processor which does have capabilities regarding document structure. However, it can't be used yet because it doesn't support all HTML tags yet. Eventually this class shouldn't be needed as WP_HTML_Processor. I've added a comment to reflect this in 0cca4dd.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And yeah, these lists of tags are all related to image optimization in that we need to know them in order to accurately construct breadcrumbs for any given tag.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that makes sense. I wonder how this version of the tag processor is for performance.

For the plugin we don't need to worry about this too much, but for a potential core proposal we would need to spend more time on this - particularly as I'm not sure we would want two separate versions of the tag processor in core. And we will only be able to use this one throughout if it doesn't have notable negative performance implications.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should have about the same performance as WP_HTML_Tag_Processor as it is just iterating over tags and is doing much less than WP_HTML_Processor, which is less predictable for performance.

}
if (
children.includes(
document.querySelector( '.skip-link.screen-reader-text' )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a TODO comment? :)

modules/images/image-loading-optimization/optimization.php Outdated Show resolved Hide resolved
modules/images/image-loading-optimization/optimization.php Outdated Show resolved Hide resolved
modules/images/image-loading-optimization/optimization.php Outdated Show resolved Hide resolved
modules/images/image-loading-optimization/optimization.php Outdated Show resolved Hide resolved
@westonruter
Copy link
Member Author

My biggest concern/question is in regards to the custom tag processor class, I'm not sure why that needs to be included here.

@felixarntz The custom tag processor is needed in order to be able to obtain breadcrumbs for any given element. Without it, we wouldn't be able to apply optimizations for elements identified during detection.

Comment on lines +91 to +96
function ilo_construct_breadcrumbs_string( array $breadcrumbs ): string {
return implode(
' ',
array_map(
static function ( $breadcrumb ) {
return sprintf( '%s,%s', $breadcrumb['tag'], $breadcrumb['index'] );
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In another PR I'm working on, I'm going to make this into an XPath constructor.

Suggested change
function ilo_construct_breadcrumbs_string( array $breadcrumbs ): string {
return implode(
' ',
array_map(
static function ( $breadcrumb ) {
return sprintf( '%s,%s', $breadcrumb['tag'], $breadcrumb['index'] );
function ilo_construct_breadcrumbs_xpath( array $breadcrumbs ): string {
return implode(
'',
array_map(
static function ( $breadcrumb ) {
return sprintf( '/%s[%d]', $breadcrumb['tag'], $breadcrumb['index'] );

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #892

* @param array<array{tag: string, index: int}> $breadcrumbs Breadcrumbs.
* @return string Breadcrumb string.
*/
function ilo_construct_breadcrumbs_string( array $breadcrumbs ): string {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function moved to ILO_HTML_Tag_Processor method in #892

Copy link
Member

@felixarntz felixarntz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@westonruter Thanks for the updates, LGTM!

@westonruter
Copy link
Member Author

It would be nice if pull requests that aren't targeting trunk or a release branch could be merged with only a single reviewer. Getting two reviews for every review regardless seems like a bottleneck.

@westonruter
Copy link
Member Author

@adamsilverstein or @swissspidy could either of you drop in a LGTM?

@adamsilverstein adamsilverstein self-requested a review December 7, 2023 18:57
Copy link
Member

@adamsilverstein adamsilverstein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tremendous

@westonruter westonruter merged commit d88cc75 into feature/image-loading-optimization Dec 7, 2023
8 checks passed
@westonruter westonruter deleted the add/image-loading-optimization-optimizing branch December 7, 2023 19:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no milestone PRs that do not have a defined milestone for release [Plugin] Optimization Detective Issues for the Optimization Detective plugin [Type] Feature A new feature within an existing module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants