Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detection time window may not allow enough URL Metrics to be collected #1496

Closed
westonruter opened this issue Aug 22, 2024 · 5 comments · Fixed by #1641
Closed

Detection time window may not allow enough URL Metrics to be collected #1496

westonruter opened this issue Aug 22, 2024 · 5 comments · Fixed by #1641
Assignees
Labels
[Plugin] Optimization Detective Issues for the Optimization Detective plugin [Type] Bug An existing feature is broken

Comments

@westonruter
Copy link
Member

Originally in #1494 (comment).

I discovered on my blog that I had a very old URL Metric from before the XPath indices were fixed to be 1-based instead of 0-based (#1191). Looking at the URL Metric data, I see the timestamp is 1712354502.752218 which is for a day back in April. It appears that URL Metrics are not being collected frequently enough, or my site is truly devoid of traffic. I think for my site it may be in part due to page caching, where the detection time window (od_detection_time_window) is too short, so it is only collecting metrics for visitors who happen to visit when there is a MISS to the page cache, and if there is a stale-while-revalidate caching strategy going on with Varnish then it is even more likely for the visitor to never be within the detection time window (currentTime - serveTime > detectionTimeWindow):

// Abort running detection logic if it was served in a cached page.
if ( currentTime - serveTime > detectionTimeWindow ) {
if ( isDebug ) {
warn(
'Aborted detection due to being outside detection time window.'
);
}
return;
}

I think this logic needs to be revisited.

@felixarntz
Copy link
Member

Related is a conversation @westonruter and I had at WordCamp US: To reliably handle dynamic behavior on the client-side, the behavior needs to be controlled by the client-side, to account for full page caches which will lead to most requests not to reach the PHP logic. This basically means two things:

  • We need to inject the script, or at least a very minimal version of the script, unconditionally, because we can't know from the server-side whether the current request needs the detection to run (e.g. if it never hits the PHP process).
  • We need to retrieve data that is dynamic in another way, it is not reliable to inline it with the script. Some data that is specific to the current URL (e.g. urlMetricsSlug) or a general configuration setting (e.g. storageLockTTL) is probably okay to inline, but most other data (e.g. urlMetricsGroupStatuses) will get out of date when a full page cache is used. Such data would probably need to come from a REST endpoint instead.

Defining the right solution for this probably goes beyond this issue, maybe we should open a new one, that is more broadly about how to support full page caches in a way that the Optimization Detective functionality is still reliable? I think prioritizing this will be crucial to get broader adoption, as lots of sites use a full page cache which will affect the behavior negatively - sometimes minor, sometimes more severely, depending on the cache configuration.

@westonruter
Copy link
Member Author

Here is a table for what I've found the default max-age to be for caching plugins:

Plugin Name Default Max-Age
Batcache 5 minutes
Pantheon Advanced Page Cache v1 10 minutes
WP Super Cache 30 minutes
W3 Total Cache 1 hour
Jetpack Boost 1 hour
WP Rocket 10 hours
Pantheon Advanced Page Cache v2 1 week
LightSpeed Cache 1 week
WP Fastest Cache None. Requires Timeout rule config.

@westonruter
Copy link
Member Author

I've been tackling this problem in the following sequential PRs:

The first two remove what is specifically broken with page caching: nonces and the detection time window. The third begins to proactively address the problem of a page caching plugin holding onto an unoptimized cached page even after URL Metrics have been submitted by visitors and the corollary problem of the inline script data not being up-to-date.

By ensuring that page caching plugins flush the caches for cached URLs which have updated URL Metrics, then we can actually rely on the inline script to be relatively up-to-date on the needs for URL Metrics. We may not need to move the urlMetricsGroupStatuses data to a new REST API endpoint, which itself could be subject to page caches holding onto the data for too long and so it may not be fresh.

@westonruter
Copy link
Member Author

westonruter commented Nov 12, 2024

Ideas discussed with @felixarntz:

  • To deal with the page cache stampede problem, we can schedule the clearing of the cache for a URL periodically via WP Cron. (This is addressed in Send post ID of queried object or first post in loop in URL Metric storage request to schedule page cache validation #1641.)
  • Do we add an endpoint for getting the which URL Metrics are needed. The problem is that the endpoint cache wouldn't be reliably flushed. But if it were reliably flushed,
  • Check what happens when there is a race condition between two requests to store a URL Metric from two application servers connected to the same database when there wasn't a od_url_metrics post that already existed.
  • Do we need a lock or throttling mechanism?
  • When a persistent object cache is available, perhaps the POST request would only write to the object cache and then WP Cron would read from the object cache to actually create the URL Metrics posts.

I just looked at LightSpeed Cache and they purge caches in response to the transition_post_status action. Similar to Pantheon Advanced Page Cache with its surrogate keys, LightSpeed Cache constructs a list of purge tags which correspond to the list of tags constructed when a page is served. Notably there is a tag for post type archives. With such sophisticated surrogate keys employed by caching (in Varnish) it's safer to have longer cache TTLs since it is less likely stale content will persist. Pantheon Advanced Page Cache also accounts for post type archives.

@westonruter
Copy link
Member Author

I've filed #1655 to address the points raised in #1496 (comment).

@github-project-automation github-project-automation bot moved this from In Progress 🚧 to Done 😃 in WP Performance 2024 Nov 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Plugin] Optimization Detective Issues for the Optimization Detective plugin [Type] Bug An existing feature is broken
Projects
Status: Done 😃
2 participants