assets + stylesheet assets #1475

Open: wants to merge 176 commits into base: master


eoghanmurray (Contributor) commented May 14, 2024

Medium to large enhancement to rrweb, building upon #1239

Please review #1437 first as this also builds upon that PR (which can be merged before #1239)


Asset Events

Assets are a new type of event that embodies a serialized version of an HTTP resource captured during snapshotting. Some examples are images, media files and stylesheets. Resources can be fetched externally (from cache) in the case of an href, or internally for blob: URLs and same-origin stylesheets. Asset events are emitted after either a FullSnapshot or an IncrementalSnapshot (mutation); although they may have a later timestamp, during replay they are rebuilt as part of the snapshot they are associated with. If, for example, a stylesheet is referenced at the time of a FullSnapshot but hasn't been downloaded yet, a subsequent mutation event with a later timestamp can, along with the asset event, recreate the experience of a network-delayed load of the stylesheet.

Assets to mitigate stylesheet processing cost

In the case of stylesheets, rrweb does some record-time processing in order to serialize the CSS rules. This processing had a negative effect on initial page load times and on how quickly the FullSnapshot was taken (see https://pagespeed.web.dev/). Stylesheets are now taken out of the main thread and processed asynchronously, to be emitted later (up to processStylesheetsWithin ms). There is no corresponding delay on the replay side so long as the stylesheet has been successfully emitted.
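As a rough illustration of the deferral described above, here is a minimal sketch. The names `deferStylesheet` and `Scheduler` are hypothetical and not taken from the PR, and a plain timer stands in for whatever scheduling the PR actually uses. A zero or negative budget falls back to synchronous processing.

```typescript
// Hypothetical scheduler signature so the timing mechanism can be swapped out.
type Scheduler = (task: () => void, delayMs: number) => void;

// Sketch only: serialize a stylesheet later instead of during the snapshot.
// `serialize` and `emit` are placeholders for the real serialization and
// asset-event emission, which are not shown here.
function deferStylesheet(
  serialize: () => string,
  emit: (cssText: string) => void,
  processWithinMs: number,
  schedule: Scheduler = (task, delayMs) => {
    setTimeout(task, delayMs);
  },
): void {
  if (processWithinMs <= 0) {
    // Zero or negative budget: process synchronously.
    emit(serialize());
    return;
  }
  // Otherwise push the expensive work past the initial snapshot.
  schedule(() => emit(serialize()), processWithinMs);
}
```

Injecting the scheduler keeps the sketch independent of any particular timer API and makes the ordering easy to check in a unit test.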

Asset Capture Configuration

The captureAssets configuration option allows you to customize the asset capture process. It is an object with the following properties:

  • objectURLs (default: true): This property specifies whether to capture same-origin blob: assets using object URLs. Object URLs are created using the URL.createObjectURL() method. Setting objectURLs to true enables the capture of object URLs.

  • origins (default: false): This property determines which origins to capture assets from. It can have the following values:

    • false or []: Disables capturing any assets apart from object URLs, stylesheets (unless set to false) and images (if that setting is turned on).
    • true: Captures assets from all origins.
    • [origin1, origin2, ...]: Captures assets only from the specified origins. For example, origins: ['https://s3.example.com/'] captures all assets from the origin https://s3.example.com/.
  • images (default: false, or true if inlineImages is true in the rrweb.record config): When set, this option turns on asset capturing for all images irrespective of their origin. Unless this option is explicitly set to false, images may still be captured if their src URL matches the origins setting above.

  • stylesheets (default: 'without-fetch'): When set to true, this turns on capturing of all stylesheets and style elements via the asset system irrespective of origin. The default of 'without-fetch' is designed to match the previous inlineStylesheet behaviour, whereas true additionally allows stylesheets that are otherwise inaccessible due to CORS restrictions to be captured via a fetch call, which will normally use the browser cache. Unless this is explicitly set to false, a stylesheet will be captured if it matches the origins config above.

  • stylesheetsRuleThreshold (default: 0): only invoke the asset system for stylesheets with more than this number of rules. Defaults to zero (rather than say 100) as it only looks at the 'outer' rules (e.g. could have a single media rule which nests 1000s of sub rules). This default may be increased based on feedback.

  • processStylesheetsWithin (default: 2000): This property defines the maximum time in milliseconds that the browser should delay before processing stylesheets. Inline <style> elements will be processed within half this value. Lower this value if you wish to improve the odds that short 'bounce' visits will emit the asset before the visitor unloads the page. Set to zero or a negative number to process stylesheets synchronously, which can cause poor scores on e.g. https://pagespeed.web.dev/ ("Third-party code blocked the main thread").
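
To make the options above concrete, here is a hedged sketch of a configuration object. The `CaptureAssetsConfig` type, the `DEFAULT_CAPTURE_ASSETS` constant and the `originMatches` helper are assumptions written for this example from the description alone; the PR's real types and matching logic may differ.

```typescript
// Hypothetical local type mirroring the options described above;
// the real type in the PR may differ in name and shape.
interface CaptureAssetsConfig {
  objectURLs: boolean;
  origins: boolean | string[];
  images?: boolean; // left unset by default: it follows inlineImages
  stylesheets: boolean | 'without-fetch';
  stylesheetsRuleThreshold: number;
  processStylesheetsWithin: number;
}

// Defaults as listed in the description.
const DEFAULT_CAPTURE_ASSETS: CaptureAssetsConfig = {
  objectURLs: true,
  origins: false,
  stylesheets: 'without-fetch',
  stylesheetsRuleThreshold: 0,
  processStylesheetsWithin: 2000,
};

// Hypothetical helper: does `url` fall under the `origins` setting alone?
// The images, stylesheets and objectURLs options can capture further assets.
function originMatches(url: string, origins: boolean | string[]): boolean {
  if (origins === true) return true;
  if (origins === false || origins.length === 0) return false;
  let urlOrigin: string;
  try {
    urlOrigin = new URL(url).origin;
  } catch {
    return false; // not an absolute URL
  }
  return origins.some((candidate) => {
    try {
      return new URL(candidate).origin === urlOrigin;
    } catch {
      return false; // ignore malformed entries
    }
  });
}

// Example: capture assets from one bucket and shorten the stylesheet delay.
const config: CaptureAssetsConfig = {
  ...DEFAULT_CAPTURE_ASSETS,
  origins: ['https://s3.example.com/'],
  processStylesheetsWithin: 500,
};
```

Going by the description, an object like `config` would be passed as the `captureAssets` option in the rrweb.record config.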


changeset-bot bot commented May 14, 2024

🦋 Changeset detected

Latest commit: 2641cde

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 19 packages
Name Type
rrweb-snapshot Major
rrweb Major
rrdom Major
@rrweb/types Major
@rrweb/rrweb-plugin-canvas-webrtc-record Major
@rrweb/rrweb-plugin-canvas-webrtc-replay Major
@rrweb/rrweb-plugin-console-record Major
@rrweb/rrweb-plugin-console-replay Major
@rrweb/rrweb-plugin-sequential-id-record Major
@rrweb/rrweb-plugin-sequential-id-replay Major
rrdom-nodejs Major
rrweb-player Major
@rrweb/all Major
@rrweb/replay Major
@rrweb/record Major
@rrweb/packer Major
@rrweb/utils Major
@rrweb/web-extension Major
rrvideo Major

) {
let cssText = stringifyCssRules(styleRules);
if (cssText) {
cssText = absoluteToStylesheet(cssText, sheetBaseHref);

Contributor

For a style tag that imports a stylesheet which itself has an @import rule, the href here would be incorrect for any URLs imported in the nested stylesheet.

In the example below I would expect absoluteToStylesheet to return something like url(\"https://local.pendo.io:8081/browser-tests/temp-resources/Vorname.otf\")

but instead I get something like url(\"https://local.pendo.io:8081/browser-tests/guides/Vorname.ttf\")
[Three screenshots attached]

Contributor Author

Oh that's a great catch; would it be possible to port over a simplified version of that test?

I can't quite figure out whether this was an issue before (and I haven't come across the code to import nested stylesheets during this PR — if you know where that is, please let me know!)

Contributor

Digging a little more, I honestly think this is an existing issue. From my understanding when we call absoluteToStylesheet we make the assumption that any URLs inside of a style tag must share the same filepath as the document the style tag is on. sheetBaseHref in the PR here, getHref() on master.

Unless you'd like to handle it here, I would be more than happy to open up a bug to continue digging into this and possibly even take a crack at fixing this.

Contributor

Ah, I apologize, I misspoke. I still think the issue has been around for a while, but it is surfaced here because we started checking all style tags instead of just empty ones. Just wanted to clarify!

[Two screenshots attached]

Contributor Author

We've always been inlining <style> tags, just that previously the serialization happened in serializeTextNode, whereas this PR consolidates all the logic up to the serializeElementNode level.

OK so I believe this all happens because we can recurse into rule.styleSheet when we encounter an @import rule ... but we call absoluteToStylesheet with a single outer href after the stringification.

I reckon this is a long standing issue, so best to open a new issue, or submit a test case as a new PR independently of this one to see if it fails on current master.

It'd be easier (for me) to fix it after merging this PR due to the changes to the code (e.g. stringifyStylesheet has been renamed to stringifyCssRules) however we can do the fix in both branches if this one takes too long to merge.

If I've identified the problem correctly, the fix would be to pass down a current baseHref to stringifyCssRules, so that it does the url() rewrite during stringification, and also so that we can change the href as we recurse through the tree.
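
A minimal sketch of that proposed fix, under stated assumptions: `SketchRule`, `SketchSheet`, `absolutifyUrls` and `stringifyRulesWithBase` are hypothetical stand-ins written for this comment, not rrweb's actual `stringifyCssRules` or `absoluteToStylesheet`. The point it illustrates is that the base href changes as we recurse into an @import, so each url() is resolved against the sheet it actually came from.

```typescript
// Simplified stand-ins for CSSOM rules; these types are assumptions for the sketch.
type SketchRule =
  | { kind: 'style'; cssText: string }
  | { kind: 'import'; href: string; sheet: SketchSheet };

interface SketchSheet {
  rules: SketchRule[];
}

// Rewrite every url(...) in cssText so it is absolute relative to baseHref.
function absolutifyUrls(cssText: string, baseHref: string): string {
  return cssText.replace(
    /url\((['"]?)([^'")]+)\1\)/g,
    (_match: string, quote: string, path: string) =>
      `url(${quote}${new URL(path, baseHref).href}${quote})`,
  );
}

// Stringify rules, switching the base href whenever we recurse into an @import.
function stringifyRulesWithBase(sheet: SketchSheet, baseHref: string): string {
  return sheet.rules
    .map((rule) => {
      if (rule.kind === 'import') {
        // The nested sheet's own location becomes the base for its url()s.
        const nestedHref = new URL(rule.href, baseHref).href;
        return stringifyRulesWithBase(rule.sheet, nestedHref);
      }
      return absolutifyUrls(rule.cssText, baseHref);
    })
    .join('');
}
```

With a document under /guides/ importing a sheet under /temp-resources/, a font referenced inside the nested sheet then resolves under /temp-resources/ rather than /guides/.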

Contributor

I think you nailed it and that we're on the same page here. Making that change, I'm getting the correct expected response. I've opened up a PR, but still need to create a proper test case.

eoghanmurray added a commit that referenced this pull request Aug 6, 2024
Support a contrived/rare case where a <style> element has multiple text node children (this is usually only possible to recreate via javascript append) ... this PR fixes cases where there are subsequent text mutations to these nodes; previously these would have been lost

* In this scenario, a new CSS comment may now be inserted into the captured `_cssText` for a <style> element to show where it should be broken up into text elements upon replay: `/* rr_split */`
* The new 'can record and replay style mutations' test is the principal way to exercise the problematic scenarios, and is a detailed 'catch-all' test with many checks to cover most of the ways things can fail
* There are new tests for splitting/rebuilding the css using the rr_split marker
* The prior 'dynamic stylesheet' route is now the main route for serializing a stylesheet; dynamic stylesheets were missed out in #1533 but that case is now covered with this PR
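
As a simplified illustration of the `/* rr_split */` marker mentioned above, the sketch below joins the text of several text nodes into one string and splits it back. The helper names `joinCssText` and `splitCssText` are hypothetical, and the actual splitting/rebuilding code in the PR is likely more involved than a plain string split.

```typescript
// Marker taken from the commit message above.
const SPLIT_MARKER = '/* rr_split */';

// Hypothetical helper: combine the text of several <style> child text nodes
// into a single string, remembering where each node ended.
function joinCssText(parts: string[]): string {
  return parts.join(SPLIT_MARKER);
}

// Hypothetical helper: recover the per-text-node pieces on replay.
function splitCssText(cssText: string): string[] {
  return cssText.split(SPLIT_MARKER);
}
```

A string without the marker comes back as a single-element array, which corresponds to the common case of a <style> element with one text node.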

This PR was originally extracted from #1475 so the initial motivation was to change the approach on stringifying <style> elements to do so in a single place. This is also the motivating factor for always serializing <style> elements via the `_cssText` attribute rather than in its childNodes; in #1475 we will be delaying populating `_cssText` for performance and instead recording them as assets.

Thanks for the detailed review to Justin Halsall & Yun Feng <https://github.com/YunFeng0817>
@eoghanmurray eoghanmurray force-pushed the stylesheet-assets branch 2 times, most recently from c593165 to efaae6d Compare August 23, 2024 15:48
@Juice10 (Contributor), Oct 4, 2024

Note to self, we should pull these type changes into their own PR, to ease the release & maintenance of this PR. Especially where things have been moved from rrweb-snapshot to @rrweb/types

eoghanmurray and others added 19 commits October 14, 2024 16:19
…, wasn't getting rebuilt with `adaptCssForReplay`
… a `true` (ignore by default) - an empty (or maybe partial) captureAssets config where origins was unspecified was producing the error
    // presence of onAssetDetected means we should get
    // rr_captured_href (with contents promised later - i.e. using rrweb/record)

Not sure when this test started failing
… be a fixup to 'Capture <style> element css via an asset event...'
…esume that as the machines are quicker there, the tests complete before stylesheets are processed
…apshots until we receive the stylesheet assets to avoid a flash of unstyled content (fouc)
eoghanmurray and others added 9 commits October 25, 2024 18:51
…n the recording, and the ordering of asset arrival in the replayer was not regular. This change also means we only wait for stylesheets (which have a `timeout` associated with their status), which is good as other assets can be handled asynchronously by the replayer asset manager without any negative effects (images get a placeholder image, whereas a missing stylesheet affects the entire page rendering)
… I'm about to remove the reset in favour of keeping assets around
…multi-page session).

 - instead of resetting between FullSnapshots, we instead record assets against a timestamp
 - prefer assets with timestamps subsequent to a snapshot (as if a reset happened)
 - if none can be found, find the most recent prior asset for a url

The asset manager now requires a rudimentary idea of where the replayer is at prior to applying an asset
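
The lookup rule in this commit message can be sketched roughly as follows. `RecordedAsset` and `pickAsset` are hypothetical names, and two details are assumptions not stated in the commit: "subsequent" is treated as "at or after" the snapshot timestamp, and the earliest such asset is preferred.

```typescript
// Hypothetical record of an asset as seen by the replayer.
interface RecordedAsset {
  url: string;
  timestamp: number;
  payload: string;
}

// Sketch: choose which recorded asset to apply for `url` given the timestamp
// of the snapshot currently being rebuilt.
function pickAsset(
  assets: RecordedAsset[],
  url: string,
  snapshotTimestamp: number,
): RecordedAsset | undefined {
  const candidates = assets.filter((a) => a.url === url);

  // Prefer assets recorded at or after the snapshot (assumption: earliest first).
  const after = candidates
    .filter((a) => a.timestamp >= snapshotTimestamp)
    .sort((a, b) => a.timestamp - b.timestamp);
  if (after.length > 0) return after[0];

  // Otherwise fall back to the most recent asset recorded before the snapshot.
  const before = candidates
    .filter((a) => a.timestamp < snapshotTimestamp)
    .sort((a, b) => b.timestamp - a.timestamp);
  return before.length > 0 ? before[0] : undefined;
}
```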
…e types from rrweb-snapshot to @rrweb/types' but I don't know why it wasn't needed before
…that we don't delay fullsnapshot rendering awaiting for them
4 participants