speculation-rules.bs

<pre class="metadata">
Title: Speculation Rules
Shortname: speculation-rules
Group: WICG
Status: CG-DRAFT
Repository: WICG/nav-speculation
URL: https://wicg.github.io/nav-speculation/speculation-rules.html
Level: 1
Editor: Jeremy Roman, Google https://www.google.com/, jbroman@chromium.org
Abstract: A flexible syntax for defining what outgoing links can be prepared speculatively before navigation.
Markup Shorthands: css no, markdown yes
Assume Explicit For: yes
Complain About: accidental-2119 yes, missing-example-ids yes
Indent: 2
Boilerplate: omit conformance
</pre>
<pre class="link-defaults">
spec:html; type:element; text:link
spec:html; type:element; text:script
</pre>
<pre class="anchors">
spec: html; urlPrefix: https://html.spec.whatwg.org/multipage/
  type: dfn
    urlPrefix: webappapis.html
      text: script; url: concept-script
spec: nav-speculation; urlPrefix: prefetch.html
  type: dfn
    text: prefetch; url: prefetch
    text: cross-origin prefetch IP anonymization policy; url: cross-origin-prefetch-ip-anonymization-policy
    text: origin; for: cross-origin prefetch IP anonymization policy; url: cross-origin-prefetch-ip-anonymization-policy-origin
spec: nav-speculation; urlPrefix: prerender.html
  type: dfn
    text: create a prerendering browsing context; url: create-a-prerendering-browsing-context
</pre>
<style>
/* domintro from https://resources.whatwg.org/standard.css */
.domintro {
  position: relative;
  color: green;
  background: #DDFFDD;
  margin: 2.5em 0 2em 0;
  padding: 1.5em 1em 0.5em 2em;
}

.domintro dt, .domintro dt * {
  color: black;
  font-size: inherit;
}
.domintro dd {
  margin: 0.5em 0 1em 2em; padding: 0;
}
.domintro dd p {
  margin: 0.5em 0;
}
.domintro::before {
  content: 'For web developers (non-normative)';
  background: green;
  color: white;
  padding: 0.15em 0.25em;
  font-style: normal;
  position: absolute;
  top: -0.8em;
  left: -0.8em;
}
</style>

<h2 id="speculation-rules">Speculation rules</h2>

<h3 id="speculation-rules-dfns">Definitions</h3>

A <dfn>speculation rule</dfn> is a [=struct=] with the following [=struct/items=]:
* <dfn for="speculation rule">URLs</dfn>, an [=ordered set=] of [=URLs=]
* <dfn for="speculation rule">requirements</dfn>, an [=ordered set=] of [=strings=]

The only valid string for [=speculation rule/requirements=] to contain is "`anonymous-client-ip-when-cross-origin`".

A <dfn>speculation rule set</dfn> is a [=struct=] with the following [=struct/items=]:
* <dfn for="speculation rule set">prefetch rules</dfn>, a [=list=] of [=speculation rules=]
* <dfn for="speculation rule set">prerender rules</dfn>, a [=list=] of [=speculation rules=]

<h3 id="speculation-rules-script">The <{script}> element</h3>

<em>Note</em>: This section contains modifications to the corresponding section of [[HTML]].

To process speculation rules consistently with the existing script types, we make the following changes:

* Add "`speculationrules`" to the list of valid values for <a spec=html>the script's type</a>.

* Rename [=the script's script=] to <dfn>the script's result</dfn>, which can be either a <a spec="html">script</a> or a [=speculation rule set=].

The following algorithms are updated accordingly:

* [=Prepare a script=]: see [[#speculation-rules-prepare-a-script-patch]].
* <a spec=html>Execute a script block</a>: Add the following case to the switch on <a spec=html>the script's type</a>:
  <dl>
    <dt>"`speculationrules`"</dt>
    <dd>
      1. [=Assert=]: Never reached.
    </dd>
  </dl>

<p class="issue">We should consider whether we also want to make this execute even if scripting is disabled.</p>

<p class="issue">We should also incorporate the case where a {{HTMLScriptElement/src}} attribute is set.</p>

<p class="issue">We could fire {{HTMLElement/error}} and {{HTMLElement/load}} events if we wanted to.</p>

* In {{HTMLScriptElement/supports(type)}} method steps, before

  > 3. Return false.

  add the following step:

  > 3. If type is "`speculationrules`", then return true.

<h3 id="speculation-rules-prepare-a-script-patch">Prepare a script</h3>

Inside the [=prepare a script=] algorithm we make the following changes:

* Insert the following step as the second-last sub-step under "Determine the script's type as follows:":
  * If the script block's type string is an [=ASCII case-insensitive=] match for the string "`speculationrules`", <a spec=html>the script's type</a> is "`speculationrules`".

* Insert the following case in the switch on <a spec=html>the script's type</a> within the step which begins "If the element does not have a {{HTMLScriptElement/src}} content attribute..."
  <dl>
    <dt>"`speculationrules`"</dt>
    <dd>
      1. Let |result| be the result of [=parsing speculation rules=] given source text and base URL.

      1. Set [=the script's result=] to |result|.

      1. <a spec=html>The script is ready</a>.
    </dd>
  </dl>

* Insert the following case to the switch in the subsequent step beginning "Then, follow the first of the following options...." after the cases which apply only to "`classic`" and "`module`" scripts:
  <dl>
    <dt>If <a spec=html>the script's type</a> is "`speculationrules`"</dt>
    <dd>
      1. When <a spec=html>the script is ready</a>, run the following steps:

        1. If [=the script's result=] is not null, [=list/append=] it to the element's [=Node/node document=]'s [=document/list of speculation rule sets=].
    </dd>
  </dl>


<h3 id="speculation-rules-parsing">Parsing</h3>

<p class="note">
  The general principle here is to allow the existence of directives which are not understood, but not to accept into the rule set a rule which the user agent does not fully understand.
  This reduces the risk of unintended activity by user agents which are unaware of most recently added directives which might limit the scope of a rule.

<div algorithm="parse speculation rules">
  To <dfn>parse speculation rules</dfn> given a [=string=] |input| and a [=URL=] |baseURL|, perform the following steps. They return a [=speculation rule set=] or null.

  1. Let |parsed| be the result of [=parsing a JSON string to an Infra value=] given |input|.
  1. If |parsed| is not a [=map=], then return null.
  1. Let |result| be an empty [=speculation rule set=].
  1. If |parsed|["`prefetch`"] [=map/exists=] and is a [=list=], then [=list/for each=] |prefetchRule| of |parsed|["`prefetch`"]:
    1. If |prefetchRule| is not a [=map=], then [=iteration/continue=].
    1. Let |rule| be the result of [=parsing a speculation rule=] given |prefetchRule| and |baseURL|.
    1. If |rule| is null, then [=iteration/continue=].
    1. [=list/Append=] |rule| to |result|'s [=speculation rule set/prefetch rules=].
  1. If |parsed|["`prerender`"] [=map/exists=] and is a [=list=], then [=list/for each=] |prerenderRule| of |parsed|["`prerender`"]:
    1. If |prerenderRule| is not a [=map=], then [=iteration/continue=].
    1. Let |rule| be the result of [=parsing a speculation rule=] given |prerenderRule| and |baseURL|.
    1. If |rule| is null, then [=iteration/continue=].
    1. [=list/Append=] |rule| to |result|'s [=speculation rule set/prerender rules=].
  1. Return |result|.
</div>

<div algorithm="parse a speculation rule">
  To <dfn>parse a speculation rule</dfn> given a [=map=] |input| and a [=URL=] |baseURL|, perform the following steps. They return a [=speculation rule=] or null.

  1. If |input| has any [=map/key=] other than "`source`", "`urls`", and "`requires`", then return null.
  1. If |input|["`source`"] does not [=map/exist=] or is not the [=string=] "`list`", then return null.
  1. Let |urls| be an empty [=list=].
  1. If |input|["`urls`"] does not [=map/exist=], is not a [=list=], or has any element which is not a [=string=], then return null.
  1. [=list/For each=] |urlString| of |input|["`urls`"]:
    1. Let |parsedURL| be the result of [=basic URL parser|parsing=] |urlString| with |baseURL|.
    1. If |parsedURL| is failure, then [=iteration/continue=].
    1. If |parsedURL|'s [=url/scheme=] is not an [=HTTP(S) scheme=], then [=iteration/continue=].
    1. [=list/Append=] |parsedURL| to |urls|.
  1. Let |requirements| be an empty [=ordered set=].
  1. If |input|["`requires`"] [=map/exists=], but is not a [=list=], then return null.
  1. [=list/For each=] |requirement| of |input|["`requires`"]:
    1. If |requirement| is not the [=string=] "`anonymous-client-ip-when-cross-origin`", then return null.
    1. [=set/Append=] |requirement| to |requirements|.
  1. Return a [=speculation rule=] with [=speculation rule/URLs=] |urls| and [=speculation rule/requirements=] |requirements|.
</div>

<h3 id="speculation-rules-processing">Processing model</h3>

A [=document=] has a <dfn for=document export>list of speculation rule sets</dfn>, which is an initially empty [=list=].

<!-- TODO(domfarolino): Get rid of the `data-link-type="interface"` once we fix the dfn in HTML. -->
Periodically, for any [=document=] |document|, the user agent may [=queue a global task=] on the <a data-link-type="interface">DOM manipulation task source</a> with |document|'s [=relevant global object=] to [=consider speculation=] for |document|.

<p class="note">
  The user agent will likely do this after the insertion of new speculation rules, or when resources are idle and available.

<div algorithm="consider speculation">
  To <dfn>consider speculation</dfn> for a [=document=] |document|:

  1. If |document| is not [=Document/fully active=], then return.
     <p class="issue">It's likely that we should also handle prerendered and back-forward cached documents.
  1. For each |ruleSet| of |document|'s [=document/list of speculation rule sets=]:
    1. [=list/For each=] |rule| of |ruleSet|'s [=speculation rule set/prefetch rules=]:
      1. Let |anonymizationPolicy| be null.
      1. If |rule|'s [=speculation rule/requirements=] [=set/contains=] "`anonymous-client-ip-when-cross-origin`", set |anonymizationPolicy| to a [=cross-origin prefetch IP anonymization policy=] whose [=cross-origin prefetch IP anonymization policy/origin=] is |document|'s [=Document/origin=].
      1. [=list/For each=] |url| of |rule|'s [=speculation rule/URLs=]:
        1. The user agent may [=prefetch=] |url| given |document|, |url|, "`strict-origin-when-cross-origin`" and |anonymizationPolicy|.
    1. [=list/For each=] |rule| of |ruleSet|'s [=speculation rule set/prerender rules=]:
      1. [=list/For each=] |url| of |rule|'s [=speculation rule/URLs=]:
        1. The user agent may [=create a prerendering browsing context=] given |url|, "`strict-origin-when-cross-origin`" and |document|.
</div>

<p class="issue">
  We should also notice removals and consider cancelling speculated actions.
</p>

<h2 id="security-considerations">Security considerations</h2>

<h3 id="security-csrf">Cross-site request forgery</h3>

This specification allows documents to cause HTTP requests to be issued.

When any supported action acts on a URL which is [=same origin=] to the document, then this does not constitute a risk of cross-site request forgery, since the request uses only the credentials available to the document.

Otherwise, requests are always issued without using any previously existing [=credentials=]. This limits the ambient authority available to any potentially forged request, and such requests can already be made through [[FETCH]], a subresource or frame, or various other means. Site operators are therefore already well-advised to use CSRF tokens or other mitigations for this threat.

<h3 id="security-xss">Cross-site scripting</h3>

This specification causes activity in response to content found in the document, so it is worth considering the options open to an attacker able to inject unescaped HTML.

Such an attacker is otherwise able to inject JavaScript, frames or other elements. The activity possible with this specification (requesting fetches etc) is generally less dangerous than arbitrary script execution, and comparable to other elements. The same mitigations available to other features also apply here. In particular, the [[CSP]] `script-src` directive applies to the parsing of the speculation rules and the `prefetch-src` directive applies to prefetch requests arising from the rules.

<h3 id="type-confusion">Type confusion</h3>

In the case of speculation rules in an inline `<script>`, an application which erroneously parsed speculation rules as a JavaScript script (though user agents are instructed not to execute scripts who "`type`" is unrecognized) would either interpret it as the empty block `{}` or produce a syntax error, since the U+003A COLON (`:`) after the first key is invalid JavaScript. In neither case would such an application execute harmful behavior.

Since the parsing behavior of the `<script>` element has long been part of HTML, any modern HTML parser would not construct any non-text children of the element. There is thus a low risk of other text hidden inside a `<script>` element with `type="speculationrules"` which is parsed as part of the script content by compliant HTML implementations but as HTML tags by others.

Authors should, however, still escape any potentially attacker-controlled content inserted into speculation rules. In particular, it may be necessary to escape JSON syntax as well as, if the speculation rules are in an inline `<script>` tag, the closing `</script>` tag. [[CSP]] is a useful additional mitigation for vulnerabilities of this type.

<div class="issue">Expand this section once externally loaded (via "`src`") speculation rules are specified.</div>

<h3 id="security-ip-anonymization">IP anonymization</h3>

This specification allows authors to request prefetch traffic using IP anonymization technology provided by the user agent. The details of this technology are not a part of this specification; nonetheless some general principles apply.

To the extent IP anonymization is implemented using a proxy service, it is advisable to minimize the information available to the service operator and other entities on the network path. This likely involves, at a minimum, the use of [[TLS]] for the connection.

Site operators should be aware that, similar to virtual private network (VPN) technology, the client IP address seen by the HTTP server may not exactly correspond to the user's actual network provider or location, and a traffic for multiple distinct subscribers may originate from a single client IP address. This may affect site operators' security and abuse prevention measures. IP anonymization measures may make an effort to use an egress IP address which has a similar geolocation or is located in the same jurisdiction as the user, but any such behavior is particular to the user agent and not guaranteed by this specification.

<h2 id="privacy-considerations">Privacy considerations</h2>

<h3 id="privacy-heuristics">Heuristics</h3>

Because the candidate prefetches and other actions are not required, the user agent can use heuristics to determine which actions would be best to execute. Because it may be observable to the document whether actions were executed, user agents must take care to protect privacy when making such decisions — for instance by only using information which is already available to the origin. If these heuristics depend on any persistent state, that state must be erased whenever the user erases other site data. If the user agent automatically clears other site data from time to time, it must erase such persistent state at the same time.

<div class="note">
  The use of <em>origin</em> here instead of <em>site</em> here is intentional. Origins generally form the basis for the web's security boundary. Though same-site origins are generally allowed to coordinate if they wish, origins are generally not allowed access to data from other origins, even same-site ones.
</div>

Examples of inputs which would be already known to the document:
* author-supplied scores (if future version of this specification allows specifying them)
* order of appearance in the document
* whether the link is in the viewport
* whether the cursor is near the link
* rendered size of the link

Examples of persistent data related to the origin (which the origin could have gathered itself) but which must be erased according to user intent:
* whether the user has clicked this or similar links on this document or other documents on the same origin

Examples of device information which may be valuable in deciding whether prefetching is appropriate, but which must be considered as part of the user agent's overall privacy posture because it may make the user more identifiable across origins:
* coarse device class (CPU, memory)
* coarse battery level
* whether the network connection is known to be metered

<h3 id="privacy-intent">Intent</h3>

While efforts have been made to minimize the privacy impact of prefetching, some users may nonetheless prefer that prefetching not occur, even though this may make loading slower. User agents are encouraged to provide a setting to disable prefetching features to accommodate such users.

<h3 id="privacy-partitioning">Partitioning</h3>

Some user agents <a href="https://privacycg.github.io/storage-partitioning/">partition storage</a> according to the site or origin of the top-level document. In order for prefetching and prerendering to be useful, it is therefore essential that prefetching or prerendering of a document either occur in the partition in which the navigation would occur (e.g., for a same-origin URL) or in an isolated partition, so as to ensure that prefetching does not become a mechanism for bypassing the partitioning scheme.

<div class="issue">Expand this section once more detail on prefetch and prerender partitioning mechanism is specified.</div>

<h3 id="privacy-identity-joining">Identity joining</h3>

This specification describes a mechanism through which HTTP requests for later top-level navigation (in the case of prefetching) can be made without a user gesture. It is natural to ask whether it is possible for two coordinating sites to connect user identities.

Since existing [=credentials=] for the destination origin are not sent (assuming it is not [=same origin=] with the referrer), that site is limited in its ability to identify the user before navigation in a similar way to if the referrer site had simply used [[FETCH]] to make an uncredentialed request. Upon navigation, this becomes similar to ordinary navigation (e.g., by clicking a link that was not prefetched).

To the extent that user agents attempt to mitigate identity joining for ordinary fetches and navigations, they can apply similar mitigations to prefetched navigations.