Open
20 commits
efa63f7
sharing incomplete draft to allow validation of the parsing intention
cookiecrook Feb 3, 2024
4fdab4f
minor revision on algo allowed chars
cookiecrook Feb 9, 2024
591905e
file structure inclusion
cookiecrook Mar 30, 2024
a404d22
Move the type registry out of this PR into #512 or another.
cookiecrook Mar 30, 2024
5e0bf50
update attr key regex and note
cookiecrook Mar 30, 2024
97c18ed
further clarifying parsing rules for attrs block, key/value pairs, an…
cookiecrook Jun 14, 2024
bc0b1c9
updates from first review; more external references
cookiecrook Jun 14, 2024
ab15449
commenting the subtype references in favor of addressing as new webvt…
cookiecrook Jun 14, 2024
c934597
removing the commented subtype references in favor of addressing as n…
cookiecrook Jun 14, 2024
d57f19c
review: -"video descriptions" + "descriptions"
cookiecrook Jun 17, 2024
71a249c
review comments from nigelmeggit
cookiecrook Jun 17, 2024
009ec96
Apply suggestions from code review and outreach feedback
cookiecrook Apr 17, 2025
0a074fe
incorporated all but one final comment of the review feedback
cookiecrook Apr 17, 2025
6aa01dc
clarifying case insensitive
cookiecrook Apr 18, 2025
d0e4581
correcting case insensitive
cookiecrook Apr 18, 2025
0e5dd51
reorg attr block def based on gary's feedback
cookiecrook Apr 18, 2025
b333603
bidi feedback from aphilips
cookiecrook Jul 3, 2025
d9c7fb6
m. whitespace.
cookiecrook Jul 7, 2025
6a79f25
m. whitespace.
cookiecrook Jul 7, 2025
7d54121
m. whitespace.
cookiecrook Jul 7, 2025
196 changes: 194 additions & 2 deletions index.bs
@@ -362,6 +362,93 @@ CSS comment (e.g. <code>/**/</code>).</p>

</div>

<h3 id=introduction-attributes-block>Attributes Block</h3>

<p><i>This section is non-normative.</i></p>

<p>WebVTT supports an attributes block that provides additional information about a text track, such as its kind, language, and label, and that allows metadata tracks to be disambiguated.</p>




<div class="example">

<p>In this example, an optional WebVTT attributes block specifies the language of the track and the label to show for it in a subtitle/caption selection menu.</p>
<pre>
WEBVTT

ATTRIBUTES
kind: subtitles
lang: es-mx
label: Español

NOTE
Standard subtitles (unlike CC or SDH captions) typically
translate spoken dialog or signage, but not audible sound
effects like "dogs barking."

1
00:00:10.123 --> 00:00:15.432
¡Hola! ¿Qué tal?
</pre>

</div>


<div class="example">

<p>In this example, an optional WebVTT attributes block is used to differentiate captions from standard subtitles.</p>
<pre>
WEBVTT

ATTRIBUTES
kind: captions
lang: es-mx
label: Español (SDH)

NOTE
Captions (SDH aka Subtitles for the Deaf and Hard-of-Hearing)
typically include spoken dialog as well as important audible
sounds such as "floor boards creak", "dogs barking", or in
this case, "music".

1
00:00:10.123 --> 00:00:15.432
¡Hola! ¿Qué tal?

2
00:00:47.462 --> 00:01:04.028
[♫ música ♫]
</pre>

</div>


<div class="example">

<p>In this example, a WebVTT attributes block indicates that the text track cues are descriptions for blind or deafblind audiences, meant to be rendered as synthesized speech or braille. Unlike subtitles or captions, these are not intended to be rendered visually.</p>
<pre>
WEBVTT

ATTRIBUTES
kind: descriptions
lang: en-us
label: English (AD)

NOTE
VTT-based descriptions are meant to be rendered as text-to-speech audio or braille,
for blind or deafblind audiences, not usually as visual captions on screen.
As such, the option/label might be displayed in an audio menu or elsewhere.

1
00:00:10.123 --> 00:00:15.432
A young girl tiptoes down a dark hallway.
</pre>

</div>



<h3 id=introduction-other-features>Other caption and subtitling features</h3>

<p><i>This section is non-normative.</i></p>
@@ -671,11 +758,14 @@ signifies the end of the WebVTT cue.</p>

<div class="example">

<p>In this example, a talk is split into each slide being a chapter.</p>
<p>In this example, topics mentioned in a talk are provided as URLs for reference.</p>

<pre>
WEBVTT

ATTRIBUTES
kind: metadata

NOTE
Thanks to http://output.jsbin.com/mugibo

@@ -704,6 +794,30 @@ signifies the end of the WebVTT cue.</p>

</div>

<div class="example">

<p>In this example, a sequence of video thumbnails and their text alternatives are made available to the playback UI.</p>
<pre>
WEBVTT

ATTRIBUTES
kind: metadata

00:00:01.959 --> 00:00:02.938
{
"src": "https://cdn.example.com/thumbnails.jpg#xywh=0,0,284,160",
"alt": {
"en-us": "Miguel crosses the marigold bridge to the land of the dead.",
"es-mx": "Miguel cruza el puente marigold hacia la tierra de los muertos."
}
}
</pre>
</div>
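<p>As an illustration only, a playback UI consuming the cue above might read its JSON payload as in the following Python sketch. The <code>src</code> and <code>alt</code> member names come from the example; the function <code>parse_thumbnail_cue</code>, the fallback to the first listed alternative, and support for only the pixel form of the <code>xywh</code> media fragment are assumptions of this sketch, not requirements of WebVTT.</p>

```python
import json
from typing import Tuple


def parse_thumbnail_cue(payload: str, lang: str) -> Tuple[str, Tuple[int, int, int, int], str]:
    """Return (image URL, (x, y, w, h), alt text) for a hypothetical thumbnail cue."""
    data = json.loads(payload)
    url, _, fragment = data["src"].partition("#")
    # Spatial media fragment, pixel form only: xywh=[pixel:]x,y,w,h
    if not fragment.startswith("xywh="):
        raise ValueError("expected an xywh= media fragment")
    coords = fragment[len("xywh="):]
    if coords.startswith("pixel:"):
        coords = coords[len("pixel:"):]
    x, y, w, h = (int(n) for n in coords.split(","))
    # Language tags compare case-insensitively, so normalize the keys.
    alts = {tag.lower(): text for tag, text in data["alt"].items()}
    # Assumption: fall back to the first listed alternative if lang is absent.
    alt = alts.get(lang.lower(), next(iter(alts.values()), ""))
    return url, (x, y, w, h), alt
```

<p>Keeping the cropping rectangle in the URL fragment lets many thumbnails share one sprite image, so the player can fetch the image once and crop it per cue.</p>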

<p class="note">The Timed Text Working Group is discussing a registry for metadata <code>type</code>
values, such as <code>type: video-thumbnails</code> or <code>type: video-flash-avoidance</code>.
See WebVTT issues <a href="https://github.com/w3c/webvtt/issues/511">#511</a> and <a href="https://github.com/w3c/webvtt/issues/512">#512</a> for more info.</p>


<h2 id=conformance>Conformance</h2>

@@ -1474,6 +1588,9 @@ with the <a>MIME type</a> <code>text/vtt</code>. [[!RFC3629]]</p>
<li>Two or more <a lt="WebVTT line terminator">WebVTT line terminators</a> to terminate the line
with the file magic and separate it from the rest of the body.</li>

<li>Zero or one <a lt="WebVTT attributes block">WebVTT attributes block</a> followed by one or
more <a lt="WebVTT line terminator">WebVTT line terminators</a>.</li>

<li>Zero or more <a lt="WebVTT region definition block">WebVTT region definition blocks</a>, <a
lt="WebVTT style block">WebVTT style blocks</a> and <a lt="WebVTT comment block">WebVTT comment
blocks</a> separated from each other by one or more <a lt="WebVTT line terminator">WebVTT line
@@ -1650,6 +1767,53 @@ SIGN).</p>

<p>When interpreted as a number, a <a>WebVTT percentage</a> must be in the range 0..100.</p>

<p>A <dfn>WebVTT attributes block</dfn> consists of the following components, in the given order:</p>
<ol>
<li>The string "<code>ATTRIBUTES</code>".</li>
<li>Zero or more U+0020 SPACE or U+0009 CHARACTER TABULATION (tab) characters.</li>
<li>A <a>WebVTT line terminator</a>.</li>
<li>A <a>WebVTT attributes body block</a>.</li>
<li>A <a>WebVTT line terminator</a>.</li>
</ol>
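<p>The following non-normative Python sketch locates a <a>WebVTT attributes block</a> in a file and returns the lines of its body. It assumes that the block, when present, is the first block after the file magic, strips an optional leading byte order mark, and normalizes CRLF and lone CR line terminators to LF before splitting blocks on two or more line terminators. The function name <code>find_attributes_body</code> is hypothetical.</p>

```python
import re
from typing import List, Optional

# Header line: the string "ATTRIBUTES" followed by zero or more spaces or tabs.
_HEADER = re.compile(r"ATTRIBUTES[ \t]*")
# File magic: "WEBVTT" followed by a space, a tab, or the end of the line.
_MAGIC = re.compile(r"WEBVTT(?:[ \t]|\Z)")


def find_attributes_body(vtt_text: str) -> Optional[List[str]]:
    """Return the attributes body lines, or None if there is no attributes block.

    Non-normative sketch: assumes the attributes block, if any, is the
    first block after the file magic.
    """
    # Drop an optional leading BOM, then map CRLF and lone CR to LF.
    if vtt_text.startswith("\ufeff"):
        vtt_text = vtt_text[1:]
    text = vtt_text.replace("\r\n", "\n").replace("\r", "\n")
    # Blocks are separated by two or more line terminators.
    blocks = re.split(r"\n{2,}", text)
    if len(blocks) < 2 or not _MAGIC.match(blocks[0]):
        return None
    # Trailing terminators at end of file are not part of the body.
    header, *body = blocks[1].rstrip("\n").split("\n")
    if not _HEADER.fullmatch(header):
        return None
    return body
```

<p>Applied to the first example in the introduction, this returns the three lines <code>kind: subtitles</code>, <code>lang: es-mx</code>, and <code>label: Español</code>.</p>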

<p>A <dfn>WebVTT attributes body block</dfn> consists of the following components, in the given order:</p>
<ol>
<li>Zero or more key/value pairs, each consisting of the following components, in the given order:
<ol>
<li>A <dfn>WebVTT attribute key</dfn>, matching the pattern <code>[A-Za-z_][0-9A-Za-z_]*</code>, consisting of:
<ol>
<li>Any one of the following:
<ul>
<li>Any <a href="https://infra.spec.whatwg.org/#ascii-alpha">ASCII Alpha</a> character</li>
Contributor

Why so restrictive? If we stick with this then I'd be interested to know what the Internationalisation WG thinks of it.

<li>U+005F LOW LINE ("_" underscore)</li>
</ul>
</li>
<li>Followed by zero or more of the following:
<ul>
<li>Any <a href="https://infra.spec.whatwg.org/#ascii-alphanumeric">ASCII Alphanumeric</a> character</li>
<li>U+005F LOW LINE ("_" underscore)</li>
</ul>
</li>
</ol>
</li>
<li>A single U+003A COLON character ("<code>:</code>").</li>
<li>Zero or one U+0020 SPACE or U+0009 CHARACTER TABULATION (tab) character.</li>
<li>
A <dfn>WebVTT attribute value</dfn> consisting of any sequence of zero or more characters other than the following:
<ul>
<li>unescaped LINE FEED (LF) characters (U+000A),</li>
<li>unescaped CARRIAGE RETURN (CR) characters (U+000D),</li>
<li>unescaped bidirectional formatting characters (U+202A, U+202B, U+202C, U+202D, U+202E, U+2066, U+2067, U+2068, U+2069, U+200E, U+200F, U+061C), or</li>
Comment on lines +1804 to +1806
Contributor

No escaping mechanism appears to be defined.

<li>the substring "<code>--></code>" (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN).</li>
</ul>
</li>
<li>A <a>WebVTT line terminator</a>.</li>
</ol>
</li>
</ol>
<p>Process the <a>WebVTT attributes body block</a> key/value pairs according to the <a>WebVTT rules for parsing attribute key/value pairs</a>.</p>
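<p>A non-normative Python sketch of the key/value syntax above checks one line of a <a>WebVTT attributes body block</a> and returns its key and value, or <code>None</code> if the line does not match. Because no escaping mechanism is defined, the sketch assumes that any CR, LF, or <code>--></code> in a value makes the line invalid; it also rejects every character with the Unicode Bidi_Control property (U+061C, U+200E, U+200F, U+202A to U+202E, U+2066 to U+2069). The name <code>parse_pair</code> is hypothetical.</p>

```python
import re
from typing import Optional, Tuple

# WebVTT attribute key: an ASCII letter or "_", then ASCII alphanumerics or "_".
_KEY = r"[A-Za-z_][0-9A-Za-z_]*"
# Key, one ":", zero or one space or tab, then the value (rest of the line).
_PAIR = re.compile(rf"({_KEY}):[ \t]?(.*)", re.DOTALL)
# CR, LF, and the Unicode Bidi_Control characters may not appear in a value.
_FORBIDDEN = set("\r\n\u061c\u200e\u200f\u202a\u202b\u202c\u202d\u202e"
                 "\u2066\u2067\u2068\u2069")


def parse_pair(line: str) -> Optional[Tuple[str, str]]:
    """Return (key, value) for one attributes-body line, or None if invalid."""
    m = _PAIR.fullmatch(line)
    if m is None:
        return None
    key, value = m.group(1), m.group(2)
    if "-->" in value or any(c in _FORBIDDEN for c in value):
        return None
    return key, value
```

<p>Only the first space or tab after the colon is consumed as a separator, so <code>kind:  x</code> yields the value <code> x</code> with one leading space.</p>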


<p>A <dfn>WebVTT comment block</dfn> consists of the following components, in the given order:</p>

<ol>
@@ -1687,7 +1851,7 @@ separated from the next by a <a>WebVTT line terminator</a>. (In other words, any
have two consecutive <a lt="WebVTT line terminator">WebVTT line terminators</a> and does not start
or end with a <a>WebVTT line terminator</a>.)</p>

<p><a>WebVTT metadata text</a> cues are only useful for scripted applications (e.g. using the
<p><a>WebVTT metadata text</a> cues were originally intended for scripted applications (e.g. using the
<code>metadata</code> <a>text track kind</a> in a HTML <a>text track</a>).</p>


@@ -4130,6 +4294,34 @@ follows:</p>
</ol>


<h3 id=rules-for-parsing-attr-key-values algorithm>WebVTT rules for parsing attribute key/value pairs</h3>
<p>The <dfn>WebVTT rules for parsing attribute key/value pairs</dfn> consist of the following algorithm.</p>

<ol algorithm="WebVTT attributes block parsing">
<li>Let |input| be the list of key/value pairs from a <a>WebVTT attributes block</a>.</li>
<li>
For each key/value pair in |input|, in order, process the pair according to its key name, as follows:
<dl>

<dt>If the key name is "<code>kind</code>" (<a href="https://infra.spec.whatwg.org/#ascii-case-insensitive">ASCII case-insensitive</a>)</dt>
<dd>Process the value as <a href="https://html.spec.whatwg.org/multipage/media.html#attr-track-kind">the kind attribute</a> of a track element according to the HTML Standard.</dd>

<dt>If the key name is "<code>lang</code>" (<a href="https://infra.spec.whatwg.org/#ascii-case-insensitive">ASCII case-insensitive</a>)</dt>
Contributor

Why is this lang and not language, to match #485 (comment)?

<dd>Process the value as <a href="https://html.spec.whatwg.org/multipage/media.html#attr-track-srclang">the srclang attribute</a> of a track element according to the HTML Standard.</dd>

<dt>If the key name is "<code>label</code>" (<a href="https://infra.spec.whatwg.org/#ascii-case-insensitive">ASCII case-insensitive</a>)</dt>
<dd>Process the value as <a href="https://html.spec.whatwg.org/multipage/media.html#attr-track-label">the label attribute</a> of a track element according to the HTML Standard.</dd>
Comment on lines +4306 to +4313
Contributor

@nigelmegitt Sep 15, 2025

When this says "Process the value as a ... according to the HTML standard", what is meant? Is it only processing it as a syntactic entity, or is there an expectation that the contents of the file will be handled differently?

What is supposed to happen if a <track> element has one or more of these keys as attributes, and their values differ from the values in a referenced WebVTT file? Is that an error? Which takes precedence?

Contributor

This seems like a layering violation. WebVTT should make the data available and HTML should say what to do with it in its track processing model.

Contributor

I guess it could be, depending on what you understand by the phrase "Process the value as".

Collaborator

> WebVTT should make the data available and HTML should say what to do with it in its track processing model.

I believe that was the intent.

> What is supposed to happen if a <track> element has one or more of these keys as attributes, and their values differ from the values in a referenced WebVTT file? Is that an error? Which takes precedence?

I would expect the <track> element attribute to win in this case. But it's probably up to how it'll get defined in an HTML update.

"Process the value as" is probably not clear enough as to what we expect to happen here. My expectations are this:

  • link label, lang, and kind to the definitions in HTML for label, srclang, and kind
  • have these values available in such a way that HTML can use them directly without extra processing

Contributor

If we're expecting changes to HTML to inspect the referenced WebVTT files and use their values to populate the kind, label and srclang attributes then I think we need confirmation that that's the accepted direction of travel before merging this. That's not at all obviously a good thing to me, because it might imply that all the linked WebVTT resources need to be fetched and parsed before any video can play, even if they're unused.

Otherwise I'd say that in linking the definitions we would only be specifying the permitted values for each of the listed keys, and not saying anything about how they're used elsewhere.

Would it be a layering violation if we said something like "If a WebVTT file referenced by a <track> element has different values for any of [keys] to those specified in the <track> element, then $consequence"? My thinking is that could be something that the WebVTT processing model would reasonably define.

Contributor

Yes, that belongs in HTML.

Currently, HTML allows in-band text tracks to have format-provided kind, label, and language:

> Set the new text track's kind, label, and language based on the semantics of the relevant data, as defined by the relevant specification. If there is no label in that data, then the label must be set to the empty string.

https://html.spec.whatwg.org/#sourcing-in-band-text-tracks

And https://dev.w3.org/html5/html-sourcing-inband-tracks/ specifies how this data is provided for some formats.

For out-of-band text tracks (i.e. using the HTML track element), HTML needs changes to allow format-specified metadata.

https://html.spec.whatwg.org/#sourcing-out-of-band-text-tracks

I think getting the integration with HTML right is important.

Contributor

> If we're expecting changes to HTML to inspect the referenced WebVTT files and use their values to populate the kind, label and srclang attributes

I would not expect that to happen, but the format-provided values would populate the internal concepts and be readable from the TextTrack object.

Contributor

I'm a bit confused by this @zcorpan - what is it that you would not expect to happen?

Contributor

I expect the content attribute values, which can be read with e.g. getAttribute(), to not change based on the contents of the WebVTT file.

Contributor

I've filed whatwg/html#11665


<dt>Otherwise</dt>
<dd>Ignore the key/value pair.</dd>

</dl>
</li>
</ol>

<p class="note">These keys are matched ASCII case-insensitively for compatibility with content from large video distributors <!-- namely YouTube --> that already use this pattern in production.</p>
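<p>The algorithm above can be sketched in Python as follows, assuming the pairs have already been split into (key, value) tuples. Keys are compared after lowercasing ASCII letters only, so a non-ASCII look-alike such as U+212A KELVIN SIGN does not match "<code>kind</code>". The text does not say what happens when a key repeats; this sketch assumes the last occurrence wins. It only collects the values and does not apply HTML's processing of them. The function names are hypothetical.</p>

```python
from typing import Dict, Iterable, Tuple

# Keys the algorithm recognizes. Per the text above, these correspond to the
# HTML track element's kind, srclang, and label attributes respectively.
_KNOWN_KEYS = {"kind", "lang", "label"}


def ascii_lower(s: str) -> str:
    """Lowercase ASCII A-Z only; all other characters are left unchanged."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def process_attributes(pairs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Collect recognized attributes, keyed by their lowercase names."""
    result: Dict[str, str] = {}
    for key, value in pairs:
        name = ascii_lower(key)
        if name in _KNOWN_KEYS:
            # Assumption: if a key repeats, the last occurrence wins.
            result[name] = value
        # Otherwise: ignore the key/value pair.
    return result
```

<p>For instance, the pairs <code>("Kind", "captions")</code>, <code>("LANG", "es-mx")</code>, <code>("label", "Español (SDH)")</code>, and <code>("type", "x")</code> yield <code>{"kind": "captions", "lang": "es-mx", "label": "Español (SDH)"}</code>; the unrecognized <code>type</code> key is ignored.</p>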


<h2 id=rendering>Rendering</h2>

<p class="note">This section describes in some detail how to visually render <a>WebVTT caption or