Skip to content

Conversation

cookiecrook
Copy link

@cookiecrook cookiecrook commented Feb 3, 2024

Closes #511

Add new ATTRIBUTES block parsing rules and examples as discussed in #511.

This is also a prerequisite for:

…and its duplicate in the WHATWG repo:
whatwg/html#11333


Preview | Diff

@cookiecrook cookiecrook self-assigned this Mar 29, 2024
@cookiecrook cookiecrook marked this pull request as ready for review June 14, 2024 00:37
@cookiecrook
Copy link
Author

cookiecrook commented Jun 14, 2024

Marking as Ready for Review despite there being some TODOs remaining in the diff. Feedback requested from @gkatsev, @silviapfeiffer, @nigelmegitt, @eric-carlson, @chrisn, @bdougherty, et al.

@cookiecrook
Copy link
Author

I think this PR is merge-ready now once a TTWG reviewer approves. The previously included TODO comments are now removed from this PR in favor of addressing that follow-on work in:

Copy link
Contributor

@nigelmegitt nigelmegitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would benefit from more clarity (i.e. more text) about the intended use of ATTRIBUTES and its relationship to attributes on the HTML <track> element, and any other potential usage context.

index.bs Outdated

<div class="example">

<p>In this example, an optional WebVTT attributes object is used to differentiate captions from standard subtitles.</p>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My feeling here is that the number of examples enumerating different values of kind is unnecessary given that this specification is not defining that attribute's value set. Better to have one example and, somewhere, link the value set to its canonical definition.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand... Are you suggesting the value parsing algo cross reference the HTML Infra values in track[kind]?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, upon re-reading, it sounds like you're suggesting the examples be removed.

FWIW, as a reader of the spec (this was my first PR to VTT), I found the pre-existing examples helpful. Perhaps others will too? The ATTRIBUTES block is defined here, so seeing how it's used with different key/value pairs seems useful to me, even if those values are defined by HTML.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is one of each:

  • subtitle example
  • caption example
  • description example
  • metadata example

I'd prefer to leave all 4 examples, but perhaps we can compromise on just removing the subtitle example?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think there are too many but it's not a major complaint; as you say, many folk find examples an easy way into understanding the specification.

Also I just noticed that Example 14 is appearing in the Chapters section, but its about descriptions, so it seems to be misplaced.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to like a lot of examples.

Given how similar the subtitles and captions examples are, maybe we only keep one?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gkatsev Thought about this some more, and it may help to keep them? Most people don't know the difference between captions and subtitles, so the two examples may provide that clarity.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm moving the additional examples to a new intro informative section on Attributes in the Introduction, which looks much better than tacking on additional "other" examples.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most people don't know the difference between captions and subtitles

[aside]

A significant number of people don't subscribe to the distinction made in the web context between them, and actively use different definitions for the terms. My position on this is that the web usage attempts to bring clarity, but in doing so only brings more problems. IMO it would be better to be explicit about the type of subtitles/captions rather than keep flogging this!

@nigelmegitt
Copy link
Contributor

Needs a review from the Editor, @gkatsev .

@nigelmegitt
Copy link
Contributor

nigelmegitt commented Sep 23, 2024

See also another operational practice described for doing this at #485 (comment) in which YouTube is doing it in the Header section. We should probably make sure that whatever we specify here aligns with practice, or if not, that folk using that alternative approach are happy to change.

Update: @gkatsev had the action to draft a PR to resolve #485, and that looks to me as though it would overlap strongly with this PR.

Copy link
Collaborator

@gkatsev gkatsev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the work on this @cookiecrook and for your patience!

index.bs Outdated

<div class="example">

<p>In this example, an optional WebVTT attributes object is used to differentiate captions from standard subtitles.</p>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to like a lot of examples.

Given how similar the subtitles and captions examples are, maybe we only keep one?

@gkatsev
Copy link
Collaborator

gkatsev commented Sep 24, 2024

See also another operational practice described for doing this at #485 (comment) in which YouTube is doing it in the Header section. We should probably make sure that whatever we specify here aligns with practice, or if not, that folk using that alternative approach are happy to change.

What YouTube has definitely seems to align with it. Would be great if we can find someone from there to comment on this, if they could potentially move to the new block when it exists.

Update: @gkatsev had the action to draft a PR to resolve #485, and that looks to me as though it would overlap strongly with this PR.

I think ultimately, particularly, with HLS's usage of the header, we do still want the header as "point of extensibility" there. But, we may want to discourage new uses of it, since the majority of these use cases could use the ATTRIBUTES block. As an example, the ECMAScript spec added the __proto__ to the spec to get it documented but marked it as legacy: https://262.ecma-international.org/15.0/index.html#sec-object.prototype.__proto__ (I'm not aware of something specific like that in HTML or w3c specs, but I'm also not sure that there's anything against doing so).

@cookiecrook
Copy link
Author

@gkatsev wrote:

What YouTube has definitely seems to align with it. Would be great if we can find someone from there to comment on this, if they could potentially move to the new block when it exists.

Yes, I have taken that as an action.

@css-meeting-bot
Copy link
Member

The Timed Text Working Group just discussed New ATTRIBUTES block w3c/webvtt#523, and agreed to the following:

  • SUMMARY: @cookiecrook to raise issue on HTML about track attribute precedence
The full IRC log of that discussion <nigel> Subtopic: New ATTRIBUTES block #523
<nigel> github: https://github.com//pull/523
<nigel> jcraig: Need to assess, if there's a mismatch, which one wins.
<nigel> .. I don't have a strong opinion.
<nigel> .. Main reason we want it in the VTT is sometimes there's no HTML as an intermediary,
<nigel> .. so the track information is missing.
<nigel> gkatsev: That makes sense, my question would be would this also allow a change in HTML
<nigel> .. if we're adding precedence to those attributes.
<nigel> jcraig: I would expect that whatever we land on, we could have the same thing reflected on the track element
<nigel> eric_carlson: We also would want to add some text to the spec describing the rules of precedence,
<nigel> .. whichever way we go.
<nigel> Nigel: How are you going to work out which way to go?
<nigel> eric_carlson: We'll put a stick in the ground and we'll be right.
<nigel> gkatsev: My assumption is the track element takes precedence
<nigel> eric_carlson: That would be right, it would override what's in the file.
<nigel> gkatsev: Can still have the precedence rules in the VTT spec but would also need them in HTML
<nigel> eric_carlson: HTML doesn't say anything about how to process the contents of the VTT file,
<nigel> .. so maybe no change needed.
<nigel> jcraig: The other editorial suggestions seem fine to me.
<nigel> .. There was a section question, was just my unfamiliarity with bikeshed.
<nigel> gkatsev: Should we open an HTML issue now around this to get the conversation going.
<nigel> jcraig: I'll take that as an action and write it into PR 523 so I don't forget it.
<nigel> pal: On that last point, if you run into any issues, I can help, please don't hesitate to ask.
<nigel> jcraig: You mean pushback on HTML?
<nigel> pal: Sure, ultimately the question is who writes those tests.
<nigel> .. As Nigel pointed out there's a large body of tests to port to WPT.
<nigel> .. With reference renderers once we've overcome the issue of should we have that at all in WPT,
<nigel> .. I'm confident that we'll have the resources to make that happen.
<nigel> group confusion caused because Pierre misheard!
<nigel> gkatsev: With that we can, unless there's anything specific to WebVTT right now we can end
<nigel> SUMMARY: @cookiecrook to raise issue on HTML about track attribute precedence

@cookiecrook
Copy link
Author

I've also taken an action to file an issue to HTML to propose any reflected attributes from the VTT to HTML Track content attributes.

@cookiecrook
Copy link
Author

cookiecrook commented Jul 7, 2025

Given the age of this PR, there were some merge conflict discrepancies in the branch that I believe I have resolved now.

@eric-carlson
Copy link

This looks good to me. @gkatsev, @nigelmegitt, do you have any further concerns or can we get this merged and move forward?

@nigelmegitt
Copy link
Contributor

Sorry, a lot has happened here and I haven't had time to follow along. Looking at it now, does it fix/align with/not overlap with/not clash with #485 or is there an additional PR needed? I'm no longer sure of the scope of what this is addressing, since #485 is not mentioned in the pull request description but has clearly been at least taken into account in edits made since I last looked.

@nigelmegitt nigelmegitt dismissed their stale review July 17, 2025 16:38

Significant changes have happened since my previous review, which I've not yet had time to re-review. Dismissing my previous review.

@silviapfeiffer
Copy link
Member

FWIW I think this is long overdue and the case-insensitive approach will match with the YouTube implementation, so I am all for merging it.

@cookiecrook
Copy link
Author

cookiecrook commented Jul 23, 2025

@nigelmegitt wrote:

Looking at it now, does it fix/align with/not overlap with/not clash with #485 or is there an additional PR needed?

No additional PR needed, as far as I can tell. I have suggested #485 be closed as a [forward] duplicate of #511, and that whatwg/html#11333 be closed as a duplicate of #512.

I'm no longer sure of the scope of what this is addressing, since #485 is not mentioned in the pull request description but has clearly been at least taken into account in edits made since I last looked.

Thanks for bringing it to my attention. I was not previously aware of #485 or @yashrajbharti's issue whatwg/html#11333, so "taken into account" was merely serendipitous, but I do believe it accounts for both #485 as well as all the prerequisites needed by #512 and its duplicate whatwg/html#11333

If there are no further review comments on this one, please merge and I'll start the follow-on work for:

@gkatsev
Copy link
Collaborator

gkatsev commented Sep 12, 2025

I don't think that this PR fully resolves #485. Particularly, given the ubiquitous usage of the old header concept in HLS for the timestamp map, WebVTT should re-define the old headers purely as an extension point with no other internal usage; as Silvia mentioned in #485 (comment).

@gkatsev
Copy link
Collaborator

gkatsev commented Sep 12, 2025

Thanks for your patience. This looks good to me!

<ol>
<li>Any one of the following:
<ul>
<li>Any <a href="https://infra.spec.whatwg.org/#ascii-alpha">ASCII Alpha</a> character</li>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why so restrictive? If we stick with this then I'd be interested to know what the Internationalisation WG thinks of it.

Comment on lines +1804 to +1806
<li>unescaped LINE FEED (LF) characters (U+000A),</li>
<li>unescaped CARRIAGE RETURN (CR) characters (U+000D),</li>
<li>unescaped bi-directional formatting characters (U+202B, U+202C, U+202D, U+202E, U+2066, U++2067, U++2068, U+2069, U+200E, U+200F, U+061C), or</li>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No escaping mechanism appears to be defined.

Comment on lines +4306 to +4313
<dt>If the key name is "<code>kind</code>" (<a href="https://infra.spec.whatwg.org/#ascii-case-insensitive">ASCII case-insensitive</a>)</dt>
<dd>Process the value as <a href="https://html.spec.whatwg.org/multipage/media.html#attr-track-kind">the kind attribute</a> of a track element according to the HTML Standard.</dd>

<dt>If the key name is "<code>lang</code>" (<a href="https://infra.spec.whatwg.org/#ascii-case-insensitive">ASCII case-insensitive</a>)</dt>
<dd>Process the value as <a href="https://html.spec.whatwg.org/multipage/media.html#attr-track-srclang">the srclang attribute</a> of a track element according to the HTML Standard.</dd>

<dt>If the key name is "<code>label</code>" (<a href="https://infra.spec.whatwg.org/#ascii-case-insensitive">ASCII case-insensitive</a>)</dt>
<dd>Process the value as <a href="https://html.spec.whatwg.org/multipage/media.html#attr-track-label">the label attribute</a> of a track element according to the HTML Standard.</dd>
Copy link
Contributor

@nigelmegitt nigelmegitt Sep 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When this says "Process the value as a ... according to the HTML standard", what is meant? Is it only processing it as a syntactic entity, or is there an expectation that the contents of the file will be handled differently?

What is supposed to happen if a <track> element has one or more of these keys as attributes, and their values differ from the values in a referenced WebVTT file? Is that an error? Which takes precedence?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a layering violation. WebVTT should make the data available and HTML should say what to do with it in its track processing model.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it could be, depending on what you understand by the phrase "Process the value as".

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WebVTT should make the data available and HTML should say what to do with it in its track processing model.

I believe that was the intent.

What is supposed to happen if a <track> element has one or more of these keys as attributes, and their values differ from the values in a referenced WebVTT file? Is that an error? Which takes precedence?

I would expect the <track> element attribute to win in this case. But it's probably up to how it'll get defined in an HTML update.

"Process the value as" is probably not clear enough as to what we expect to happen here. My expectations are this:

  • link label, lang, and kind to the definitions in HTML for label, srclang, and kind
  • have these values available in such a way that HTML can use them directly without extra processing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're expecting changes to HTML to inspect the referenced WebVTT files and use their values to populate the kind, label and srclang attributes then I think we need confirmation that that's the accepted direction of travel before merging this. That's not at all obviously a good thing to me, because it might imply that all the linked WebVTT resources need to be fetched and parsed before any video can play, even if they're unused.

Otherwise I'd say that in linking the definitions we would only be specifying the permitted values for each of the listed keys, and not saying anything about how they're used elsewhere.

Would it be a layering violation if we said something like "If a WebVTT file referenced by a <track> element has different values for any of [keys] to those specified in the <track> element, then $consequence"? My thinking is that could be something that the WebVTT processing model would reasonably define.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that belongs in HTML.

Currently, HTML allows in-band text tracks to have format-provided kind, label, and language:

Set the new text track's kind, label, and language based on the semantics of the relevant data, as defined by the relevant specification. If there is no label in that data, then the label must be set to the empty string.

https://html.spec.whatwg.org/#sourcing-in-band-text-tracks

And https://dev.w3.org/html5/html-sourcing-inband-tracks/ specifies how this data is provided for some formats.

For out-of-band text tracks (i.e. using the HTML track element), HTML needs changes to allow format-specified metadata.

https://html.spec.whatwg.org/#sourcing-out-of-band-text-tracks

I think getting the integration with HTML right is important.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're expecting changes to HTML to inspect the referenced WebVTT files and use their values to populate the kind, label and srclang attributes

I would not expect that to happen, but the format-provided values would populate the internal concepts and be readable from the TextTrack object.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused by this @zcorpan - what is it that you would not expect to happen?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect the content attribute values, which can be read with e.g. getAttribute(), to not change based on the contents of the WebVTT file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've filed whatwg/html#11665

Copy link
Contributor

@nigelmegitt nigelmegitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is ready to merge yet; I've left some questions.

@chrisn
Copy link
Member

chrisn commented Sep 15, 2025

Sorry if I've missed this, but is there a processing model for the attributes block that, e.g., defines how they are handled in a <track> element? Or if not, should there be one?

Copy link
Member

@zcorpan zcorpan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not define parsing of ATTRIBUTES blocks in https://w3c.github.io/webvtt/#file-parsing , so currently they are dropped during parsing.

Why not use the "header" parser logic for attributes? i.e.

WEBVTT
kind: subtitles
lang: es-mx
label: Español

...

Comment on lines +4306 to +4313
<dt>If the key name is "<code>kind</code>" (<a href="https://infra.spec.whatwg.org/#ascii-case-insensitive">ASCII case-insensitive</a>)</dt>
<dd>Process the value as <a href="https://html.spec.whatwg.org/multipage/media.html#attr-track-kind">the kind attribute</a> of a track element according to the HTML Standard.</dd>

<dt>If the key name is "<code>lang</code>" (<a href="https://infra.spec.whatwg.org/#ascii-case-insensitive">ASCII case-insensitive</a>)</dt>
<dd>Process the value as <a href="https://html.spec.whatwg.org/multipage/media.html#attr-track-srclang">the srclang attribute</a> of a track element according to the HTML Standard.</dd>

<dt>If the key name is "<code>label</code>" (<a href="https://infra.spec.whatwg.org/#ascii-case-insensitive">ASCII case-insensitive</a>)</dt>
<dd>Process the value as <a href="https://html.spec.whatwg.org/multipage/media.html#attr-track-label">the label attribute</a> of a track element according to the HTML Standard.</dd>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a layering violation. WebVTT should make the data available and HTML should say what to do with it in its track processing model.

@gkatsev
Copy link
Collaborator

gkatsev commented Sep 15, 2025

This does not define parsing of ATTRIBUTES blocks in https://w3c.github.io/webvtt/#file-parsing , so currently they are dropped during parsing.

Great call out, I completely missed that.

Why not use the "header" parser logic for attributes? i.e.

I think the question is if this will conflict with things like HLS's timestamp map or other usage of headers in the wild. We'd still need to bring back official header parsing logic, though.

Keeping it as a separate block is likely the easiest.

@zcorpan
Copy link
Member

zcorpan commented Sep 16, 2025

OK but per #485 (comment) it seems YouTube already uses the header for these attributes?

HLS timestamp map uses different keys, so it doesn't conflict, as far as I can tell.

<dt>If the key name is "<code>kind</code>" (<a href="https://infra.spec.whatwg.org/#ascii-case-insensitive">ASCII case-insensitive</a>)</dt>
<dd>Process the value as <a href="https://html.spec.whatwg.org/multipage/media.html#attr-track-kind">the kind attribute</a> of a track element according to the HTML Standard.</dd>

<dt>If the key name is "<code>lang</code>" (<a href="https://infra.spec.whatwg.org/#ascii-case-insensitive">ASCII case-insensitive</a>)</dt>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this lang and not language, to match #485 (comment) ?

Copy link
Member

@zcorpan zcorpan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marking as request changes to address my comments.

@gkatsev
Copy link
Collaborator

gkatsev commented Sep 16, 2025

looks like webm also has file-wide metadata https://wiki.webmproject.org/webm-metadata/temporal-metadata/webvtt-in-webm
it seems it uses Language and Kind there, though, on another page it uses VTTKind: https://wiki.webmproject.org/webm-metadata/temporal-metadata/webvtt-metadata
it also looks like the values they use are capitalized.

Looking at the HLS spec again, I guess the main difference is that this block is currently define as key: value but in HLS with the X-TIMESTAMP-MAP, it's X-TIMESTAMP-MAP=value. Do we want to account for both : and = as the delimiter here?

Also, if we do move it back to the pre-existing headers location, which would fully resolve #485, the HLS spec would likely need to get updated to account for potential other items in the headers.


Also, looking at the webm pages, there's a link to https://lists.w3.org/Archives/Public/public-texttracks/2012Apr/0011.html, which has in interesting suggestion of a getHeaders() API. That sounds like a great idea, but I don't think it needs to get added as part of this PR specifically.

@zcorpan
Copy link
Member

zcorpan commented Sep 16, 2025

I guess we could support either : or = but maybe make = non-conforming.

@cookiecrook
Copy link
Author

cookiecrook commented Sep 19, 2025

I don't think that this PR fully resolves #485.

Okay. I have modified this to remove "Closes #485" and just leaves the linked PR closure for #511. I'll address the other comments as time allows. Thanks all.

@gkatsev
Copy link
Collaborator

gkatsev commented Sep 23, 2025

Let me know if you think it would be helpful to talk through it some more on a TTWG call.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

VTT Metadata Cue format is ambiguous; some metadata may be unintentionally presented to the user in a context outside HTML