Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add captureTimestamp and senderCaptureTimeOffset to frame metadata #228

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
79 changes: 79 additions & 0 deletions index.bs
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,11 @@ spec:webidl; type:dfn; text:resolve
"CloneArrayBuffer": {
"href": "https://tc39.es/ecma262/#sec-clonearraybuffer",
"title": "CloneArrayBuffer"
},
"RTP-EXT-CAPTURE-TIME": {
"href": "http://www.webrtc.org/experiments/rtp-hdrext/abs-capture-time",
Copy link
Member

@jan-ivar jan-ivar Jun 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This normative reference redirects to a googlesource link. I'll need to check with folks internally how to handle this.

Can we add a note mentioning any efforts to standardize this?

My understanding is after an IETF effort it would end up here?

Copy link
Member

@jan-ivar jan-ivar Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See w3c/webrtc-extensions#216. I worry if we merge this it creates a blocker for advancement.

"title": "RTP Header Extension for Absolute Capture Time",
"publisher": "WebRTC Project"
}
}
</pre>
Expand Down Expand Up @@ -134,6 +139,20 @@ The <dfn abstract-op>readEncodedData</dfn> algorithm is given a |rtcObject| as p
1. Let |frame| be the newly produced frame.
1. Set |frame|.`[[owner]]` to |rtcObject|.
1. Set |frame|.`[[counter]]` to |rtcObject|.`[[lastEnqueuedFrameCounter]]`.
1. If the frame has been produced by a {{RTCRtpReceiver}}:
1. If the relevant RTP packet contains the
[[RTP-EXT-CAPTURE-TIME|RTP Header Extension for Absolute Capture Time]], set |frame|.`[[captureTimestamp]]` to the
[[RTP-EXT-CAPTURE-TIME#absolute-capture-timestamp|absolute capture timestamp]] field and set |frame|.`[[senderCaptureTimeOffset]]`
to the [[RTP-EXT-CAPTURE-TIME#estimated-capture-clock-offset|capture clock offset field]] if it is present.
1. Otherwise, if the relevant RTP packet does not contain the
[[RTP-EXT-CAPTURE-TIME|RTP Header Extension for Absolute Capture Time]] but a previous RTP packet did,
set |frame|.`[[captureTimestamp]]` to the result of calculating the absolute capture timestamp according to
[[RTP-EXT-CAPTURE-TIME#timestamp-interpolation|timestamp interpolation]] and set |frame|.`[[senderCaptureTimeOffset]]`
to the most recent value that was present.
1. Otherwise, set |frame|.`[[captureTimestamp]]` to undefined and set |frame|.`[[senderCaptureTimeOffset]]` to undefined.
1. If the frame has been produced by a {{RTCRtpSender}}, set |frame|.`[[captureTimestamp]]` to the capture timestamp
using the methodology described in [[RTP-EXT-CAPTURE-TIME#absolute-capture-timestamp]] and set frame.`[[senderCaptureTimeOffset]]`
to undefined.
1. [=ReadableStream/Enqueue=] |frame| in |rtcObject|.`[[readable]]`.

The <dfn abstract-op>writeEncodedData</dfn> algorithm is given a |rtcObject| as parameter and a |frame| as input. It is defined by running the following steps:
Expand Down Expand Up @@ -293,6 +312,10 @@ The <dfn method for="SFrameTransform">setEncryptionKey(|key|, |keyID|)</dfn> met

# RTCRtpScriptTransform # {#scriptTransform}

In this section, the capture system refers to the system where media is sourced from and the sender system
refers to the system that is sending RTP and RTCP packets to the receiver system where {{RTCEncodedVideoFrameMetadata}} data
or {{RTCEncodedAudioFrameMetadata}} data is populated.

## <dfn enum>RTCEncodedVideoFrameType</dfn> dictionary ## {#RTCEncodedVideoFrameType}
<pre class="idl">
// New enum for video frame types. Will eventually re-use the equivalent defined
Expand Down Expand Up @@ -358,6 +381,8 @@ dictionary RTCEncodedVideoFrameMetadata {
sequence&lt;unsigned long&gt; contributingSources;
long long timestamp; // microseconds
unsigned long rtpTimestamp;
DOMHighResTimeStamp captureTimestamp;
DOMHighResTimeStamp senderCaptureTimeOffset;
DOMString mimeType;
};
</pre>
Expand Down Expand Up @@ -431,6 +456,32 @@ dictionary RTCEncodedVideoFrameMetadata {
that reflects the sampling instant of the first octet in the RTP data packet.
</p>
</dd>
<dt>
<dfn dict-member>captureTimestamp</dfn> <span class="idlMemberType">DOMHighResTimeStamp</span>
</dt>
<dd>
<p>
The {{RTCEncodedVideoFrameMetadata/captureTimestamp}} is set by the frame source, and for frames that come
from the {{RTCRtpReceiver}}, it is extracted by the [[#stream-processing]] algorithm. Its reference clock
is the capture system's NTP clock (same clock used to generate NTP timestamps for RTCP sender reports on
that system).

On populating this member, the user agent MUST return the value of the frame's `[[captureTimestamp]]` slot.
</p>
</dd>
<dt>
<dfn dict-member>senderCaptureTimeOffset</dfn> <span class="idlMemberType">DOMHighResTimeStamp</span>
</dt>
<dd>
<p>
The {{RTCEncodedVideoFrameMetadata/senderCaptureTimeOffset}} is the sender system's estimate of the offset
between its own NTP clock and the capture system's NTP clock, for the same frame that the
{{RTCEncodedVideoFrameMetadata/captureTimestamp}} was originated from. It is extracted by the
[[#stream-processing]] algorithm.

On populating this member, the user agent MUST return the value of the frame's `[[senderCaptureTimeOffset]]` slot.
</p>
</dd>
<dt>
<dfn dict-member>mimeType</dfn> <span class="idlMemberType">DOMString</span>
</dt>
Expand Down Expand Up @@ -611,6 +662,8 @@ dictionary RTCEncodedAudioFrameMetadata {
sequence&lt;unsigned long&gt; contributingSources;
short sequenceNumber;
unsigned long rtpTimestamp;
DOMHighResTimeStamp captureTimestamp;
DOMHighResTimeStamp senderCaptureTimeOffset;
DOMString mimeType;
};
</pre>
Expand Down Expand Up @@ -664,6 +717,32 @@ dictionary RTCEncodedAudioFrameMetadata {
that reflects the sampling instant of the first octet in the RTP data packet.
</p>
</dd>
<dt>
<dfn dict-member>captureTimestamp</dfn> <span class="idlMemberType">DOMHighResTimeStamp</span>
</dt>
<dd>
<p>
The {{RTCEncodedAudioFrameMetadata/captureTimestamp}} is set by the frame source, and for frames that come
from the {{RTCRtpReceiver}}, it is extracted by the [[#stream-processing]] algorithm. Its reference clock
is the capture system's NTP clock (same clock used to generate NTP timestamps for RTCP sender reports on
that system).

On populating this member, the user agent MUST return the value of the frame's `[[captureTimestamp]]` slot.
</p>
</dd>
<dt>
<dfn dict-member>senderCaptureTimeOffset</dfn> <span class="idlMemberType">DOMHighResTimeStamp</span>
</dt>
<dd>
<p>
The {{RTCEncodedAudioFrameMetadata/senderCaptureTimeOffset}} is the sender system's estimate of the offset
between its own NTP clock and the capture system's NTP clock, for the same frame that the
{{RTCEncodedAudioFrameMetadata/captureTimestamp}} was originated from. It is extracted by the
[[#stream-processing]] algorithm.

On populating this member, the user agent MUST return the value of the frame's `[[senderCaptureTimeOffset]]` slot.
</p>
</dd>
<dt>
<dfn dict-member>mimeType</dfn> <span class="idlMemberType">DOMString</span>
</dt>
Expand Down