
WebTransport/WebCodecs interaction (copies) #231

Closed
aboba opened this issue Mar 25, 2021 · 11 comments

Comments

@aboba
Collaborator

aboba commented Mar 25, 2021

There are a number of use cases where it is envisaged that WebTransport + WebCodecs will be used together:

  1. Video upload, where a captured MediaStreamTrack is converted to raw video frames, encoded by WebCodecs, and then sent to a server using WebTransport.
  2. Video streaming, where a video stream is sent to a WebTransport client, encodedVideoChunks are decoded using WebCodecs into videoFrames, converted to a MediaStreamTrack and rendered by a video tag.

In use case #1, it is desirable for the encodedVideoChunk produced by WebCodecs to be passed to WebTransport and sent with as few copies as possible (e.g. just one copy for process separation). Is this possible?

In use case #2, it is desirable for WebTransport datagram or reliable stream reader to transfer datagrams/segments of the encodedVideoChunks into a GPU buffer rather than main memory, so as to allow most efficient ingestion by WebCodecs and avoid a copy. Is this possible?
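The upload hand-off in use case #1 can be sketched as below. This is a minimal sketch, not from the thread: the `chunkToBytes` helper is hypothetical, and the commented-out wiring assumes the standard WebTransport/WebCodecs APIs. `EncodedVideoChunk` only exposes `copyTo()`, so one copy into script-visible memory is currently unavoidable, which is exactly the copy this issue asks about.

```javascript
// Copy an EncodedVideoChunk's payload into a fresh Uint8Array.
// This copy is the one the issue is trying to minimize.
function chunkToBytes(chunk) {
  const buf = new Uint8Array(chunk.byteLength);
  chunk.copyTo(buf); // EncodedVideoChunk.copyTo(BufferSource)
  return buf;
}

// Hypothetical wiring (browser-only APIs, shown for shape only):
// const transport = new WebTransport("https://example.com/upload");
// await transport.ready;
// const stream = await transport.createUnidirectionalStream();
// const writer = stream.getWriter();
// const encoder = new VideoEncoder({
//   output: (chunk) => writer.write(chunkToBytes(chunk)),
//   error: console.error,
// });
```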

@chcunningham

chcunningham commented Mar 31, 2021

I think you nailed the use cases, but the assumptions about GPU buffers aren't quite right (I probably didn't communicate this well in our meeting).

Here's some background:

  1. In Chrome, GPU memory would only ever be used to back a VideoFrame. This is generally done because VideoFrames ultimately get composited/rendered into GPU memory, so having them there to start with reduces CPU->GPU copies.
  2. We do use shared memory (but not GPU memory) when moving encoded chunks between processes, but this is just an implementation detail.
  3. For encoded chunks, we're generally not super concerned with copies because the data is relatively small. Chrome's implementation of existing

In spite of point 3, I think we should still map out the flow from WebTransport to WebCodecs and see what copies of encoded data are forced by the current API shape.

For use case #1 (sending encoded data from the web):

For use case #2 (downloading encoded data from webtransport)

  • I don't know offhand how much control WebTransport users have over the size of stream chunks (not to be confused with WC chunks) that they read from the ReadableStream. If control is limited, it may occur that the stream chunk is smaller than the full WC chunk such that additional buffering (copies) is required before being able to feed WC.
  • Once you have a full WC chunk's worth of data, you can pass it as an ArrayBuffer to create the WC chunk. Presently this copies the ArrayBuffer to prevent TOCTOU issues, but "Detach codec inputs (where feasible)" (webcodecs#104) proposes letting callers transfer this ArrayBuffer into the chunk.
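The buffering described in the first bullet can be sketched as below. This assumes a hypothetical 4-byte big-endian length prefix as the framing; the real wire format is up to the application. The append step is itself the extra copy the bullet mentions.

```javascript
// Accumulate transport-level fragments until a full WC-chunk payload
// (length-prefixed, 4-byte big-endian) is available.
class ChunkAssembler {
  constructor() { this.buf = new Uint8Array(0); }

  // Feed one fragment from the ReadableStream; returns completed payloads.
  push(fragment) {
    // Append the fragment (this is the extra buffering copy).
    const merged = new Uint8Array(this.buf.length + fragment.length);
    merged.set(this.buf);
    merged.set(fragment, this.buf.length);
    this.buf = merged;

    const done = [];
    while (this.buf.length >= 4) {
      const len = new DataView(this.buf.buffer, this.buf.byteOffset, 4).getUint32(0);
      if (this.buf.length < 4 + len) break;       // wait for more bytes
      done.push(this.buf.slice(4, 4 + len));       // one contiguous payload
      this.buf = this.buf.subarray(4 + len);       // drop consumed bytes
    }
    return done;
  }
}
```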

Again, I don't think any of these copies is a big deal. Just trying to map it all out and look for optimizations.

@wilaw added the "Discuss at next meeting" label Mar 31, 2021
@aboba
Collaborator Author

aboba commented Mar 31, 2021

@chcunningham There may be some differences depending on which WebTransport mode is used. WebTransport supports ordered/reliable transport (where data is sent over a single reliable stream), unordered/reliable transport (a distinct reliable stream for each message), or unreliable/unordered transport (HTTP/3 datagrams). Reliable ordered transport on a single stream might be used with HLS or DASH, with WebCodecs substituted for MSE. For low-latency game streaming, you might want a different transport to minimize latency, such as unreliable/unordered (transporting packetized media over HTTP/3 datagrams, with the application handling robustness via its own RTX/FEC/RED), or partially reliable/unordered (e.g. using a distinct stream for each EncodedVideoChunk, with a retransmission time limit), or maybe some mixture (e.g. a reliable stream for key frames, unordered/unreliable for P-frames).

Let me try to walk through the use cases, covering reliable/ordered or unreliable/unordered transport. I believe that the reliable/unordered case is similar to reliable/ordered.

For use case #1, if I read Issue w3c/webcodecs#155 correctly, if the goal is to upload containerized media, you would write the EncodedVideoChunk to the reliable stream, which would handle segmentation, re-transmission and ordering transparently. For datagrams you most likely would not be sending containerized media, so if the EncodedVideoChunk is provided in containerized form, it would be necessary to de-containerize it and then packetize. The application would also be responsible for robustness (e.g. RTX/FEC/RED) and ordering. It would be nice to be able to select portions of the EncodedVideoChunk to be written into the datagram payloads without having to copy, so the de-containerization/packetization process can be most efficient.
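The no-copy packetization asked for above can be sketched with `subarray()`, which creates views into the chunk's backing buffer rather than new byte storage. This is illustrative: in practice, prepending a real header (sequence number, RTX/FEC fields) would still force a copy today, because a datagram write takes one contiguous BufferSource.

```javascript
// Split an encoded chunk's bytes into datagram-sized payloads without
// copying: each element is a view into the same backing ArrayBuffer.
function packetize(bytes, maxPayload) {
  const packets = [];
  for (let off = 0; off < bytes.length; off += maxPayload) {
    packets.push(bytes.subarray(off, Math.min(off + maxPayload, bytes.length)));
  }
  return packets;
}
```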

For use case #2, if you are reading the EncodedVideoChunk from a reliable stream, and it is received in containerized form, you'd want the segments to be deposited into a contiguous ArrayBuffer which you can then feed to the WebCodecs decoder. For datagrams, if the media was not containerized for transport, but needs to be provided to WebCodecs in containerized form, you'd need to de-packetize and containerize the media before feeding it to the WebCodecs decoder.
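The datagram receive side can be sketched as below: out-of-order payloads are deposited at their offset in one preallocated contiguous buffer, which can then back the data fed to the WebCodecs decoder. The 2-byte sequence header and fixed payload size are assumptions for illustration, not anything specified in this thread.

```javascript
// Reassemble datagram payloads (each prefixed with a hypothetical
// 2-byte big-endian sequence number) into one contiguous Uint8Array.
function depacketize(datagrams, payloadSize, totalSize) {
  const out = new Uint8Array(totalSize);
  for (const dgram of datagrams) {
    const seq = (dgram[0] << 8) | dgram[1];
    out.set(dgram.subarray(2), seq * payloadSize); // order-independent
  }
  return out;
}
```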

@wilaw
Contributor

wilaw commented Apr 13, 2021

@aboba - can you reference the discussions you spoke about in this morning's call concerning BYOB readers?

@yutakahirano
Contributor

Are both datagrams and reliable streams targets?

@aboba
Collaborator Author

aboba commented Apr 14, 2021

@yutakahirano Yes. Some use cases:

  1. Video upload: WebCodecs encoding and sending via WebTransport reliable streams.
  2. Low-latency streaming: WebTransport receiving unordered/unreliable datagrams, decoding via WebCodecs.
  3. Conventional streaming: WebTransport receiving reliable stream, decoding via WebCodecs.

@aboba aboba closed this as completed Apr 14, 2021
@aboba aboba reopened this Apr 14, 2021
@aboba
Collaborator Author

aboba commented Apr 17, 2021

@wilaw Here are some references:
whatwg/streams#495
#131

@yutakahirano
Contributor

yutakahirano commented Apr 19, 2021

Here is my mental model for the reliable streams case, client => server.

  1. A script provides the data to upload as a Uint8Array.
  2. The browser copies the data into an IPC buffer [A].
    • There may be one extra copy here when the IPC buffer is full, but that can be avoided by checking writer.ready.
  3. In the network process, create QUIC frames for the data [B].
  4. Send the data to the network [C].
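The write loop implied by step 2's sub-bullet can be sketched as below, using the standard `WritableStream` writer API (available in browsers and recent Node). Awaiting `writer.ready` before each write respects backpressure, so the UA never has to buffer a write on top of a full IPC buffer. The function name is illustrative.

```javascript
// Write chunks while respecting backpressure: wait for writer.ready,
// but deliberately do not await each write() (only the final close()).
async function sendAll(writable, chunks) {
  const writer = writable.getWriter();
  for (const chunk of chunks) {
    await writer.ready;  // drain before queueing the next chunk
    writer.write(chunk); // not awaited: keeps chunks in flight
  }
  await writer.close();  // resolves after all queued writes finish
}
```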

[A], [B], [C] are potential copies.

[A] comes from the following on the spec side:

  1. We have separate renderer processes and a network process. This is for security and is (implicitly?) required by the spec.
  2. There is no way to write-and-transfer data at WritableStreamDefaultWriter.write.
  3. There is no way to allocate an ArrayBuffer to be accessible from multiple processes.

On the other hand, we may be able to optimize away the copy with (all of) the following:

  1. We have a conversion function from a VideoFrame to bytes, and it's provided by the browser.
  2. The VideoFrame is not exposed to scripts (or a lock system restricting access to the video memory, similar to ReadableStream.getReader, would also be fine).

In other words, if the contents of the video memory are not accessible to scripts, then it is possible to provide the GPU memory handle directly to the network process and eliminate the copy at [A].

Regarding [B]: I expect at least one copy here because there is a TLS encryption step (though it depends on the definition of "copy"). I can imagine a system where we allocate a buffer backed by the network hardware and only one copy is involved for [B] and [C] combined: the input is the IPC buffer, the output is the buffer backed by the network hardware, and all of the HTTP/3 and QUIC protocol processing happens between them. @DavidSchinazi knows much more than me here.

@aboba, does this make sense / is this useful?

@yutakahirano
Contributor

One correction:

In other words, if the contents of the video memory are not accessible to scripts, then it is possible to provide the GPU memory handle directly to the network process and eliminate the copy at [A].

I came to think this is problematic in terms of security. Running the conversion logic should be done in the renderer, not in the network process. Still, we should be able to eliminate the copy and intermediary buffer allocation.

@jan-ivar
Member

jan-ivar commented Jun 8, 2021

Meeting:

@wilaw removed the "Discuss at next meeting" label Jun 9, 2021
@yutakahirano yutakahirano added this to the No mileston milestone Jun 30, 2021
@aboba
Collaborator Author

aboba commented Oct 30, 2022

BYOB support has been added as of M108. Since encoded chunks are much smaller than VideoFrames, copies of encoded chunks between WebCodecs and WebTransport are not as big an issue as concurrency in frame/stream transport, which appears to be achievable by removing await before writer.write() and writer.close() (await writer.ready is fine).
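The BYOB reading mentioned here can be sketched as below: a BYOB read fills a caller-supplied view, so the application controls allocation and can read an exact number of bytes into one preallocated buffer. This is a sketch assuming standard Streams byte-stream semantics, not code from this thread; note the supplied view's buffer is transferred by each read, so the caller must reattach via `value.buffer`.

```javascript
// Read exactly n bytes from a byte stream into one preallocated buffer
// using a BYOB reader (no intermediate per-read allocations).
async function readExactly(stream, n) {
  const reader = stream.getReader({ mode: "byob" });
  let buf = new Uint8Array(n);
  let filled = 0;
  while (filled < n) {
    const { value, done } = await reader.read(
      new Uint8Array(buf.buffer, filled, n - filled));
    if (done) throw new Error("stream ended early");
    buf = new Uint8Array(value.buffer); // our view was transferred; reattach
    filled += value.byteLength;
  }
  reader.releaseLock();
  return buf;
}
```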

@wilaw
Contributor

wilaw commented Jan 25, 2023

@aboba - can we close this issue?

@aboba aboba closed this as completed Jan 25, 2023