
WebTransport/WebCodecs interaction (copies) #231

Closed
aboba opened this issue Mar 25, 2021 · 11 comments

Comments

@aboba
Collaborator

aboba commented Mar 25, 2021

There are a number of use cases where it is envisaged that WebTransport + WebCodecs will be used together:

  1. Video upload, where a captured MediaStreamTrack is converted to raw video frames, encoded by WebCodecs, and then sent to a server using WebTransport.
  2. Video streaming, where a video stream is sent to a WebTransport client, encodedVideoChunks are decoded using WebCodecs into videoFrames, converted to a MediaStreamTrack and rendered by a video tag.

In use case #1, it is desirable for the encodedVideoChunk produced by WebCodecs to be passed to WebTransport and sent with as few copies as possible (e.g. just one copy for process separation). Is this possible?

In use case #2, it is desirable for WebTransport datagram or reliable stream reader to transfer datagrams/segments of the encodedVideoChunks into a GPU buffer rather than main memory, so as to allow most efficient ingestion by WebCodecs and avoid a copy. Is this possible?
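The upload hand-off in use case #1 can be sketched as below. This is a minimal sketch, not from the thread: the `chunkToBytes` helper is hypothetical, and the commented-out wiring assumes the standard WebTransport/WebCodecs APIs. `EncodedVideoChunk` only exposes `copyTo()`, so one copy into script-visible memory is currently unavoidable, which is exactly the copy this issue asks about.

```javascript
// Copy an EncodedVideoChunk's payload into a fresh Uint8Array.
// This copy is the one the issue is trying to minimize.
function chunkToBytes(chunk) {
  const buf = new Uint8Array(chunk.byteLength);
  chunk.copyTo(buf); // EncodedVideoChunk.copyTo(BufferSource)
  return buf;
}

// Hypothetical wiring (browser-only APIs, shown for shape only):
// const transport = new WebTransport("https://example.com/upload");
// await transport.ready;
// const stream = await transport.createUnidirectionalStream();
// const writer = stream.getWriter();
// const encoder = new VideoEncoder({
//   output: (chunk) => writer.write(chunkToBytes(chunk)),
//   error: console.error,
// });
```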

@chcunningham

chcunningham commented Mar 31, 2021

I think you nailed the use cases, but the assumptions about GPU buffers aren't quite right (I probably didn't communicate this well in our meeting).

Here's some background:

  1. In Chrome, GPU memory would only ever be used to back a VideoFrame. This is generally done because VideoFrames ultimately get composited/rendered into GPU memory, so having them there to start with reduces CPU->GPU copies.
  2. We do use shared memory (but not GPU memory) when moving encoded chunks between processes, but this is just an implementation detail.
  3. For encoded chunks, we're generally not super concerned with copies because the data is relatively small. Chrome's implementation of existing

In spite of point 3, I think we should still map out the flow from WebTransport to WebCodecs and see what copies of encoded data are forced by the current API shape.

For use case #1 (sending encoded data from the web):

For use case #2 (downloading encoded data from webtransport)

  • I don't know offhand how much control WebTransport users have over the size of stream chunks (not to be confused with WC chunks) that they read from the ReadableStream. If control is limited, it may occur that the stream chunk is smaller than the full WC chunk such that additional buffering (copies) is required before being able to feed WC.
  • Once you have a full WC chunk's worth of data, you can pass it as an ArrayBuffer to create the WC chunk. Presently this copies the ArrayBuffer to prevent TOCTOU issues, but "Detach codec inputs (where feasible)" (webcodecs#104) proposes letting callers transfer this ArrayBuffer into the chunk.
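The buffering described in the first bullet can be sketched as below. This assumes a hypothetical 4-byte big-endian length prefix as the framing; the real wire format is up to the application. The append step is itself the extra copy the bullet mentions.

```javascript
// Accumulate transport-level fragments until a full WC-chunk payload
// (length-prefixed, 4-byte big-endian) is available.
class ChunkAssembler {
  constructor() { this.buf = new Uint8Array(0); }

  // Feed one fragment from the ReadableStream; returns completed payloads.
  push(fragment) {
    // Append the fragment (this is the extra buffering copy).
    const merged = new Uint8Array(this.buf.length + fragment.length);
    merged.set(this.buf);
    merged.set(fragment, this.buf.length);
    this.buf = merged;

    const done = [];
    while (this.buf.length >= 4) {
      const len = new DataView(this.buf.buffer, this.buf.byteOffset, 4).getUint32(0);
      if (this.buf.length < 4 + len) break;       // wait for more bytes
      done.push(this.buf.slice(4, 4 + len));       // one contiguous payload
      this.buf = this.buf.subarray(4 + len);       // drop consumed bytes
    }
    return done;
  }
}
```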

Again, I don't think any of these copies is a big deal. Just trying to map it all out and look for optimizations.

@wilaw added the "Discuss at next meeting" label Mar 31, 2021
@aboba
Collaborator Author

aboba commented Mar 31, 2021

@chcunningham There may be some differences depending on which WebTransport mode is used. WebTransport supports ordered/reliable transport (where data is sent over a single reliable stream), unordered/reliable transport (a distinct reliable stream for each message), or unreliable/unordered transport (HTTP/3 datagrams). Reliable ordered transport on a single stream might be used with HLS or DASH, with WebCodecs substituted for MSE. For low-latency game streaming, you might want a different transport to minimize latency, such as unreliable/unordered (transporting packetized media over HTTP/3 datagrams, with the application handling robustness via its own RTX/FEC/RED), or partially reliable/unordered (e.g. using a distinct stream for each EncodedVideoChunk, with a retransmission time limit), or maybe some mixture (e.g. a reliable stream for key frames, unordered/unreliable for P-frames).

Let me try to walk through the use cases, covering reliable/ordered or unreliable/unordered transport. I believe that the reliable/unordered case is similar to reliable/ordered.

For use case #1, if I read Issue w3c/webcodecs#155 correctly, if the goal is to upload containerized media, you would write the EncodedVideoChunk to the reliable stream, which would handle segmentation, re-transmission and ordering transparently. For datagrams you most likely would not be sending containerized media, so if the EncodedVideoChunk is provided in containerized form, it would be necessary to de-containerize it and then packetize. The application would also be responsible for robustness (e.g. RTX/FEC/RED) and ordering. It would be nice to be able to select portions of the EncodedVideoChunk to be written into the datagram payloads without having to copy, so the de-containerization/packetization process can be most efficient.
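The no-copy packetization asked for above can be sketched with `subarray()`, which creates views into the chunk's backing buffer rather than new byte storage. This is illustrative: in practice, prepending a real header (sequence number, RTX/FEC fields) would still force a copy today, because a datagram write takes one contiguous BufferSource.

```javascript
// Split an encoded chunk's bytes into datagram-sized payloads without
// copying: each element is a view into the same backing ArrayBuffer.
function packetize(bytes, maxPayload) {
  const packets = [];
  for (let off = 0; off < bytes.length; off += maxPayload) {
    packets.push(bytes.subarray(off, Math.min(off + maxPayload, bytes.length)));
  }
  return packets;
}
```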

For use case #2, if you are reading the EncodedVideoChunk from a reliable stream, and it is received in containerized form, you'd want the segments to be deposited into a contiguous ArrayBuffer which you can then feed to the WebCodecs decoder. For datagrams, if the media was not containerized for transport, but needs to be provided to WebCodecs in containerized form, you'd need to de-packetize and containerize the media before feeding it to the WebCodecs decoder.
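The datagram receive side can be sketched as below: out-of-order payloads are deposited at their offset in one preallocated contiguous buffer, which can then back the data fed to the WebCodecs decoder. The 2-byte sequence header and fixed payload size are assumptions for illustration, not anything specified in this thread.

```javascript
// Reassemble datagram payloads (each prefixed with a hypothetical
// 2-byte big-endian sequence number) into one contiguous Uint8Array.
function depacketize(datagrams, payloadSize, totalSize) {
  const out = new Uint8Array(totalSize);
  for (const dgram of datagrams) {
    const seq = (dgram[0] << 8) | dgram[1];
    out.set(dgram.subarray(2), seq * payloadSize); // order-independent
  }
  return out;
}
```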

@wilaw
Contributor

wilaw commented Apr 13, 2021

@aboba - can you reference the discussions you spoke about in this morning's call concerning BYOB readers?

@yutakahirano
Contributor

Are both datagrams and reliable streams targets?

@aboba
Collaborator Author

aboba commented Apr 14, 2021

@yutakahirano Yes. Some use cases:

  1. Video upload: WebCodecs encoding and sending via WebTransport reliable streams.
  2. Low-latency streaming: WebTransport receiving unordered/unreliable datagrams, decoding via WebCodecs.
  3. Conventional streaming: WebTransport receiving reliable stream, decoding via WebCodecs.

@aboba aboba closed this as completed Apr 14, 2021
@aboba aboba reopened this Apr 14, 2021
@aboba
Collaborator Author

aboba commented Apr 17, 2021

@wilaw Here are some references:
whatwg/streams#495
#131

@yutakahirano
Contributor

yutakahirano commented Apr 19, 2021

Here is my mental model for the reliable streams case, client => server.

  1. A script provides the data to upload as a Uint8Array.
  2. The browser copies the data into an IPC buffer [A].
    • There may be one extra copy here when the IPC buffer is full, but that can be avoided by checking writer.ready.
  3. In the network process, create QUIC frames for the data [B].
  4. Send the data to the network [C].
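The write loop implied by step 2's sub-bullet can be sketched as below, using the standard `WritableStream` writer API (available in browsers and recent Node). Awaiting `writer.ready` before each write respects backpressure, so the UA never has to buffer a write on top of a full IPC buffer. The function name is illustrative.

```javascript
// Write chunks while respecting backpressure: wait for writer.ready,
// but deliberately do not await each write() (only the final close()).
async function sendAll(writable, chunks) {
  const writer = writable.getWriter();
  for (const chunk of chunks) {
    await writer.ready;  // drain before queueing the next chunk
    writer.write(chunk); // not awaited: keeps chunks in flight
  }
  await writer.close();  // resolves after all queued writes finish
}
```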

[A], [B], [C] are potential copies.

[A] comes from the following on the spec side:

  1. We have separate renderer processes and a network process. This is for security and is (implicitly?) required by the spec.
  2. There is no way to write-and-transfer data at WritableStreamDefaultWriter.write.
  3. There is no way to allocate an ArrayBuffer to be accessible from multiple processes.

On the other hand, we may be able to optimize away the copy with (all of) the following:

  1. We have a conversion function from a VideoFrame to bytes, and it's provided by the browser.
  2. The VideoFrame is not exposed to scripts (or a lock system restricting access to the video memory, similar to ReadableStream.getReader, would also be fine).

In other words, if the contents of the video memory are not accessible to scripts, then it is possible to provide the GPU memory handle directly to the network process and eliminate the copy at [A].

Regarding [B]: I expect at least one copy here because there is a TLS encryption step (though it depends on the definition of "copy"). I can imagine a system where we allocate a buffer backed by the network hardware and only one copy is involved for [B] and [C] combined: the input is the IPC buffer, the output is the buffer backed by the network hardware, and all of the HTTP/3 and QUIC protocol processing happens between them. @DavidSchinazi knows much more than me here.

@aboba, does this make sense / is this useful?

@yutakahirano
Contributor

One correction:

In other words, if the contents of the video memory are not accessible to scripts, then it is possible to provide the GPU memory handle directly to the network process and eliminate the copy at [A].

I came to think this is problematic in terms of security. Running the conversion logic should be done in the renderer, not in the network process. Still, we should be able to eliminate the copy and intermediary buffer allocation.

@jan-ivar
Member

jan-ivar commented Jun 8, 2021

Meeting:

@wilaw removed the "Discuss at next meeting" label Jun 9, 2021
@yutakahirano yutakahirano added this to the No mileston milestone Jun 30, 2021
@aboba
Collaborator Author

aboba commented Oct 30, 2022

BYOB support has been added as of M108. Since encoded chunks are much smaller than VideoFrames, copies of encoded chunks between WebCodecs and WebTransport are not as big an issue as concurrency in frame/stream transport, which appears to be achievable by removing await before writer.write() and writer.close() (await writer.ready is fine).
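The BYOB reading mentioned here can be sketched as below: a BYOB read fills a caller-supplied view, so the application controls allocation and can read an exact number of bytes into one preallocated buffer. This is a sketch assuming standard Streams byte-stream semantics, not code from this thread; note the supplied view's buffer is transferred by each read, so the caller must reattach via `value.buffer`.

```javascript
// Read exactly n bytes from a byte stream into one preallocated buffer
// using a BYOB reader (no intermediate per-read allocations).
async function readExactly(stream, n) {
  const reader = stream.getReader({ mode: "byob" });
  let buf = new Uint8Array(n);
  let filled = 0;
  while (filled < n) {
    const { value, done } = await reader.read(
      new Uint8Array(buf.buffer, filled, n - filled));
    if (done) throw new Error("stream ended early");
    buf = new Uint8Array(value.buffer); // our view was transferred; reattach
    filled += value.byteLength;
  }
  reader.releaseLock();
  return buf;
}
```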

@wilaw
Contributor

wilaw commented Jan 25, 2023

@aboba - can we close this issue?

@aboba aboba closed this as completed Jan 25, 2023