
Expose RTCEncodedAudioFrame interface in AudioWorklets #226

Open

tonyherre opened this issue Mar 8, 2024 · 13 comments

Comments

@tonyherre
Contributor

Working with encoded frames from worklets, particularly RTCEncodedAudioFrames from AudioWorklets, would be very useful for apps, allowing them to choose the best execution environment for encoded media processing, beyond just Window and DedicatedWorker.

ReadableStream and WritableStream already have Exposed=Worklet, so transferring the streams of encoded frames would make sense and would allow more performant implementations than e.g. requiring apps to copy data & metadata in a DedicatedWorker before going to the worklet / after returning from it.
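
For illustration, a minimal sketch of the wiring this would enable, assuming the proposed Exposed=Worklet change lands; the file names and processor name are placeholders, and `receiver` is an RTCRtpReceiver from an existing RTCPeerConnection:

```js
// main.js: hook an encoded-transform worker up to an AudioWorklet.
const audioCtx = new AudioContext();
await audioCtx.audioWorklet.addModule('frame-processor.js');
const node = new AudioWorkletNode(audioCtx, 'frame-processor');

// Bridge the worker and the worklet with a MessageChannel, so the worker
// can transfer its stream of encoded frames straight to the worklet.
const { port1, port2 } = new MessageChannel();
const worker = new Worker('transform-worker.js');
worker.postMessage({ port: port1 }, [port1]);
node.port.postMessage({ port: port2 }, [port2]);

receiver.transform = new RTCRtpScriptTransform(worker, {});
```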

I propose we add Worklet to the Exposed lists for RTCEncodedVideoFrame and RTCEncodedAudioFrame, and likely follow up with similar changes for the interfaces in the webcodecs spec.

CC @alvestrand @aboba @guidou @youennf

@youennf
Collaborator

youennf commented Mar 8, 2024

In typical WebRTC applications, there is a thread for audio capture/rendering, and there are threads dedicated to networking and media handling.
The former (which maps to AudioWorklet) usually runs at a higher priority than the latter (which map to DedicatedWorkers).

I am not sure allowing encoding/networking in an AudioWorklet is a good idea.
For instance, WebCodecs constructs are not supported in worklets.
WebRTC encoded transform streams are transferable but RTCRtpScriptTransformer is not.

CC @padenot and @jan-ivar.

@padenot

padenot commented Mar 8, 2024

For video, just no.

Decoding audio on real-time threads has been seen in very specific scenarios, and can be done safely if the decoder is real-time safe, etc., all the usual stuff. Encoding, I've never seen it, and I don't really know how useful it would be or what it would bring.

The Web Codecs API, however, won't work well (or at all) in an AudioWorklet, because the AudioWorklet is inherently and by necessity a synchronous environment, while the Web Codecs API is asynchronous. I had proposed a synchronous API for Web Codecs, and explained why (in w3c/webcodecs#19), but we haven't done it.
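
To make the mismatch concrete, a sketch (the decoder calls in the comments are the regular async WebCodecs API, which isn't exposed in worklets today):

```js
// processor.js: why an async decode API doesn't fit the real-time callback.
class AsyncMismatchProcessor extends AudioWorkletProcessor {
  process(inputs, outputs) {
    // process() must fill `outputs` before it returns, synchronously.
    // An AudioDecoder, by contrast, returns from decode() immediately and
    // delivers AudioData later through its output callback:
    //   decoder.decode(chunk); // no decoded samples available yet
    // so the samples for *this* 128-frame quantum cannot come from an
    // async decoder invoked here.
    return true;
  }
}
registerProcessor('async-mismatch-processor', AsyncMismatchProcessor);
```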

I side with @youennf on this. Communicating with an AudioWorkletProcessor is not hard and can easily be done extremely efficiently. Any claim of being able to do a "more performant" implementation needs to be backed by something.

Once we have apps for which the limiting factor is the packetization latency or something in that area, we can revisit.

@tonyherre
Contributor Author

tonyherre commented Mar 8, 2024

WebRTC encoded transform streams are transferable but RTCRtpScriptTransformer is not.

The RTCRtpScriptTransformer.readable is transferable, so it could be posted to a worklet within the current shape.
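
A sketch of that worker-side hand-off, reusing the MessageChannel wiring from the earlier sketch (workletPort is the port connected to the AudioWorkletNode):

```js
// transform-worker.js: forward the stream of encoded frames to the worklet.
let workletPort;
onmessage = ({ data }) => { workletPort = data.port; };

onrtctransform = ({ transformer }) => {
  // transformer.readable is transferable, so the frames can be delivered
  // to the worklet without any further hop through this thread.
  workletPort.postMessage({ frames: transformer.readable },
                          [transformer.readable]);
};
```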

The former use case @youennf mentioned - decoding + rendering audio - is indeed the one I'm interested in getting onto a worklet. IIUC libwebrtc does its audio decoding in the realtime thread, just before rendering, so that concept isn't all that wild.

Any claim of being able to do a "more performant" implementation needs to be backed by something.

Transferring the ReadableStream to the worklet would mean frames could be delivered directly there. Requiring JS work to be done elsewhere first would mean visiting another JS thread, scheduling an event there, etc., so roughly double the overhead, plus the cost of allocating the intermediate objects to be re-transferred. I can see if I can get some more concrete numbers, but there's clearly additional work required of the app and UA which could be skipped with this.

@jan-ivar
Member

jan-ivar commented Mar 8, 2024

The RTCRtpScriptTransformer.readable is transferable, so it could be posted to a worklet within the current shape.

This produces a "readable side in another realm", where the original realm feeding that readable is the dedicated worker provided by the webpage, at least in current implementations of that surface.

requiring apps to copy data & metadata in a DedicatedWorker before going to the worklet / after returning from it.

Can't the app just transfer the frame.data ArrayBuffer instead? i.e. not a copy. It'd be interesting to see the numbers.
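
For instance, a sketch of that zero-copy hand-off in the transform worker (workletPort is again assumed to be a MessagePort wired to the worklet):

```js
// transform-worker.js: transfer each frame's payload instead of copying it.
onrtctransform = async ({ transformer }) => {
  const reader = transformer.readable.getReader();
  for (;;) {
    const { value: frame, done } = await reader.read();
    if (done) break;
    const buf = frame.data; // ArrayBuffer holding the encoded payload
    // Transferring detaches buf, so the frame is consumed here rather
    // than written back to transformer.writable.
    workletPort.postMessage(buf, [buf]);
  }
};
```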

Transferring the readablestream to the worklet would mean frames could be delivered directly there.

This sounds like whatwg/streams#1124.

@jan-ivar
Member

jan-ivar commented Mar 8, 2024

IIUC libwebrtc does its audio decoding in the realtime thread, just before rendering, so that concept isn't all that wild.

I wasn't aware. Is this a special-case for element.srcObject = new MediaStream([transceiver.receiver.track]);?

@padenot

padenot commented Mar 11, 2024

I wasn't aware. Is this a special-case for element.srcObject = new MediaStream([transceiver.receiver.track]);?

It's an implementation concern, script doesn't know about it.

@Orphis

Orphis commented May 2, 2024

Would you be open to just reframing the issue as exposing RTCEncodedAudioFrame in an AudioWorklet context?

@padenot

padenot commented May 2, 2024

If we want to do decoding in real-time threads, in the AudioWorkletGlobalScope, here are some rough steps (a hypothetical sketch follows the list):

  • Adding a way to get the encoded audio packet from an RTCEncodedAudioFrame, without allocations, and in a real-time-safe manner
  • Adding a sync decoding interface to Web Codecs that doesn't use JS objects (just buffer in, buffer out, no allocs, etc.), so it's real-time safe
  • Exposing this to the AudioWorkletGlobalScope
  • Adding some transferring capabilities to Web Codecs objects so that one could set up a decoder outside the real-time thread and send it over
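
Purely as an illustration of what those steps might add up to; nothing below exists today, and decodeSync, the transferred decoder, and nextPacketInto are all hypothetical names:

```js
// processor.js: hypothetical real-time-safe decode path (no such API exists).
class SyncDecodeProcessor extends AudioWorkletProcessor {
  constructor(options) {
    super();
    // Hypothetical: a decoder configured on another thread and sent over.
    this.decoder = options.processorOptions.decoder;
    // Pre-allocated scratch buffers, so process() never allocates.
    this.packet = new Uint8Array(4096);
    this.pcm = new Float32Array(4096);
  }
  // Stub: an app-defined, allocation-free packet source, e.g. a
  // SharedArrayBuffer ring buffer filled by another thread.
  nextPacketInto(dest) { return 0; }
  process(inputs, outputs) {
    const len = this.nextPacketInto(this.packet);
    if (len > 0) {
      // Hypothetical synchronous call: buffer in, buffer out, no JS objects.
      const samples = this.decoder.decodeSync(this.packet.subarray(0, len), this.pcm);
      const out = outputs[0][0]; // 128-frame render quantum, channel 0
      out.set(this.pcm.subarray(0, Math.min(samples, out.length)));
    }
    return true;
  }
}
registerProcessor('sync-decode-processor', SyncDecodeProcessor);
```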

@Orphis

Orphis commented May 2, 2024

I don't think this should necessarily be tied to WebCodecs. While they can be useful in some cases, they aren't going to cover all use cases, such as experimental codec work that is inherently implemented in JS / WASM.

@padenot

padenot commented May 2, 2024

If you're not using Web Codecs, there's no benefit to exposing RTCEncodedAudioFrame. Just extract the data into a buffer and communicate it to the AudioWorkletGlobalScope. This can be done today using postMessage or a SharedArrayBuffer.
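
A minimal sketch of the SharedArrayBuffer route, for example (assumes cross-origin isolation; the ring layout is ad hoc, with length framing and overflow handling elided):

```js
// transform-worker.js: copy each encoded packet into a shared ring buffer.
const sab = new SharedArrayBuffer(64 * 1024);
const writeIdx = new Int32Array(sab, 0, 1); // total bytes written (mod size)
const ring = new Uint8Array(sab, 4);
// `sab` is handed to the AudioWorkletNode once, e.g. via processorOptions.

onrtctransform = async ({ transformer }) => {
  const reader = transformer.readable.getReader();
  for (;;) {
    const { value: frame, done } = await reader.read();
    if (done) break;
    const src = new Uint8Array(frame.data);
    const w = Atomics.load(writeIdx, 0);
    for (let i = 0; i < src.length; i++) ring[(w + i) % ring.length] = src[i];
    Atomics.store(writeIdx, 0, (w + src.length) % ring.length);
    // The AudioWorkletProcessor polls writeIdx in process() and consumes
    // the new bytes with no postMessage round trip.
  }
};
```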

@youennf
Collaborator

youennf commented May 2, 2024

Instead of transferring streams to worklets, the alternative would be to let the script transform take a worklet instead of a worker.
That is probably the most straightforward approach, but it still needs discussion in terms of scenarios and pros/cons.

@Orphis

Orphis commented May 15, 2024

@aboba Can we add this to the agenda for next week's interim?

tonyherre changed the title from "Expose RTCEncoded*Frame interfaces in Worklets" to "Expose RTCEncodedAudioFrame interface in AudioWorklets" on Jun 24, 2024