[AUDIO_WORKLET] Optimise the copy back from wasm's heap to JS #22753

cwoffenden · 2024-10-16T14:39:28Z

This builds on #22741 just because that's where I was at, but it's not required. The interesting changes are in audio_worklet.js and I'd appreciate some feedback from @juj before tidying this up (with sanity checks and a fallback).

Since we pass in the stack for the worklet from the caller's heap, its address shouldn't change. And since the (I'll make myself say it) render quantum size doesn't change after the audio worklet creation, the stack positions for the audio buffers should not change either. So, we can create one-time subarray views and replace the float-by-float copy with a simple set() per channel (per output).
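As a minimal sketch of the idea (not the code in this PR): a plain `Float32Array` stands in for the Wasm heap, 128 frames is assumed as the render quantum size, and `copyPerFloat`, `makeChannelViews` and `copyWithSet` are hypothetical names used only for illustration.

```javascript
// Stand-in for the Wasm heap; in Emscripten this would be HEAPF32.
const heapF32 = new Float32Array(1024);
const FRAMES = 128; // samples per channel per process() call (assumed)

// Original approach: read each float out of the heap one at a time.
function copyPerFloat(heap, ptr, channels) {
  let idx = ptr >> 2; // byte address to float index
  for (let c = 0; c < channels.length; c++) {
    for (let i = 0; i < FRAMES; i++) {
      channels[c][i] = heap[idx++];
    }
  }
}

// Proposed approach, step 1: create the views once (e.g. in the constructor).
function makeChannelViews(heap, ptr, numChannels) {
  const views = [];
  let idx = ptr >> 2;
  for (let c = 0; c < numChannels; c++) {
    views.push(heap.subarray(idx, idx + FRAMES)); // no copy, shares the buffer
    idx += FRAMES;
  }
  return views;
}

// Step 2: each process() call is one set() per channel, with no allocation.
function copyWithSet(views, channels) {
  for (let c = 0; c < channels.length; c++) {
    channels[c].set(views[c]);
  }
}
```

Both paths should leave the same samples in the destination channels; the second only touches the precomputed views.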

I've thrown simple tests at it and it works, fulfilling the garbage-free requirement and theoretically having a nice performance boost (not measured, but looping over thousands of JS Number types and shuffling them to and from floats must come at a cost). If the outputList does change, then it should only change after changes to the audio chain, which would be expensive enough that changing the subarrays wouldn't make a difference.

To be extra sure, we can move the output buffers to the first entries on the stack, so that simple additional changes, such as adding input buffers, won't change their addresses.
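A toy model of why the allocation order matters, assuming a downward-growing stack whose allocations are rounded down to 16 bytes. `makeStack` and `layoutBuffers` are hypothetical helpers, not Emscripten APIs, and 128 float samples per channel is an assumption.

```javascript
// Toy model of a downward-growing stack (allocations round down to 16 bytes).
function makeStack(top) {
  let sp = top;
  return {
    alloc(bytes) {
      sp = (sp - bytes) & ~15;
      return sp;
    },
  };
}

const BYTES_PER_CHANNEL = 128 * 4; // 128 float samples (assumed quantum size)

// Returns the address of the output block for a given allocation order.
function layoutBuffers(stackTop, numInputChannels, numOutputChannels, outputsFirst) {
  const stack = makeStack(stackTop);
  const allocInputs = () => stack.alloc(numInputChannels * BYTES_PER_CHANNEL);
  const allocOutputs = () => stack.alloc(numOutputChannels * BYTES_PER_CHANNEL);
  if (outputsFirst) {
    const outputPtr = allocOutputs(); // depends only on stackTop and outputs
    allocInputs();
    return outputPtr;
  }
  allocInputs(); // shifts everything allocated afterwards
  return allocOutputs();
}
```

With the outputs allocated first, `layoutBuffers(top, 0, 2, true)` and `layoutBuffers(top, 4, 2, true)` give the same output address; with the inputs first they differ.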

It wants sanity checks here and there but I'd like feedback for anything I'm missing or misunderstanding. Thanks!

juj · 2024-10-17T18:05:18Z

Nice idea! It looks like this PR carries the rename from the other PR. Rebase/merge should hide it?

src/audio_worklet.js

cwoffenden · 2024-10-17T19:19:33Z

> Nice idea!

A shower thought, as all good ideas are!

> It looks like this PR carries the rename from the other PR. Rebase/merge should hide it?

Yes, this one was rebased off the earlier one so should have the same commit IDs.

src/audio_worklet.js

cwoffenden · 2024-10-21T11:57:16Z

EDIT: every test I tried suggests no, the stack is only used once we hit Wasm in the callback.

@juj or someone who knows more about the internals of this than me: on entering AudioWorkletProcessor.process() nothing else will have used the stack beforehand? I've verified that the stack top is always the same when we enter the AudioWorkletProcessor's constructor and process(), and it's always the same as the address (plus length) passed to emscripten_start_wasm_audio_worklet_thread_async().

To rephrase: unless the audio worklet explicitly uses the stack functions from JS, nothing outside Wasm will use the stack before callbackFunction() is called?

cwoffenden · 2024-10-28T17:24:04Z

Some notes: lots of experiments with the stack allocations, minimum sizes, various flags (STACK_OVERFLOW_CHECK and ASSERTIONS) and we have some very happy code! View allocations are all in the constructor, with sanity checks performed on the address.
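A sketch of what "views in the constructor, with sanity checks on the address" could look like, under stated assumptions: `OutputViews` and `isValidFor` are hypothetical names, the heap is a `Float32Array` starting at byte offset 0, and the checks shown are illustrative rather than the ones in this PR.

```javascript
// Hypothetical holder for output views, created once and checked per call.
class OutputViews {
  constructor(heapF32, outputPtr, numChannels, frames) {
    if (outputPtr % 4 !== 0) {
      throw new RangeError('output address must be 4-byte aligned');
    }
    const end = outputPtr + numChannels * frames * 4;
    if (end > heapF32.buffer.byteLength) {
      throw new RangeError('output buffers fall outside the heap');
    }
    this.buffer = heapF32.buffer; // remembered to detect a replaced heap
    this.outputPtr = outputPtr;
    this.views = [];
    for (let c = 0; c < numChannels; c++) {
      const start = (outputPtr >> 2) + c * frames;
      this.views.push(heapF32.subarray(start, start + frames));
    }
  }

  // True while the cached views still describe the current heap and layout.
  isValidFor(heapF32, outputPtr, numChannels) {
    return heapF32.buffer === this.buffer &&
           outputPtr === this.outputPtr &&
           numChannels === this.views.length;
  }
}
```

If `isValidFor()` returns false (for example, the heap's buffer object was replaced or the channel count changed), the caller would build a new `OutputViews` instead of reusing the stale one.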

Next is to benchmark it.

cwoffenden · 2024-10-29T14:35:58Z

Benchmarks of the main part of the audio copy done:

https://wip.numfum.com/cw/2024-10-29/index.html

Testing on my M2 Mac Studio in Chrome and Safari this PR is around 15x faster on the float copy, e.g. the original being 0.625µs per process() and the new code 0.044µs. On my aging iPhone 11 it's 1.182µs before and 0.199µs after.

~~Firefox on Mac has the largest difference: 5.090µs before and 0.123µs after, a 41x speed-up.~~ Updating Firefox fixes this and it's now 0.916µs before and 0.118µs after.
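For readers who want to try a similar comparison, here is a rough sketch of how the two copies could be timed. It is not the benchmark linked above: the names (`timePerCallMicros`, `perFloatCopy`, `setCopy`), the iteration count and the two-channel, 128-frame layout are all assumptions, and results will vary by engine and hardware.

```javascript
// Times `fn` over many iterations and returns the mean microseconds per call.
function timePerCallMicros(fn, iterations) {
  const start = performance.now();
  for (let i = 0; i < iterations; i++) fn();
  return ((performance.now() - start) * 1000) / iterations;
}

const benchHeap = new Float32Array(4096); // stand-in for the Wasm heap
const benchFrames = 128;
const benchOut = [new Float32Array(benchFrames), new Float32Array(benchFrames)];
// One reusable view per channel, created up front.
const benchViews = benchOut.map((_, c) =>
  benchHeap.subarray(c * benchFrames, (c + 1) * benchFrames));

// Float-by-float copy out of the heap.
function perFloatCopy() {
  let idx = 0;
  for (let c = 0; c < benchOut.length; c++) {
    for (let i = 0; i < benchFrames; i++) benchOut[c][i] = benchHeap[idx++];
  }
}

// One set() per channel from the precomputed views.
function setCopy() {
  for (let c = 0; c < benchOut.length; c++) benchOut[c].set(benchViews[c]);
}

const before = timePerCallMicros(perFloatCopy, 100000);
const after = timePerCallMicros(setCopy, 100000);
console.log(`per-float: ${before.toFixed(3)}µs, set(): ${after.toFixed(3)}µs`);
```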

@juj if we're in agreement that the simplified standalone JavaScript test code is doing the right thing (it's short and a copy and paste), I can gather numbers from regular hardware (we have a wall of Chromebooks at work).

Next I'll need to create tests to show that this still works with various input and output configs.

EDIT: a 7-12x speed-up seems typical on x64 Windows or Linux.

src/audio_worklet.js

cwoffenden · 2024-10-30T19:46:00Z

~~Note to me: look at #22808~~ ✅

test/webaudio/audio_files/README.md

sbc100 · 2024-11-19T20:00:02Z

test/test_browser.py

@@ -5382,7 +5382,7 @@ def test_full_js_library_strict(self):
    'minimal_runtime_pthreads_and_closure': (['-sMINIMAL_RUNTIME', '-pthread', '--closure', '1', '-Oz'],),
    'pthreads_es6': (['-pthread', '-sPTHREAD_POOL_SIZE=2', '-sEXPORT_ES6'],),
    'es6': (['-sEXPORT_ES6'],),
-    'strict': (['-sSTRICT'],),
+    'strict': (['-sSTRICT', '-sINCOMING_MODULE_JS_API=canvas,instantiateWasm,monitorRunDependencies,onAbort,onExit,print,setStatus,wasm,wasmMemory'],),


Why is this change needed for this PR?

I was running the audio tests with -sSTRICT and couldn't understand how any of them ran without it. They build, they don't run, e.g., compile any example with STRICT and run it:

Uncaught RuntimeError: Aborted(`Module.wasmMemory` was supplied but `wasmMemory` not included in INCOMING_MODULE_JS_API) at abort (index.js:592:11) at ignoredModuleProp (index.js:823:5) at checkIncomingModuleAPI (index.js:1548:3) at index.js:223:1

It's for sure separate, so I can remove it and come back to the tests later.

If wasmMemory is required for AUDIO_WORKLET then it should be added explicitly in link.py, so that it's always part of INCOMING_MODULE_JS_API (even in strict mode).

sbc100 · 2024-11-19T20:00:50Z

test/test_browser.py

+  # We can't test playback, so don't need the data files, we're testing it builds and runs
+  @parameterized({
+    '': ([],),
+    'minimal_with_closure': (['-sMINIMAL_RUNTIME', '--closure', '1', '-Oz'],),


We normally write this --closure=1 as a single arg

I just copied an existing test for this, I'll change it tomorrow.

sbc100 · 2024-11-19T20:02:08Z

test/webaudio/audioworklet_2x_in_hard_pan.c

+
+// Toggles the play/pause of a MediaElementAudioSourceNode given its ID
+EM_JS(void, toggleTrack, (EMSCRIPTEN_WEBAUDIO_T srcID), {
+	var source = emscriptenGetAudioObject(srcID);


Can you use 2-space indentation for all these new tests?

Sure, I'll change them tomorrow. Then take a shower afterwards.

sbc100 · 2024-11-19T20:03:51Z

test/test_interactive.py

+    os.mkdir('audio_files')
+    shutil.copy(test_file('webaudio/audio_files/emscripten-beat-mono.mp3'), 'audio_files/')
+    shutil.copy(test_file('webaudio/audio_files/emscripten-bass-mono.mp3'), 'audio_files/')
+    self.btest('webaudio/audioworklet_2x_in_hard_pan.c', expected='0', args=['-sAUDIO_WORKLET', '-sWASM_WORKERS'])


Can you use btest_exit and remove the expected arg?

sbc100 · 2024-11-19T20:05:27Z

src/audio_worklet.js

+        // And that the views' size match the passed in output buffers
+        for (i of outputList) {
+          for (j of i) {
+            console.assert(j.byteLength == bytesPerChannel, `AudioWorklet unexpected output buffer size (expected ${bytesPerChannel} got ${j.byteLength})`);


Is it possible to use the emscripten assert function here?

assert() isn't available to the worklet. I could look into why; I think it might just need exposing on the module and then adding to the AudioWorkletGlobalScope. But I think I already looked at this last month.
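For reference, the size check from the diff above can be written as a standalone function that relies only on `console.assert`. This is a sketch: `countSizeMismatches` is a hypothetical name, and returning a count is an addition made here so the check can be exercised outside a worklet.

```javascript
// Hypothetical standalone version of the size check in the diff above.
// Returns how many channel buffers differ from the expected byte size.
function countSizeMismatches(outputList, bytesPerChannel) {
  let mismatches = 0;
  for (const output of outputList) {
    for (const channel of output) {
      const ok = channel.byteLength === bytesPerChannel;
      // console.assert only logs on failure; it does not throw.
      console.assert(ok, `AudioWorklet unexpected output buffer size (expected ${bytesPerChannel} got ${channel.byteLength})`);
      if (!ok) mismatches++;
    }
  }
  return mismatches;
}
```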

cwoffenden changed the title ~~[AUDIO_WORKET] Optimise the copy back from wasm's heap to JS~~ [AUDIO_WORKLET] Optimise the copy back from wasm's heap to JS Oct 16, 2024

cwoffenden mentioned this pull request Oct 16, 2024

[AUDIO_WORKLET] Reword API to make it clearer #22741

Merged

cwoffenden force-pushed the cw-audio-tweaks-3 branch 2 times, most recently from 38203f0 to 24a90e4, October 17, 2024 13:01

sbc100 reviewed Oct 17, 2024

src/audio_worklet.js (outdated)

src/audio_worklet.js (outdated)

cwoffenden force-pushed the cw-audio-tweaks-3 branch from d186eb5 to d4680ca, October 18, 2024 06:32

cwoffenden marked this pull request as draft October 18, 2024 06:35

juj reviewed Oct 18, 2024

src/audio_worklet.js (outdated)

cwoffenden force-pushed the cw-audio-tweaks-3 branch 3 times, most recently from dcdcea1 to 5b65dcf, October 26, 2024 10:06

cwoffenden force-pushed the cw-audio-tweaks-3 branch from c98adcf to 2373dd2, October 28, 2024 21:53

sbc100 reviewed Oct 29, 2024

src/audio_worklet.js

cwoffenden force-pushed the cw-audio-tweaks-3 branch 2 times, most recently from 069f7a4 to ae0e8bf, October 31, 2024 22:49

cwoffenden commented Nov 1, 2024

test/webaudio/audio_files/README.md (outdated)

cwoffenden force-pushed the cw-audio-tweaks-3 branch 6 times, most recently from f6153e9 to cccece4, November 8, 2024 06:06

cwoffenden force-pushed the cw-audio-tweaks-3 branch 2 times, most recently from b3dc2ef to ac37140, November 14, 2024 19:14

cwoffenden added 26 commits November 16, 2024 02:29

Logging and notes for me

1ce86a9

Better error message (to see why it fails)

8645f17

Create one-time fixed views into the heap

65a5417

We can remove the float-by-float JS copy and replace it with simple TypedArray set() calls.

Allow the number of channels to increase (or the audio chain to change)

836dafc

Typed views are recreated if needed but otherwise are reused.

Work in progress, moved the output buffers first

95b76d1

Lots of juggling with the various pointers, and next will be to reduce the code and move all of the output first to stop repeating some of the calculations. Some can also move to the constructor.

Interim commit, work-in-progress

3eb5198

Work-in-progress: using a single stack allocation

54425c4

The code has also been brought back closer to the original for comparison.

WIP: notes and findings

86ca75c

The initial stackAlloc() is overflowing, seeming to need more space so this is accounted for.

Correct stack offsets and verified code

c811489

Tested with various stack sizes, output sizes, and generators.

Added more assertions, minor docs

389925d

The assertions should now cover all cases of changes in address and size of the output views.

Explicitly assert any changes to the stack address

f80f09d

Added sample files

d653a0a

Work-in-progress

79e1a95

(Off home!)

Initial mixer

4c0954c

Rough implementation to see what needs doing in JS vs Wasm.

Missing blank line

3ffbcc9

Work-in-progress (reusable audio creation and playback)

06a99f0

Tidied mixer

bf0828c

Typo

fc40bcb

Added test harness hooks

e53389c

Added description of the test

3f1aae4

Added the web audio mixer to the browser tests

5e12e01

STRICT will fail without a filled INCOMING_MODULE_JS_API

7175a06

Added two audio ins to two audio outs test

fb4447f

Added the mono tests

43e569d

Formatting

5396b63

Fixes to build with MEMORY64

6b1a238

The tests pass the audio context in a void* for convenience, which needs shortening/widening for 64-bit pointers.

cwoffenden force-pushed the cw-audio-tweaks-3 branch from 807f3e6 to 6b1a238, November 16, 2024 01:29

cwoffenden requested review from juj and sbc100 November 16, 2024 01:42

sbc100 reviewed Nov 19, 2024
