[AUDIO_WORKLET] Optimise the copy back from wasm's heap to JS #22753
base: main
Nice idea! It looks like this PR carries the rename from the other PR. Rebase/merge should hide it?
A shower thought, as all good ideas are!
Yes, this one was rebased off the earlier one, so it should have the same commit IDs.
EDIT: every test I tried suggests no; the stack is only used once we hit Wasm in the callback. @juj, or someone who knows more about the internals of this than me: on entering … To rephrase: unless the audio worklet explicitly uses the stack functions from JS, nothing external from Wasm will … before …
Some notes: lots of experiments with the stack allocations, minimum sizes, and various flags (…). Next is to benchmark it.
Benchmarks of the main part of the audio copy are done: https://wip.numfum.com/cw/2024-10-29/index.html

Testing on my M2 Mac Studio in Chrome and Safari, this PR is around 15x faster on the float copy, e.g. the original being 0.625µs per …
@juj, if we're in agreement that the simplified standalone JavaScript test code is doing the right thing (it's short and a copy and paste), I can gather numbers from regular hardware (we have a wall of Chromebooks at work). Next I'll need to create tests to show that this still works with various input and output configs. EDIT: a 7-12x speed-up seems typical on x64 Windows or Linux.
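The linked page holds the actual benchmark. As a rough illustration of what is being compared, here is a standalone sketch (not that harness) that copies one render quantum out of a simulated heap, first float by float and then with one TypedArray `set()` per channel from views created once. `FRAMES`, `CHANNELS` and `dataPtr` are assumed values, and the channels are assumed to be stored one after another (planar) in the heap.

```javascript
// Hypothetical micro-benchmark, not the PR's benchmark page: per-sample copy
// vs TypedArray.prototype.set() for one render quantum of audio.
const FRAMES = 128;                     // assumed samples per channel (default render quantum)
const CHANNELS = 2;                     // assumed stereo output
const heap = new Float32Array(1 << 16); // stand-in for HEAPF32
const dataPtr = 1024;                   // assumed float index of the planar output data

// Fill the simulated heap region with deterministic samples.
for (let i = 0; i < FRAMES * CHANNELS; i++) {
  heap[dataPtr + i] = Math.sin(i / 10);
}

// Original style: every float becomes a JS Number and is written back.
function copyLoop(outputs) {
  let k = dataPtr;
  for (const channel of outputs) {
    for (let n = 0; n < FRAMES; n++) channel[n] = heap[k++];
  }
}

// Views over the heap, created once and reused on every call.
const views = [];
for (let c = 0; c < CHANNELS; c++) {
  const start = dataPtr + c * FRAMES;
  views.push(heap.subarray(start, start + FRAMES));
}

// New style: one bulk set() per channel from the cached views.
function copySet(outputs) {
  for (let c = 0; c < outputs.length; c++) outputs[c].set(views[c]);
}

const now = () => (typeof performance !== 'undefined' ? performance.now() : Date.now());

function bench(fn, iterations) {
  const outputs = Array.from({ length: CHANNELS }, () => new Float32Array(FRAMES));
  const t0 = now();
  for (let i = 0; i < iterations; i++) fn(outputs);
  return { ms: now() - t0, outputs };
}

const loopResult = bench(copyLoop, 10000);
const setResult = bench(copySet, 10000);
console.log(`loop: ${loopResult.ms.toFixed(1)} ms, set(): ${setResult.ms.toFixed(1)} ms`);
```

Timings from a loop like this vary a lot between engines and machines, so treat any single run as indicative only; the point of the sketch is that both paths produce identical output while `set()` avoids the per-sample JS work.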
We can remove the float-by-float JS copy and replace it with simple TypedArray set() calls.
Typed views are recreated if needed but otherwise are reused.
Lots of juggling with the various pointers; next will be to reduce the code and move all of the outputs first to stop repeating some of the calculations. Some can also move to the constructor.
The code has also been brought back closer to the original for comparison.
The initial stackAlloc() was overflowing and seems to need more space, so this is now accounted for.
Tested with various stack sizes, output sizes, and generators.
The assertions should now cover all cases of changes in address and size of the output views.
(Off home!)
Rough implementation to see what needs doing in JS vs Wasm.
The tests pass the audio context in a void* for convenience, which needs shortening/widening for 64-bit pointers.
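The commit notes above describe reusing typed views and only recreating them when needed. A minimal sketch of that idea, assuming 4-byte-aligned 32-bit byte addresses and a hypothetical `getHeap()` accessor that returns the current HEAPF32-style array (which is replaced with a new array when Wasm memory grows); `CachedView` is an illustrative name, not the PR's code.

```javascript
// Hypothetical sketch, not the PR's implementation: cache a Float32Array
// view over the heap and rebuild it only when the heap array, the address
// or the length changes.
class CachedView {
  constructor(getHeap) {
    this.getHeap = getHeap; // assumption: returns the current HEAPF32-like array
    this.view = null;
    this.heap = null;
    this.ptr = -1;
    this.length = -1;
  }

  // Returns a view of `length` floats starting at byte address `ptr`.
  get(ptr, length) {
    const heap = this.getHeap();
    if (this.view === null || heap !== this.heap ||
        ptr !== this.ptr || length !== this.length) {
      const start = ptr >>> 2; // byte address -> float index
      this.view = heap.subarray(start, start + length);
      this.heap = heap;
      this.ptr = ptr;
      this.length = length;
    }
    return this.view;
  }
}

// Usage: a plain ArrayBuffer stands in for Wasm memory.
let HEAP = new Float32Array(new ArrayBuffer(4096));
const cache = new CachedView(() => HEAP);
const v1 = cache.get(256, 128);
const v2 = cache.get(256, 128);                 // same address and size: reused
HEAP = new Float32Array(new ArrayBuffer(8192)); // simulate memory growth
const v3 = cache.get(256, 128);                 // heap replaced: rebuilt
```

Comparing the heap array by identity is what catches memory growth here: once the old buffer is detached, any view over it is unusable, so the check has to happen before each copy rather than only at construction.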
@@ -5382,7 +5382,7 @@ def test_full_js_library_strict(self):
   'minimal_runtime_pthreads_and_closure': (['-sMINIMAL_RUNTIME', '-pthread', '--closure', '1', '-Oz'],),
   'pthreads_es6': (['-pthread', '-sPTHREAD_POOL_SIZE=2', '-sEXPORT_ES6'],),
   'es6': (['-sEXPORT_ES6'],),
-  'strict': (['-sSTRICT'],),
+  'strict': (['-sSTRICT', '-sINCOMING_MODULE_JS_API=canvas,instantiateWasm,monitorRunDependencies,onAbort,onExit,print,setStatus,wasm,wasmMemory'],),
Why is this change needed for this PR?
I was running the audio tests with -sSTRICT and couldn't understand how any of them ran without it. They build, but they don't run; e.g., compile any example with STRICT and run it:
Uncaught RuntimeError: Aborted(`Module.wasmMemory` was supplied but `wasmMemory` not included in INCOMING_MODULE_JS_API)
at abort (index.js:592:11)
at ignoredModuleProp (index.js:823:5)
at checkIncomingModuleAPI (index.js:1548:3)
at index.js:223:1
It's definitely separate, so I can remove it and come back to the tests later.
If wasmMemory is required for AUDIO_WORKLET then it should be added explicitly in link.py, so that it's always part of INCOMING_MODULE_JS_API (even in strict mode).
# We can't test playback, so don't need the data files, we're testing it builds and runs
@parameterized({
'': ([],),
'minimal_with_closure': (['-sMINIMAL_RUNTIME', '--closure', '1', '-Oz'],),
We normally write this as a single arg: --closure=1
I just copied an existing test for this; I'll change it tomorrow.
// Toggles the play/pause of a MediaElementAudioSourceNode given its ID
EM_JS(void, toggleTrack, (EMSCRIPTEN_WEBAUDIO_T srcID), {
var source = emscriptenGetAudioObject(srcID);
Can you use 2-space indentation for all these new tests?
Sure, I'll change them tomorrow, then take a shower afterwards.
os.mkdir('audio_files')
shutil.copy(test_file('webaudio/audio_files/emscripten-beat-mono.mp3'), 'audio_files/')
shutil.copy(test_file('webaudio/audio_files/emscripten-bass-mono.mp3'), 'audio_files/')
self.btest('webaudio/audioworklet_2x_in_hard_pan.c', expected='0', args=['-sAUDIO_WORKLET', '-sWASM_WORKERS']) |
Can you use btest_exit and remove the expected arg?
// And that the views' size match the passed in output buffers
for (i of outputList) {
for (j of i) {
console.assert(j.byteLength == bytesPerChannel, `AudioWorklet unexpected output buffer size (expected ${bytesPerChannel} got ${j.byteLength})`);
Is it possible to use the emscripten assert function here?
assert() isn't available to the worklet. I could look into why; I think it might just need exposing on the module and then adding to the AudioWorkletGlobalScope. But I think I already looked at this last month.
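If the runtime's assert() can't be reached from the worklet, one stopgap is a small local helper. The sketch below is hypothetical (`workletAssert` and `checkOutputSizes` are illustrative names, not Emscripten functions); unlike console.assert(), which only logs, it throws, so a failed check surfaces as an error rather than a log line that is easy to miss.

```javascript
// Hypothetical fallback, not Emscripten's assert(): a minimal helper an
// AudioWorklet script could define locally.
function workletAssert(condition, text) {
  if (!condition) {
    throw new Error('Assertion failed' + (text ? ': ' + text : ''));
  }
}

// The output buffer size check quoted above, rewritten to use the helper.
function checkOutputSizes(outputList, bytesPerChannel) {
  for (const output of outputList) {
    for (const channel of output) {
      workletAssert(channel.byteLength == bytesPerChannel,
        `AudioWorklet unexpected output buffer size (expected ${bytesPerChannel} got ${channel.byteLength})`);
    }
  }
}
```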
This builds on #22741 just because that's where I was at, but it's not required. The interesting changes are in audio_worklet.js, and I'd appreciate some feedback from @juj before tidying this up (with sanity checks and a fallback).

Since we pass in the stack for the worklet from the caller's heap, its address shouldn't change. And since the (I'll make myself say it) render quantum size doesn't change after the audio worklet is created, the stack positions for the audio buffers should not change either. So we can create one-time subarray views and replace the float-by-float copy with a simple set() per channel (per output).

I've thrown simple tests at it and it works, fulfilling the garbage-free requirement and, in theory, giving a nice performance boost (not measured, but looping over thousands of JS Number types and shuffling them to and from floats must come at a cost). If the outputList does change, it should only be after changes to the audio chain, which would be expensive enough that recreating the subarrays wouldn't make a difference.

To be extra sure, we can move the output buffers to the first entries on the stack, so that simple additional changes, like adding input buffers, won't change their addresses.
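A minimal sketch of the "outputs first" ordering, assuming a contiguous region starting at a hypothetical `base` address with buffers laid out at increasing addresses (the real Emscripten stack grows downwards, so actual code would differ), 4-byte float samples, and the default render quantum of 128 frames. `layoutBuffers` is an illustrative name, not the PR's code.

```javascript
// Hypothetical layout helper: place every output channel buffer first,
// then the input channel buffers, in one contiguous region.
const BYTES_PER_SAMPLE = 4; // Float32

function layoutBuffers(base, quantumSize, outputChannels, inputChannels) {
  const bytesPerChannel = quantumSize * BYTES_PER_SAMPLE;
  let ptr = base;
  // Assign consecutive channel-sized slots for each entry in `counts`.
  const place = (counts) => counts.map((count) => {
    const channels = [];
    for (let c = 0; c < count; c++) {
      channels.push(ptr);
      ptr += bytesPerChannel;
    }
    return channels;
  });
  const outputs = place(outputChannels); // outputs first: addresses stay stable
  const inputs = place(inputChannels);   // inputs after: free to vary
  return { outputs, inputs, end: ptr };
}

// One stereo output, with and without inputs connected.
const outputsOnly = layoutBuffers(1024, 128, [2], []);
const withInputs = layoutBuffers(1024, 128, [2], [1, 2]);
console.log(outputsOnly.outputs, withInputs.outputs); // same addresses in both
```

Because inputs are appended after the outputs, connecting or disconnecting inputs only moves the tail of the region, so views created once over the output buffers stay valid.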
It wants sanity checks here and there but I'd like feedback for anything I'm missing or misunderstanding. Thanks!