Add Audio Library #15
I have a working version of the following:
However, I am unsure about what is supposed to be done about the following:
|
Nice! I didn't put a lot of effort into coming up with this API, so feel free to adjust it as needed.
If there is no useful metadata to be had, you can skip it. But any kind of music playback or editing software will want to be able to get the sample rate and change it.
Audio playback or editing software will often want to play or loop a range -- for example play from sample 5000 to sample 20000.
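The sketch below shows range playback and looping with the Web Audio API. It assumes the buffer is already attached to an AudioBufferSourceNode. Web Audio takes positions in seconds, not samples, so "sample 5000 to sample 20000" at 48 kHz becomes an offset of about 0.104 s and a duration of 0.3125 s. RangeSource, sampleRangeToSeconds, and playRange are hypothetical names. start(when, offset, duration), loop, loopStart, and loopEnd are the real AudioBufferSourceNode members.

```typescript
// Minimal structural type so this compiles without DOM typings;
// a real AudioBufferSourceNode has all of these members.
interface RangeSource {
  loop: boolean;
  loopStart: number;
  loopEnd: number;
  start(when?: number, offset?: number, duration?: number): void;
}

// Web Audio positions are in seconds, so convert the half-open
// sample range [startSample, endSample) using the buffer's sample rate.
function sampleRangeToSeconds(
  startSample: number,
  endSample: number,
  sampleRate: number,
): { offset: number; duration: number } {
  if (endSample <= startSample) throw new RangeError("endSample must be greater than startSample");
  return { offset: startSample / sampleRate, duration: (endSample - startSample) / sampleRate };
}

// Play (or loop) only [startSample, endSample) of the buffer attached to src.
function playRange(src: RangeSource, startSample: number, endSample: number, sampleRate: number, loop = false): void {
  const { offset, duration } = sampleRangeToSeconds(startSample, endSample, sampleRate);
  if (loop) {
    src.loop = true;
    src.loopStart = offset;
    src.loopEnd = offset + duration;
    src.start(0, offset); // no duration argument: keeps looping until stop() is called
  } else {
    src.start(0, offset, duration); // plays the range once, then stops
  }
}
```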
This might be the only solution for more serious use cases (like editing the audio). The "perfect sound" software I referenced before decodes its own audio files (which were IFF in the Amiga days; today .wav would be the most common). One option would be to let the API user load a file (you might need an API for that), and then your API provides a function to decode .wav files (and get metadata like length, sample rate, and stereo/mono). You could keep it simple by supporting only uncompressed PCM .wav files, and leave it as an exercise for API users to add more file formats if they need them (like IFF or MP3). You also need a function to encode the raw audio, for "saving". I assume JavaScript doesn't actually have anything like that anyway. You might want to think about committing this library in two phases: the first phase for simple playback cases like you would have in a game, and the second phase for the support needed by audio editors and music editors (decoding and encoding .wav, looping, playing ranges, etc.), although I think a game would also find looping and playing ranges useful. |
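Here is a minimal sketch of the "keep it simple" .wav support suggested above. It supports only uncompressed 16-bit PCM. The names PcmWav, encodeWav16, and decodeWav16 are hypothetical and not part of the library. The decoder walks the RIFF chunks rather than assuming a fixed 44-byte header, because real files often carry extra chunks (for example LIST) before the data.

```typescript
// Planar float audio: one array per channel, samples in [-1, 1].
interface PcmWav {
  sampleRate: number;
  channels: Float32Array[];
}

function writeAscii(v: DataView, offset: number, s: string): void {
  for (let i = 0; i < s.length; i++) v.setUint8(offset + i, s.charCodeAt(i));
}

function readAscii(v: DataView, offset: number, len: number): string {
  let s = "";
  for (let i = 0; i < len; i++) s += String.fromCharCode(v.getUint8(offset + i));
  return s;
}

// Encode planar float channels as an uncompressed 16-bit PCM .wav file.
function encodeWav16(wav: PcmWav): ArrayBuffer {
  const numCh = wav.channels.length;
  const frames = numCh > 0 ? wav.channels[0].length : 0;
  const blockAlign = numCh * 2; // bytes per frame (all channels)
  const dataSize = frames * blockAlign;
  const buf = new ArrayBuffer(44 + dataSize);
  const v = new DataView(buf);
  writeAscii(v, 0, "RIFF");
  v.setUint32(4, 36 + dataSize, true);
  writeAscii(v, 8, "WAVE");
  writeAscii(v, 12, "fmt ");
  v.setUint32(16, 16, true); // fmt chunk size for plain PCM
  v.setUint16(20, 1, true); // audio format 1 = uncompressed PCM
  v.setUint16(22, numCh, true);
  v.setUint32(24, wav.sampleRate, true);
  v.setUint32(28, wav.sampleRate * blockAlign, true); // byte rate
  v.setUint16(32, blockAlign, true);
  v.setUint16(34, 16, true); // bits per sample
  writeAscii(v, 36, "data");
  v.setUint32(40, dataSize, true);
  let p = 44;
  for (let f = 0; f < frames; f++) {
    for (let c = 0; c < numCh; c++) {
      // Same 2^(bits-1) scale as decoding; clamp so +1.0 becomes 32767.
      const x = Math.max(-1, Math.min(1, wav.channels[c][f]));
      v.setInt16(p, Math.max(-32768, Math.min(32767, Math.round(x * 32768))), true);
      p += 2;
    }
  }
  return buf;
}

// Decode an uncompressed 16-bit PCM .wav file into planar float channels.
function decodeWav16(buf: ArrayBuffer): PcmWav {
  const v = new DataView(buf);
  if (readAscii(v, 0, 4) !== "RIFF" || readAscii(v, 8, 4) !== "WAVE") throw new Error("not a RIFF/WAVE file");
  let numCh = 0, sampleRate = 0, bits = 0;
  let p = 12;
  while (p + 8 <= v.byteLength) {
    const id = readAscii(v, p, 4);
    const size = v.getUint32(p + 4, true);
    const body = p + 8;
    if (id === "fmt ") {
      if (v.getUint16(body, true) !== 1) throw new Error("only uncompressed PCM is supported");
      numCh = v.getUint16(body + 2, true);
      sampleRate = v.getUint32(body + 4, true);
      bits = v.getUint16(body + 14, true);
    } else if (id === "data") {
      if (bits !== 16 || numCh === 0) throw new Error("expected a 16-bit PCM fmt chunk before data");
      const frames = Math.floor(size / (numCh * 2));
      const channels = Array.from({ length: numCh }, () => new Float32Array(frames));
      for (let f = 0; f < frames; f++)
        for (let c = 0; c < numCh; c++) channels[c][f] = v.getInt16(body + (f * numCh + c) * 2, true) / 32768;
      return { sampleRate, channels };
    }
    p = body + size + (size & 1); // chunks are padded to an even length
  }
  throw new Error("no data chunk found");
}
```

Samples are interleaved frame by frame in the file, so the loops convert between that layout and one array per channel.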
It seems like audio encoding and decoding are much easier than I expected. While it doesn't seem like I can easily extract the raw data from an HTMLAudioElement, there is a built-in function (decodeAudioData) for creating an AudioBuffer (the same object I use for AppendAudioSamples) from raw file data. In addition, the MediaRecorder API seems like it would work for re-encoding the audio. I'm going to go ahead and start migrating the loadAudio parts to use decodeAudioData instead, since it makes the library simpler and removes the difference between loadAudio and AppendAudioSamples. The only downside is that it doesn't allow the audio to stream in, but I don't think that's a huge issue. I think doing it this way solves most of the above issues. However, if media streaming is ever needed in the future (for example, from a microphone), this issue might come back up. |
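Here is a sketch of the decodeAudioData route when the file bytes already live in WASM memory. decodeFromWasmMemory and the Decoder interface are hypothetical names; a real AudioContext matches the interface. decodeAudioData detaches the ArrayBuffer it is passed. For that reason, the sketch copies just the file's bytes into a fresh buffer instead of passing the WASM memory buffer.

```typescript
// Only the one method this sketch needs; a real AudioContext satisfies it.
interface Decoder<T> {
  decodeAudioData(data: ArrayBuffer): Promise<T>;
}

// Hypothetical helper: decode an encoded audio file (for example .wav or .mp3
// bytes) that the C side has placed in WASM memory at [ptr, ptr + len).
async function decodeFromWasmMemory<T>(
  ctx: Decoder<T>,
  memory: ArrayBufferLike,
  ptr: number,
  len: number,
): Promise<T> {
  if (ptr < 0 || len < 0 || ptr + len > memory.byteLength) {
    throw new RangeError("file bytes lie outside WASM memory");
  }
  // decodeAudioData takes ownership of (detaches) the ArrayBuffer it is given,
  // so hand it a private copy of just the file bytes, never the WASM memory itself.
  const copy = new ArrayBuffer(len);
  new Uint8Array(copy).set(new Uint8Array(memory, ptr, len));
  return ctx.decodeAudioData(copy);
}
```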
Another note: when I implemented everything, I skipped new(). Instead, I had loadAudio and audioFromSamples. However, now that loadAudio and audioFromSamples both produce AudioBuffers, I was wondering if it would be better to add a newAudioNode() function and then just have appendAudioSamples and appendAudioFile. Appending an audio file might be a bit difficult, though, since my current implementation allows you to specify multiple channels. On the subject of channels, the current implementation of audioFromSamples takes a number of channels, a sample rate, an audio buffer, and a length. The audio buffer is treated as a 2D float array in the format float[channels][length], so the length specifies the length per channel, not the length of the entire buffer. I was wondering if it would be better to make a separate parameter for that, or to make it so you only add data for one channel at a time. The only problem with adding one channel at a time is that it seems I would need to recreate the audio buffer (and copy all of the data over) every time I change the length, sample rate, or number of channels. |
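Here is a sketch of the float[channels][length] layout described above. splitPlanar and appendPlanar are hypothetical names. Channel c occupies indices c * length through (c + 1) * length - 1 of the flat buffer, so the whole buffer holds channels * length floats. The second function shows the recreate-and-copy cost of growing. This mirrors what an AudioBuffer requires, since its size is fixed at creation.

```typescript
// Split a flat buffer laid out as float[channels][length] (all of channel 0,
// then all of channel 1, ...) into per-channel views, without copying.
function splitPlanar(flat: Float32Array, numChannels: number, length: number): Float32Array[] {
  if (flat.length < numChannels * length) throw new RangeError("buffer is shorter than channels * length");
  const channels: Float32Array[] = [];
  for (let c = 0; c < numChannels; c++) {
    channels.push(flat.subarray(c * length, (c + 1) * length));
  }
  return channels;
}

// An AudioBuffer's length, sample rate and channel count are fixed when it is
// created, so "appending" means allocating larger storage and copying old data.
function appendPlanar(existing: Float32Array[], extra: Float32Array[]): Float32Array[] {
  if (existing.length !== extra.length) throw new Error("channel count mismatch");
  return existing.map((old, c) => {
    const grown = new Float32Array(old.length + extra[c].length);
    grown.set(old, 0);
    grown.set(extra[c], old.length);
    return grown;
  });
}
```

Each per-channel view could then be passed to `audioBuffer.copyToChannel(view, c)`.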
More TODO:
|
Some small comments:
|
Please also add the audio unit tests to the Unit Test section of the examples doc: /examples/examples-overview/. Also make a note that pong demos audio (when it does). |
Regarding twr_convert_32_bit_pcm: the main disadvantage of the function is that it's going to double-allocate the audio on the C side, which might have a limited amount of memory. As for twr_convert_play_file_ex, yeah, that makes more sense. I included the change in my most recent pull request. |
Doesn't it also do an extra copy? |
bug: I tried the 2-player pong (AI mode), and I sometimes hear the beeps, but mostly I don't. Win 11, Chrome. |
Should these functions be passed a playback_id, not a node_id? The docs say: When playing audio, a
|
Regarding pong: more testing reveals that the issue happens in both bundled and unbundled, async and regular. A beep is heard when the ball goes out of bounds, but when the ball hits the paddle, there is no beep. |
Yeah, twr_convert_32_bit_pcm is going to double copy, so it might be better to just do it on the TypeScript side. Though, I don't know if it would be better to just include an enum specifying a type, or to create aliases like audio_from_8bit_pcm, audio_from_float_pcm, audio_from_16bit_pcm, etc. I haven't been able to replicate the sound issue on either Firefox or Chrome. However, I'll try testing it on Windows when I can. Yeah, the modify functions should take a playback_id; it seems like I made a typo there. |
Your concern about type checking might be addressed by the latter (I'm not 100% sure whether C would generate errors in this case, but maybe). Either approach is okay. I would probably go with the latter (separate names); it seems very slightly simpler and clearer.
We could do a Zoom call and I could demo the issue. I could also take a look at the code and see if I can find the issue. |
From what I've seen, C doesn't generate errors for the mismatches, but it often gives warnings, so separate names are still helpful. I'll go ahead and implement the various functions then, and do something similar for getting the audio data. As for the audio issue, a Zoom call would be good for demoing it. If it needs live testing, I have a Windows dual boot on a computer that I could test it on, though I would have to install all the project dependencies. However, I'm still not quite sure what could be causing the issue. The paddle and out-of-bounds beeps are set up the same way except for the frequency given to the square wave generator. |
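The thread doesn't show the pong demo's square wave generator, so the code below is only a hypothetical sketch of generating a beep as float PCM; squareWave is an illustrative name. Amplitude is an explicit parameter alongside frequency. That makes it easy to give the paddle-hit and out-of-bounds beeps different levels. The length in samples is seconds times the sample rate.

```typescript
// Hypothetical square-wave generator (not the pong demo's actual code): returns
// `seconds` worth of mono float PCM at `sampleRate`, alternating between
// +amplitude and -amplitude at `frequency` Hz.
function squareWave(frequency: number, seconds: number, sampleRate: number, amplitude = 0.5): Float32Array {
  const n = Math.round(seconds * sampleRate);
  const out = new Float32Array(n);
  for (let i = 0; i < n; i++) {
    // Fractional position within the current period, in [0, 1).
    const phase = ((i * frequency) / sampleRate) % 1;
    out[i] = phase < 0.5 ? amplitude : -amplitude;
  }
  return out;
}
```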
Also, as a minor note, how should conversion between something like 8-bit PCM and the float-based PCM that JavaScript uses be done? More specifically, the negative range goes one number further (-128 versus 127), so do you divide a (signed) char (for 8-bit) by just 127, or do you divide the positive side by 127 and the negative side by 128? I'm not sure it matters as much for 16-bit or 32-bit PCM, but I feel like it does for 8-bit PCM. |
Also, while implementing the 16-bit PCM conversion, I noticed that there's no mem16 view of the WASM memory. Should that be added to the interface? It isn't strictly necessary, but it would be slightly more convenient and maybe(?) faster for converting 16-bit PCM to the float format. |
I can now consistently hear the audio. I think it's because I turned up the volume, and the sfx when the ball hits the pong paddle is very quiet (compared to the sfx when the ball goes out of bounds). I suggest you make the paddle-hit sfx louder, so it is closer to the out-of-bounds sound. |
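Here is a sketch of what a 16-bit view buys for the conversion. It assumes the samples sit in WASM memory at byte offset ptr; pcm16ToFloat is a hypothetical name. The Int16Array is created directly over the memory buffer, so no intermediate copy is made. Two details matter. Typed-array views must start at a multiple of their element size. Int16Array also uses the host's byte order, which is little-endian (matching WASM) on mainstream browser platforms.

```typescript
// Hypothetical helper: convert `count` signed 16-bit PCM samples stored in WASM
// memory at byte offset `ptr` into floats, via an Int16Array ("mem16"-style) view.
function pcm16ToFloat(memory: ArrayBufferLike, ptr: number, count: number): Float32Array {
  // The Int16Array constructor throws a RangeError for an odd byte offset,
  // so report the alignment problem explicitly.
  if (ptr % 2 !== 0) throw new RangeError("16-bit PCM pointer must be 2-byte aligned");
  const samples = new Int16Array(memory, ptr, count);
  const out = new Float32Array(count);
  for (let i = 0; i < count; i++) out[i] = samples[i] / 32768; // same 2^(bits-1) scale as 8-bit
  return out;
}
```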
Good idea. Please add that.
That's an interesting question, and I don't think there is a perfect answer. That said, I think you should divide by 128 (whether the value is positive or negative). This guarantees you get a number in the range -1 to 1, with a consistent scale. Dividing by 128 in one case and 127 in the other is not correct, because it gives negative and positive numbers different scales. When they are converted to audio (by a digital-to-analog converter), going from -4 to -3 and going from 2 to 3 both result in the same increase in voltage: each quantum change maps to the same voltage change. Another way to think about it is to consider how the numbers are created. In the case where you generate a sine wave mathematically (as you do in pong, or as the FFT example does), you will end up with a range of -127 to 127, because you want it centered around zero and symmetrical. So -128 is just not used. If you knew this was the case, you would divide by 127 to get a better SNR. But you don't know how the PCM data was generated. Another example is to imagine the PCM data was generated by an 8-bit ADC (analog-to-digital converter). IIRC, they output an unsigned number between 0 and 255, and you then convert it to signed, for example by subtracting 128. This gives you -128 to 127, so you haven't lost any precision, but your zero point is screwy. It's also interesting to note that floating-point binary has a separate sign bit (which two's-complement integers don't), so there is a +0 and a -0, and the precision above and below zero is the same. So there is just a fundamental encoding difference between integers and floating point. |
Pong paddle hits are still very quiet. It also seems that sometimes it does make a loud buzz, but usually just a quiet smack. I think we should do a Zoom call so I can share my screen and you can see whether it works the same on Windows as on your laptop. |
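The sketch below implements the divide-by-128 rule above; the function names are hypothetical. It also covers the unsigned 0 to 255 case the comment mentions: subtract 128 first, then apply the same scale. 8-bit .wav PCM is stored unsigned in exactly this way.

```typescript
// Signed 8-bit PCM: -128..127 maps to [-1, 127/128], one scale for both signs.
function pcm8SignedToFloat(src: Int8Array): Float32Array {
  const out = new Float32Array(src.length);
  for (let i = 0; i < src.length; i++) out[i] = src[i] / 128;
  return out;
}

// Unsigned 8-bit PCM (0..255, midpoint 128), as an ADC or an 8-bit .wav produces:
// re-center by subtracting 128, then apply the same /128 scale.
function pcm8UnsignedToFloat(src: Uint8Array): Float32Array {
  const out = new Float32Array(src.length);
  for (let i = 0; i < src.length; i++) out[i] = (src[i] - 128) / 128;
  return out;
}
```

Worked through: -128 maps to -1, and 127 maps to 127/128 = 0.9921875. Every one-step change is 1/128 = 0.0078125 on both sides of zero, which is the "same voltage change per quantum" property described above.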
Yeah, I feel like a zoom call is probably necessary. Both sounds have around the same volume on my end. |
I'll remove the 2.5.0 milestone, but I am leaving this open. I thought about the way you did the conversion from PCM to float: it seems like it would be less computation and simpler to just expose signed views of the memory. What do you think about changing this after 2.5? |
Actually, now the single-player pong beeps play for too long; it doesn't sound right. The two-player beeps sound a lot shorter. Is it the same length, but the audio is getting cut off? Does the two-player pong sfx sound too long to you? |
Okay, the single-player pong is fixed now. |
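Here is one possible shape for the signed views, sketched as a hypothetical class. mem8 and mem16 follow the naming used in this thread, while memi8, memi16, and memi32 are made-up names. With an Int16Array available, 16-bit PCM conversion becomes a plain read-and-scale loop. The refresh step matters because growing a WebAssembly.Memory detaches its previous ArrayBuffer, which leaves any old views with length 0.

```typescript
// Hypothetical view set (names are illustrative, not twr-wasm's API).
// Views are rebuilt whenever the buffer object returned by getBuffer() changes.
class MemoryViews {
  private buf: ArrayBufferLike | null = null;
  mem8!: Uint8Array;
  mem16!: Uint16Array;
  mem32!: Uint32Array;
  memi8!: Int8Array;
  memi16!: Int16Array;
  memi32!: Int32Array;

  constructor(private readonly getBuffer: () => ArrayBufferLike) {
    this.refresh();
  }

  // Call after anything that may have grown memory (e.g. a call into WASM).
  refresh(): this {
    const b = this.getBuffer();
    if (b !== this.buf) {
      this.buf = b;
      this.mem8 = new Uint8Array(b);
      this.mem16 = new Uint16Array(b);
      this.mem32 = new Uint32Array(b);
      this.memi8 = new Int8Array(b);
      this.memi16 = new Int16Array(b);
      this.memi32 = new Int32Array(b);
    }
    return this;
  }
}
```

In a browser this would be constructed as `new MemoryViews(() => memory.buffer)` for a `WebAssembly.Memory` named memory.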
That sounds good. I just didn't consider adding signed versions of all the data types. |
It always sort of bugged me that there were only unsigned array views. I think having signed versions as well will round things out. |
I've been doing some research on WASM features, and after seeing SharedArrayBuffers I looked more into using shared memory for audio buffers rather than copying. It seems like you still can't do that sort of thing with the AudioBuffer class I've been using. However, there is also a separate class, AudioWorkletProcessor, that allows you to create a custom node. It works by having a process() function that provides an object you copy the next block of audio data into. So it should be possible to have it read from something like a SharedArrayBuffer or a direct slice of WASM memory. However, since it copies the data to the output anyway, I'm wondering if it would be better to do something similar by simply creating a new AudioBuffer and copying the data from something like a SharedArrayBuffer every time audio is played. Both would do the same thing, but the second would probably be simpler. |
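Here is a sketch of the copy step the AudioWorkletProcessor approach would need; fillBlock is a hypothetical name. process() receives outputs as arrays of per-channel Float32Arrays, 128 frames per block by default. The helper copies the next block from a source that could be a Float32Array over a SharedArrayBuffer or over WASM memory, and pads with silence at the end. It is kept as a plain function so it can be tested outside a worklet.

```typescript
// Copy the next output block from `source` starting at `readPos`.
// Frames past the end of the source are filled with silence.
// Returns the new read position.
function fillBlock(source: Float32Array, readPos: number, out: Float32Array): number {
  const available = Math.max(0, Math.min(out.length, source.length - readPos));
  out.set(source.subarray(readPos, readPos + available));
  out.fill(0, available);
  return readPos + available;
}
```

Inside a processor, process(inputs, outputs) would call something like `this.pos = fillBlock(samples, this.pos, outputs[0][0])` and return `this.pos < samples.length`, so the processor stops asking to run once the samples are exhausted. If the C side writes into the shared buffer while it is being read, some synchronization (for example Atomics on an Int32Array index) would be needed; this sketch ignores that.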
Current framework:
It will likely use an AudioContext for the audio object from loadAudio and new(). AppendAudioSamples will then make new AudioBuffer nodes and append them to the AudioContext.
loadAudio and AppendAudioSamples both create audio nodes that can be chained into an AudioContext. This means it can either be assumed that each will be the only node, with each creating its own AudioContext, or there could be a separate function for creating the AudioContext, plus functions to append new loadAudio or AppendAudioSamples nodes to that context.
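Here is a sketch of the second option above; AudioRegistry is a hypothetical name. A single AudioContext is created on first use and shared by every node, and nodes are referenced by integer ids, as a C API would need. MDN's AudioContext documentation recommends creating one context and reusing it rather than making a new one each time, which favors this option. The context factory is injected so the sketch runs without a browser.

```typescript
// Hypothetical sketch: one lazily created context shared by every node,
// with nodes looked up by an integer id handed back to the C side.
class AudioRegistry<C, N> {
  private ctx: C | null = null;
  private readonly nodes = new Map<number, N>();
  private nextId = 1;

  // In a browser, makeContext would be () => new AudioContext().
  constructor(private readonly makeContext: () => C) {}

  // Create the shared context on first use, then reuse it.
  context(): C {
    if (this.ctx === null) this.ctx = this.makeContext();
    return this.ctx;
  }

  // loadAudio / appendAudioSamples-style calls would go through here.
  add(create: (ctx: C) => N): number {
    const id = this.nextId++;
    this.nodes.set(id, create(this.context()));
    return id;
  }

  get(id: number): N | undefined {
    return this.nodes.get(id);
  }
}
```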