Add Audio Library #15

Open
JohnDog3112 opened this issue Aug 20, 2024 · 27 comments · Fixed by #30, #37, #41 or #44 · May be fixed by #56
Labels: enhancement (New feature or request), Verify Fix (This issue has been fixed, but please verify and close)

Comments

@JohnDog3112
Collaborator

Current framework:

Id=loadAudio(URL)
PlayAudio(id)
PlayAudio(id, volume)
PlayAudio(id, volume, pan)

PlayAudioRange(id, startSample, endSample)   // defaults to loop=FALSE, and sampleRate=file’s sample rate
PlayAudioRange(id, startSample, endSample, loop)  
PlayAudioRange(id, startSample, endSample, loop, sampleRate, volume, pan)   // sampleRate overrides the file's sample rate

QueryAudioPlaybackPosition(id) // returns sample position playing right now
StopAudioPlayback(id)

FreeAudioID(id)

getAudioMetaData(id, struct metadata) // returns file format, sample rate, length, etc
getAudioSamples(id, start, mybuffer, mybuffersize)  // copy samples into C memory space

id=new()  // create an empty audio stream
AppendAudioSamples(id, mybuffer, mybuffersize, callback/event-when-audio-played)

I will likely use an AudioContext for the audio objects from loadAudio and new(). AppendAudioSamples will then create new AudioBuffer source nodes and connect them to the AudioContext.

Since loadAudio and AppendAudioSamples both create audio nodes that can be chained into an AudioContext, there are two options: assume each will be the only node and have it create its own AudioContext, or provide a separate function for creating the AudioContext along with functions that append new loadAudio or AppendAudioSamples nodes to that context.

@JohnDog3112 JohnDog3112 self-assigned this Aug 20, 2024
@JohnDog3112 JohnDog3112 added the enhancement New feature or request label Aug 20, 2024
@JohnDog3112
Collaborator Author

I have a working version of the following:

  • loadAudio
  • PlayAudio
  • QueryAudioPlaybackPosition
  • StopAudioPlayback
  • FreeAudioID
  • getAudioSamples
  • AppendAudioSamples

However, I am unsure about what is supposed to be done about the following:

  • GetAudioMetaData
    • Getting the length and type of the audio is apparently difficult. Sometimes the length comes back as nothing but NaN, though I could just be missing something. Also, when I load external audio through the HTMLAudioElement, I haven't found any way to get the audio type. I might be able to query the MIME type directly, but I'm not sure yet.
  • PlayAudioRange
    • I'm not quite sure what this set of functions is supposed to do. Are they supposed to play multiple samples, or are they meant to be used with AppendAudioSamples to play a subset of the samples in the object?
  • getAudioSamples
    • This works fine for audio objects created via AppendAudioSamples. However, getting audio data from an externally loaded audio file is trickier. I haven't found a way to get the data directly from the HTMLAudioElement, so a workaround would be needed. The only method I can currently think of is loading the raw file data and decoding it, but that seems like overkill.

@twiddlingbits
Owner

twiddlingbits commented Sep 2, 2024

Nice!

I didn't put a lot of effort into this API, so feel free to adjust it as needed.

GetAudioMetaData

If there's no useful metadata to be had, you can skip it. But any kind of music playback or editing software will want to be able to get the sample rate and change it.

PlayAudioRange

Audio playback or editing software will often want to play or loop a range, for example from sample 5000 to sample 20000.
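Web Audio positions playback in seconds rather than samples (AudioBufferSourceNode.start() takes an offset and a duration in seconds, and loopStart/loopEnd are also in seconds), so a sample range like the one above has to be converted first. The C sketch below shows that conversion; `sample_range_to_seconds` and `range_seconds` are hypothetical names, not part of the library, and it assumes the end sample is exclusive.

```c
/* Hypothetical helper: converts a sample range [start_sample, end_sample)
 * into the offset and duration, in seconds, that seconds-based playback
 * calls such as AudioBufferSourceNode.start(when, offset, duration) expect. */
typedef struct {
    double offset_sec;   /* where playback begins within the audio */
    double duration_sec; /* how long to play */
} range_seconds;

/* Returns 0 on success, or -1 if the sample rate or range is invalid. */
int sample_range_to_seconds(long start_sample, long end_sample,
                            long sample_rate, long total_samples,
                            range_seconds *out)
{
    if (sample_rate <= 0 || start_sample < 0 ||
        end_sample <= start_sample || end_sample > total_samples)
        return -1;
    out->offset_sec = (double)start_sample / (double)sample_rate;
    out->duration_sec = (double)(end_sample - start_sample) / (double)sample_rate;
    return 0;
}
```

For example, samples 5000 to 20000 at 44100 Hz give an offset of about 0.1134 s and a duration of about 0.3401 s.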

The only method I can currently think of is loading the raw file data and decoding it,

This might be the only solution for more serious use cases (like editing the audio). The "perfect sound" software I referenced before decodes its own audio files (IFF in the Amiga days; today .wav is the most common format). One option would be to let the API user load a file (you might need an API for that), and then have your API provide a function to decode .wav files and get the metadata, such as length, sample rate, and stereo/mono. You could keep it simple by supporting only uncompressed PCM .wav files, leaving it as an exercise for API users to add more file formats if they need them (like IFF or MP3). You also need a function to encode the raw audio, for "saving". And I assume that JavaScript doesn't actually have anything like that anyway.
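A minimal sketch of the suggested "decode uncompressed PCM .wav" step, in C. The `wav_info` struct and `wav_parse` function are hypothetical, not part of the library. It assumes the common RIFF layout where the "fmt " chunk precedes the "data" chunk, accepts only format tag 1 (integer PCM, so WAVE_FORMAT_EXTENSIBLE files are rejected), and only reads the metadata and locates the sample bytes; converting those bytes to floats is a separate step.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical metadata struct, loosely modeled on what getAudioMetaData
 * is meant to return: format, channels, sample rate, and length. */
typedef struct {
    uint16_t format;          /* 1 = uncompressed integer PCM */
    uint16_t channels;        /* 1 = mono, 2 = stereo, ... */
    uint32_t sample_rate;     /* frames per second */
    uint16_t bits_per_sample; /* e.g. 8 or 16 */
    uint32_t frames;          /* length in samples per channel */
    const uint8_t *data;      /* first byte of the raw sample data */
} wav_info;

/* .wav headers are little-endian regardless of the host CPU. */
static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Parses an in-memory .wav file. Returns 0 on success, or -1 if it is not
 * an uncompressed PCM file with a "fmt " chunk before its "data" chunk. */
int wav_parse(const uint8_t *buf, size_t len, wav_info *out)
{
    size_t pos = 12; /* skip "RIFF", the RIFF size, and "WAVE" */
    int have_fmt = 0;

    if (len < 12 || memcmp(buf, "RIFF", 4) != 0 || memcmp(buf + 8, "WAVE", 4) != 0)
        return -1;

    /* Each chunk: 4-byte id, 4-byte little-endian size, body padded to even length. */
    while (pos + 8 <= len) {
        const uint8_t *id = buf + pos;
        uint32_t size = read_le32(buf + pos + 4);
        const uint8_t *body = buf + pos + 8;

        if (size > len - pos - 8)
            return -1; /* chunk runs past the end of the buffer */

        if (memcmp(id, "fmt ", 4) == 0 && size >= 16) {
            out->format = read_le16(body);
            out->channels = read_le16(body + 2);
            out->sample_rate = read_le32(body + 4);
            /* bytes 8-13 hold byte rate and block align, unused here */
            out->bits_per_sample = read_le16(body + 14);
            have_fmt = 1;
        } else if (memcmp(id, "data", 4) == 0 && have_fmt) {
            uint32_t frame_bytes = out->channels * ((out->bits_per_sample + 7u) / 8u);
            if (out->format != 1 || frame_bytes == 0)
                return -1; /* compressed or extensible formats not handled */
            out->frames = size / frame_bytes;
            out->data = body;
            return 0;
        }
        pos += 8 + (size_t)size + (size & 1u); /* skip body and pad byte */
    }
    return -1; /* no usable fmt/data pair found */
}
```

Encoding ("saving") would be the mirror image: write the same RIFF, "fmt ", and "data" headers in front of the sample bytes.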

You might want to think about committing this library in two phases: first, support for simple playback cases like you would need in a game; second, the support needed for audio and music editors, such as decoding and encoding .wav, looping, playing ranges, etc. (although I think a game would find looping and playing ranges useful too).

@JohnDog3112
Collaborator Author

It seems like audio encoding and decoding are much easier than I expected. While I can't easily extract the raw data from an HTMLAudioElement, there is a built-in function (decodeAudioData) that creates an AudioBuffer (the same object I use for AppendAudioSamples) from raw file data.

In addition, the MediaRecorder API seems like it would work for re-encoding the audio.

I'm going to start migrating the loadAudio parts to use decodeAudioData instead, since it makes the library simpler and removes the difference between loadAudio and AppendAudioSamples. The only downside is that it doesn't allow the audio to stream in, but I don't think that's a huge issue.

I think this approach solves most of the issues above, though if media streaming is ever needed in the future (for example, from a microphone), this issue might come back up.

@JohnDog3112
Collaborator Author

Another note: when I implemented everything, I skipped new(). Instead, I had loadAudio and audioFromSamples. However, now that loadAudio and audioFromSamples both produce AudioBuffers, I was wondering whether it would be better to add a newAudioNode() function and then just have appendAudioSamples and appendAudioFile. Appending an audio file might be difficult, though, since my current implementation lets you specify multiple channels.

On the note of channels, the current implementation of audioFromSamples takes a number of channels, a sample rate, an audio buffer, and a length. The audio buffer is treated as a 2D float array in the format float[channels][length], so the length specifies the length per channel, not the length of the entire buffer. I was wondering whether it would be better to make a separate parameter for that, or to make it so you only add data for one channel at a time. The only problem with adding one channel at a time is that it seems I need to recreate the audio buffer (and copy all of the data over) every time I change the length, sample rate, or number of channels.
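For reference, the float[channels][length] layout described above is planar: all of channel 0's samples, then all of channel 1's. Multi-channel .wav data is interleaved instead (one sample per channel for each frame, e.g. L0 R0 L1 R1 ...). The hypothetical C helper below converts interleaved samples into that planar layout; it assumes `length` in audioFromSamples means frames per channel, as described, so the planar buffer holds channels × length floats.

```c
#include <stddef.h>

/* Hypothetical helper: converts interleaved samples into planar order.
 * Interleaved: interleaved[i * channels + ch] is frame i of channel ch.
 * Planar (float[channels][frames]): planar[ch * frames + i].
 * Both buffers must hold channels * frames floats. */
void deinterleave(const float *interleaved, float *planar,
                  size_t channels, size_t frames)
{
    for (size_t ch = 0; ch < channels; ch++)
        for (size_t i = 0; i < frames; i++)
            planar[ch * frames + i] = interleaved[i * channels + ch];
}
```

With stereo input {L0, R0, L1, R1, L2, R2}, the output is {L0, L1, L2, R0, R1, R2}.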

@JohnDog3112 JohnDog3112 linked a pull request Sep 20, 2024 that will close this issue
@JohnDog3112
Collaborator Author

More TODO:

  • Add documentation
  • Combined with "Improve Pong Example": Add audio effects to Pong using this library

@twiddlingbits
Owner

Some small comments:

twr_audio_play_file_full
I believe a more common name for this situation is twr_audio_play_file_ex (extended)

twr_convert_32_bit_pcm
I was assuming you would allow integer PCM to be played back directly (pass to twrAudioFromSamples), instead of adding a conversion step. What are the pros and cons?

@twiddlingbits
Owner

twiddlingbits commented Sep 21, 2024

Please also add the audio unit tests to the Unit Test section of examples doc: /examples/examples-overview/

Also make a note that pong demos audio (when it does)

@JohnDog3112
Collaborator Author

JohnDog3112 commented Sep 21, 2024

twr_convert_32_bit_pcm
The main advantage is that you get some level of type checking. Using just twrAudioFromSamples, I think you would need a setup like this: twr_audio_from_samples(long channels, long sample_rate, void* data, long singleChannelDataLen, enum PCMType); Alternatively, it could be solved with separate functions like twr_audio_from_samples_float, twr_audio_from_samples_16, etc.

The main disadvantage of the twr_convert_32_bit_pcm function, though, is that it double-allocates the audio on the C side, which might have a limited amount of memory.

As for twr_audio_play_file_ex, yeah, that makes more sense. I included the change in my most recent pull request.

@JohnDog3112 JohnDog3112 linked a pull request Sep 21, 2024 that will close this issue
@twiddlingbits
Owner

the main disadvantage of twr_convert_32_bit_pcm function is that it's going to double allocate

Doesn't it also do an extra copy?

@twiddlingbits
Owner

twiddlingbits commented Sep 22, 2024

Bug: I tried the 2-player Pong (AI mode), and I sometimes hear the beeps, but mostly I don't. Windows 11, Chrome.

@twiddlingbits
Owner

twiddlingbits commented Sep 22, 2024

Should these functions be passed a playback_id, not a node_id?

The docs say: When playing audio, a playback_id is returned to query or modify the playback.

void twr_audio_modify_playback_volume(long node_id, double volume);
void twr_audio_modify_playback_pan(long node_id, double pan);
void twr_audio_modify_playback_rate(long node_id, double sample_rate);

@twiddlingbits
Owner

Regarding Pong: more testing shows that the issue happens bundled and unbundled, async and regular.

The issue is that a beep is heard when the ball goes out of bounds. But when the ball hits the paddle, there is no beep.

@twiddlingbits twiddlingbits added this to the 2.5.0 milestone Sep 24, 2024
@JohnDog3112
Collaborator Author

Yeah, twr_convert_32_bit_pcm is going to double copy, so it might be better to just do the conversion on the TypeScript side. However, I don't know whether it would be better to include an enum specifying the type or to create aliases like audio_from_8bit_pcm, audio_from_float_pcm, audio_from_16bit_pcm, etc.

I haven't been able to replicate the sound issue on either Firefox or Chrome. However, I'll try testing it on Windows when I can.

Yeah, the modify functions should take a playback_id; it seems I made a typo there.

@twiddlingbits
Owner

I don't know if it would be better to just include an enum specifying a type, or to create aliases like audio_from_8bit_pcm, audio_from_float_pcm, audio_from_16bit_pcm

Your concern about type checking might be addressed by the latter (I'm not 100% sure whether C would generate errors in this case, but maybe). Either approach is okay. I would probably go with the latter (separate names); it seems very slightly simpler and clearer.

I haven't been able to replicate the sound issue on either Firefox or Chrome

We could do a Zoom call and I could demo the issue. I could also take a look at the code and see if I can find the issue.

@JohnDog3112
Collaborator Author

From what I've seen C doesn't generate errors for the mismatches, but it often gives warnings, so it is still helpful to have them. I'll go ahead and implement the various function types then and do something similar for getting the audio data.

As for the audio, a Zoom call would be good for demoing the issue. If it needs live testing, I have a Windows dual boot on a computer that I could test it on, though I would have to install all the project dependencies. However, I'm still not sure what could be causing the issue. The paddle and out-of-bounds beeps are set up the same way except for the frequency given to the square wave generator.
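A minimal sketch of a square wave generator with an explicit amplitude parameter, so each sound effect's volume can be set independently of its frequency. This is not the Pong example's actual code: `square_wave` is a hypothetical C helper that uses a phase accumulator (no math library needed) and assumes the frequency is below half the sample rate.

```c
#include <stddef.h>

/* Hypothetical helper: fills out[0..count) with a square wave at `frequency`
 * Hz for the given sample rate, alternating between +amplitude and
 * -amplitude. The amplitude (0.0 to 1.0) sets loudness independently of
 * pitch. Assumes frequency < sample_rate / 2. */
void square_wave(float *out, size_t count, double frequency,
                 double sample_rate, float amplitude)
{
    double phase = 0.0;                    /* position within one cycle, [0, 1) */
    double step = frequency / sample_rate; /* cycles advanced per sample */

    for (size_t i = 0; i < count; i++) {
        out[i] = phase < 0.5 ? amplitude : -amplitude; /* high half, then low half */
        phase += step;
        if (phase >= 1.0)
            phase -= 1.0;
    }
}
```

For example, at 1000 Hz with an 8000 Hz sample rate, each cycle is 8 samples: 4 at +amplitude, then 4 at -amplitude.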

@JohnDog3112
Collaborator Author

Also, as a minor note: how should conversion between something like 8-bit PCM and the float-based PCM that JavaScript uses be done? More specifically, the negative range goes one number further, so do you divide a (signed) char (for 8-bit) by just 127, or do you divide the positive side by 127 and the negative side by 128? I'm not sure it matters as much for 16-bit or 32-bit PCM, but I feel like it does for 8-bit PCM.

@JohnDog3112
Collaborator Author

Also, while implementing the 16bit PCM I noticed that there's no mem16 version of the WASM memory. Should that be added to the interface? It isn't strictly necessary, but it would be slightly more convenient and maybe(?) faster for converting 16-bit PCM to the float notation.
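To illustrate what a 16-bit view would save: with only a byte view, each little-endian 16-bit sample has to be assembled from two bytes before it can be scaled. The sketch below, a hypothetical `pcm16le_to_float` written in C for illustration, does exactly that and divides by 32768 (2^15), so the output lies in [-1, 1).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: converts little-endian signed 16-bit PCM bytes to
 * floats in [-1, 1). Each sample is built from a low byte and a high byte,
 * which is the extra work a byte-only view forces on the converter. */
void pcm16le_to_float(const uint8_t *bytes, float *out, size_t samples)
{
    for (size_t i = 0; i < samples; i++) {
        uint16_t u = (uint16_t)(bytes[2 * i] | (bytes[2 * i + 1] << 8));
        /* reinterpret the 16 bits as two's complement: 0x8000..0xFFFF are negative */
        int32_t s = u >= 0x8000u ? (int32_t)u - 65536 : (int32_t)u;
        out[i] = (float)s / 32768.0f;
    }
}
```

With a 16-bit view of the same memory, the loop body would shrink to reading one signed 16-bit value and dividing it by 32768.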

@twiddlingbits
Owner

I can now consistently hear the audio. I think it's because I turned up the volume, and the sfx when the ball hits the Pong paddle is very quiet (compared to the sfx when the ball goes out of bounds). I suggest you make the paddle-hit sfx louder, so it is closer to the out-of-bounds sound.

@twiddlingbits
Owner

I noticed that there's no mem16 version of the WASM memory. Should that be added to the interface?

Good idea. Please add that.

how should conversion between something like 8-bit PCM and the float-based PCM that JavaScript uses be done?

That's an interesting question, and I don't think there is a perfect answer. That said, I think you should divide by 128, whether the value is positive or negative. This guarantees a number between -1 and 1 with a consistent scale. Dividing by 128 in one case and 127 in the other is not correct, because it gives negative and positive numbers different scales. When the samples are converted to audio (by a digital-to-analog converter), going from -4 to -3 and going from 2 to 3 both result in the same increase in voltage: each quantum of change maps to the same voltage change.

Another way to think about it is to consider how the numbers are created. If you generate a sine wave mathematically (like you do in Pong, or like the FFT example does), you end up with a range of -127 to 127, because you want it centered around zero and symmetrical. So -128 is just not used. If you knew this was the case, you would divide by 127 to get a better SNR. But you don't know how the PCM data was generated.

Another example: imagine the PCM data was generated by an 8-bit ADC (analog-to-digital converter). IIRC, these output an unsigned number between 0 and 255, which you then convert to signed, for example by subtracting 128. This gives you -128 to 127, so you haven't lost any precision. But your zero point is screwy.

It's also interesting to note that floating point uses a dedicated sign bit with a separate magnitude (integers use two's complement instead). So there is both a +0 and a -0, and the precision above and below zero is the same. There is just a fundamental encoding difference between integers and floating point.
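A sketch of the divide-by-128 rule described above, for both signed 8-bit samples and the unsigned, offset-by-128 form an 8-bit ADC produces (uncompressed 8-bit .wav files also store unsigned samples centered at 128). The function names are hypothetical, and the separate signed and unsigned signatures follow the "separate names" approach discussed earlier in the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: signed 8-bit PCM (-128..127) to float. Every value
 * is divided by 128, so each integer step maps to the same float step and
 * the output lies in [-1.0, 127/128]. */
void pcm8s_to_float(const int8_t *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (float)in[i] / 128.0f;
}

/* Hypothetical helper: unsigned 8-bit PCM (0..255, silence at 128) to float.
 * Subtract 128 to center on zero, then apply the same divide-by-128 scale. */
void pcm8u_to_float(const uint8_t *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = ((float)in[i] - 128.0f) / 128.0f;
}
```

Both map their lowest code to exactly -1.0 and their highest to 127/128 = 0.9921875, so positive full scale never quite reaches 1.0.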

@JohnDog3112 JohnDog3112 linked a pull request Sep 26, 2024 that will close this issue
@twiddlingbits
Owner

Pong paddle hits are still very quiet. It also seems that sometimes they make a loud buzz, but usually just a quiet smack. I think we should do a Zoom call so I can share my screen and you can see whether it works the same on Windows as on your laptop.

@JohnDog3112
Collaborator Author

Yeah, I feel like a Zoom call is probably necessary. Both sounds have around the same volume on my end.

@JohnDog3112 JohnDog3112 linked a pull request Sep 28, 2024 that will close this issue
@twiddlingbits
Owner

I'll remove the 2.5.0 milestone, but I am leaving this open:

I thought about the way you did the conversion from PCM to float:
channelBuff[i] = dataBuff[i] > 127 ? (dataBuff[i] - 256)/128 : dataBuff[i]/128;

It seems like it would be less computation and simpler to just expose signed views of the memory. What do you think about changing this after 2.5?
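A quick way to check that the two approaches agree, assuming dataBuff in the quoted line is an unsigned byte view: the ternary reinterprets each byte as a two's-complement signed value, which is exactly what a signed 8-bit view returns directly. The C sketch below compares both conversions for all 256 byte values; copying the byte into an int8_t with memcpy stands in for a signed view (int8_t is always two's complement). The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* The branchy conversion from the quoted line, applied to an unsigned byte. */
float convert_via_ternary(uint8_t b)
{
    return b > 127 ? (float)(b - 256) / 128.0f : (float)b / 128.0f;
}

/* The same byte read through a signed 8-bit "view": copying the bits into
 * an int8_t yields b - 256 for b > 127, with no branch needed. */
float convert_via_signed_view(uint8_t b)
{
    int8_t s;
    memcpy(&s, &b, 1);
    return (float)s / 128.0f;
}

/* Returns 1 if both conversions agree on every possible byte value. */
int conversions_agree(void)
{
    for (int v = 0; v < 256; v++)
        if (convert_via_ternary((uint8_t)v) != convert_via_signed_view((uint8_t)v))
            return 0;
    return 1;
}
```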

@twiddlingbits twiddlingbits removed this from the 2.5.0 milestone Sep 28, 2024
@twiddlingbits
Owner

Actually, the single-player Pong beep now plays for too long; it doesn't sound right. The two-player sounds are a lot shorter. Is it the same length, but the audio is getting cut off?

Does the two player pong sfx sound too long for you?

@twiddlingbits
Owner

Okay, single-player Pong is fixed now.

@JohnDog3112
Collaborator Author

I'll remove the 2.5.0 milestone, but I am leaving this open:

I thought about the way you did the conversion from PCM to float: channelBuff[i] = dataBuff[i] > 127 ? (dataBuff[i] - 256)/128 : dataBuff[i]/128;

It seems like it would be less computation and simpler to just expose signed views of the memory. What do you think about changing this after 2.5?

That sounds good. I just didn't consider adding signed versions of all the data types.

@twiddlingbits
Owner

It always sort of bugged me that there were only unsigned array views. I think having signed versions as well will round things out.

@JohnDog3112 JohnDog3112 added the Verify Fix This issue has been fixed, but please verify and close label Oct 26, 2024
@JohnDog3112
Collaborator Author

I've been doing some research on WASM properties, and after seeing SharedArrayBuffers I looked more into using shared memory for audio buffers rather than copying. It seems you still can't do that with the AudioBuffer class I've been using. However, there is a separate class, AudioWorkletProcessor, that lets you create a custom node. It works through a process() function that provides output arrays you copy the next block of audio data into. So it should be possible to read from something like a SharedArrayBuffer or a direct slice of WASM memory. However, since it copies the data to the output anyway, I'm wondering if it would be better to do something similar by simply creating a new AudioBuffer and copying the data from something like a SharedArrayBuffer every time audio is played. Both would do the same thing, but the second would probably be simpler.
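A minimal, single-threaded sketch of the pull model described above: a consumer, standing in for process(), copies the next fixed-size block from a ring buffer and pads with silence when the producer hasn't written enough yet. BLOCK_FRAMES is set to 128, the default Web Audio render quantum size. The ring buffer, its capacity, and its function names are hypothetical, and a real version shared across threads through a SharedArrayBuffer would need atomic updates of the read and write positions.

```c
#include <stddef.h>

#define RING_CAPACITY 1024 /* samples; hypothetical size, a power of two */
#define BLOCK_FRAMES 128   /* frames per process() call (default render quantum) */

/* Hypothetical single-channel ring buffer standing in for shared memory.
 * The positions count total samples ever written/read; index = pos % capacity. */
typedef struct {
    float data[RING_CAPACITY];
    size_t read_pos;
    size_t write_pos;
} ring;

/* Producer side: appends up to n samples; returns how many actually fit. */
size_t ring_write(ring *r, const float *src, size_t n)
{
    size_t space = RING_CAPACITY - (r->write_pos - r->read_pos);
    if (n > space)
        n = space;
    for (size_t i = 0; i < n; i++)
        r->data[(r->write_pos + i) % RING_CAPACITY] = src[i];
    r->write_pos += n;
    return n;
}

/* Consumer side, mirroring one process() call: copies the next BLOCK_FRAMES
 * samples into out (which must hold BLOCK_FRAMES floats), filling the rest
 * with silence on underrun. Returns the number of real samples copied. */
size_t ring_read_block(ring *r, float *out)
{
    size_t avail = r->write_pos - r->read_pos;
    size_t n = avail < BLOCK_FRAMES ? avail : BLOCK_FRAMES;
    for (size_t i = 0; i < n; i++)
        out[i] = r->data[(r->read_pos + i) % RING_CAPACITY];
    for (size_t i = n; i < BLOCK_FRAMES; i++)
        out[i] = 0.0f; /* underrun: output silence instead of stale data */
    r->read_pos += n;
    return n;
}
```

Writing 200 samples and then reading twice returns one full block of 128 samples, followed by 72 samples padded with 56 zeros.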
