DSP to parse audio signal into MIDI sequence #3

Open
TurkeyMan opened this issue Jan 7, 2014 · 7 comments

@TurkeyMan
Member

Necessary to support vocals and 'pro' guitar.

  • Must be very low latency!
@p0nce

p0nce commented Jan 8, 2014

Here is a short-term FFT analyzer.
https://github.com/p0nce/dplug/blob/master/dsp/dplug/dsp/fft.d

If I understand correctly, you want blind separation of many sources mixed together. All I know is that for monophonic signals, time-domain methods are faster, more accurate, and lower latency than FFT methods. For polyphonic signals it all breaks down and you have to work in the frequency domain, which adds quite a lot of latency.

Do you really need low latency? You might preprocess the songs.

@TurkeyMan
Member Author

That helps! :)

I suspect lots of filtering/smoothing of the output will be required, and it will be fairly tricky to get accurate readings at very low latency.
Different voices, male/female, and picking up to 6 signals from a mixed guitar signal... these need to be made robust.

@p0nce

p0nce commented Jan 9, 2014

OK (stop me if I'm wrong) the inputs are:

  • monophonic voice signal (a)
  • polyphonic guitar chords mixed together (b)

Desired output:

  • note onset / off
  • pitch

For (a), Autotune claims to use auto-correlation methods (very basically an FFT of an FFT, then peak detection) to detect pitch. There are rumors that it's actually time-domain, and in my experience you can get something like 10 ms latency for typical material.
As for (b), Melodyne separates guitar chords, and it's an impressive tool for pitch, but I really don't know how they do it. You should ask in the DSP section of the KVR Audio forum.
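
For the monophonic case (a), the auto-correlation idea can be sketched as below. This is only an illustration, not Autotune's or dplug's actual method: it computes the autocorrelation directly in the time domain (the FFT route is, strictly, the inverse FFT of the power spectrum, which gives the same values) and picks the strongest lag. The function name, the `f_min`/`f_max` search range, and the test signal are all assumptions made for the example.

```python
import math


def detect_pitch_autocorr(samples, sample_rate, f_min=80.0, f_max=1000.0):
    """Estimate the fundamental frequency in Hz of one monophonic frame.

    Returns None if the frame is silent or too short for the lowest pitch.
    """
    n = len(samples)
    min_lag = max(1, int(sample_rate / f_max))  # shortest period searched
    max_lag = int(sample_rate / f_min)          # longest period searched
    if n <= max_lag:
        return None

    # Remove any DC offset so it does not dominate the correlation.
    mean = sum(samples) / n
    x = [s - mean for s in samples]

    best_lag, best_value = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Biased autocorrelation: longer lags sum fewer terms, which
        # favours the fundamental period over its multiples.
        value = sum(x[i] * x[i + lag] for i in range(n - lag))
        if value > best_value:
            best_lag, best_value = lag, value

    if best_lag is None:
        return None
    return sample_rate / best_lag


# Hypothetical test signal: 100 ms of a 200 Hz sine sampled at 8 kHz.
rate = 8000
frame = [math.sin(2 * math.pi * 200 * i / rate) for i in range(800)]
print(detect_pitch_autocorr(frame, rate))
```

The frame has to span at least one period of the lowest pitch searched, which is where the latency of this family of methods comes from. A more robust detector would also skip the initial lobe around lag zero and interpolate the peak position.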

Note onset/offset is not that easy either, since thresholds will inevitably be volume dependent.
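
To make the volume dependence concrete, here is a minimal sketch of threshold-based onset/offset detection on block RMS levels. It is not taken from any tool mentioned in this thread; the function name, the block size, and both threshold values are hypothetical. Two thresholds (hysteresis) are used so that a level hovering near a single threshold does not retrigger the note.

```python
import math


def detect_note_events(samples, block_size, on_threshold=0.1, off_threshold=0.05):
    """Return a list of ("on" | "off", block_index) events.

    Compares the RMS level of consecutive blocks against two absolute
    thresholds: a note starts at or above on_threshold and ends at or
    below off_threshold.
    """
    events = []
    active = False
    starts = range(0, len(samples) - block_size + 1, block_size)
    for index, start in enumerate(starts):
        block = samples[start:start + block_size]
        rms = math.sqrt(sum(v * v for v in block) / block_size)
        if not active and rms >= on_threshold:
            events.append(("on", index))
            active = True
        elif active and rms <= off_threshold:
            events.append(("off", index))
            active = False
    return events


# Hypothetical input: silence, a burst at amplitude 0.5, silence.
loud = [0.0] * 4 + [0.5, -0.5] * 4 + [0.0] * 4
quiet = [0.1 * v for v in loud]  # the same note played ten times softer

print(detect_note_events(loud, block_size=4))
print(detect_note_events(quiet, block_size=4))
```

With these absolute thresholds the loud burst produces an on/off pair while the quiet copy of the same note is missed entirely, which is the problem described above. Scaling the thresholds to a running estimate of the input level is one possible mitigation.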

@TurkeyMan
Member Author

Sounds more or less right to me.
I have no idea how the polyphonic signal separation is done, but the vox one sounds about right.

10ms is probably okay. Frames are 16ms, and the UI layer draws later in the frame, so it can be afforded the better part of the frame (most time spent rendering the background scene).
I don't know how bad it would feel if visual response was a frame late... just one frame might be okay, but 2 is a lot. I can easily feel 2, and I'm personally pretty sensitive to even one frame latency.

It's a pretty involved piece of work. Hopefully someone more qualified than me steps forward to have a go at it! :)

@p0nce

p0nce commented Jan 9, 2014

I will probably add a pitch detector to dplug that I wrote for voice; I just need to port it from C++. It was meant to be secret, but what the heck. It also works for monophonic harmonic signals like a single guitar chord but, strangely, not for pure sines.

Unfortunately, the latency of the audio API (and buffer size) has a much higher impact than mere detection.
To have a simultaneous feel I had to make the audio host use ASIO and lower the buffer size to several ms.
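
The arithmetic behind the buffer-size point is simple enough to sketch. One audio buffer of N sample frames at sample rate R takes N / R seconds to fill, and that delay adds to the detection delay. The function names are hypothetical, and the default figures (10 ms detection, 16 ms video frame) are just the numbers quoted earlier in this thread, not measurements.

```python
def buffer_latency_ms(buffer_frames, sample_rate):
    """Time in milliseconds needed to fill one audio buffer."""
    return 1000.0 * buffer_frames / sample_rate


def latency_in_video_frames(buffer_frames, sample_rate,
                            detection_ms=10.0, video_frame_ms=16.0):
    """Capture plus detection latency, expressed in video frames."""
    total_ms = buffer_latency_ms(buffer_frames, sample_rate) + detection_ms
    return total_ms / video_frame_ms


# Compare a few common buffer sizes at 44.1 kHz.
for frames in (64, 256, 1024):
    ms = buffer_latency_ms(frames, 44100)
    late = latency_in_video_frames(frames, 44100)
    print(f"{frames:5d} frames: {ms:6.2f} ms buffer, {late:4.2f} video frames total")
```

This counts only a single input buffer; drivers and APIs typically add further buffering on top, so real figures will be higher.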

@TurkeyMan
Member Author

Yeah, I suspect some headaches with the capture APIs. We'll see how it goes when we get there.
I think the simpler instruments like drums will come first ;)

@p0nce

p0nce commented Mar 22, 2014

https://github.com/p0nce/dplug/blob/master/dsp/dplug/dsp/goldrabiner.d

I've made a test program which outputs a WAV with pitch, voiced/unvoiced and a crude resynthesized output with volume = 1.
https://github.com/p0nce/dplug/blob/master/examples/pitch_detect/pitch_detect.d

The thing to get is that when there is no pitch (voicedness towards 0), the pitch output is wrong and shouldn't be used.

It can be used for monophonic voice and probably other instruments.
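
As a sketch of how the voicedness rule could feed the MIDI output this issue asks for, the snippet below discards unvoiced pitch readings and maps the rest to the nearest MIDI note number using the standard equal-temperament relation (A4 = 440 Hz = note 69). It does not use the dplug API: the function name, the assumption that voicedness lies in the range 0 to 1, and the 0.5 cut-off are all hypothetical.

```python
import math


def pitch_to_midi_note(frequency_hz, voicedness, min_voicedness=0.5):
    """Return the nearest MIDI note number, or None for unvoiced input.

    voicedness is assumed to run from 0 (no pitch) to 1 (clearly pitched);
    the pitch estimate is ignored whenever it falls below min_voicedness.
    """
    if voicedness < min_voicedness or frequency_hz <= 0.0:
        return None
    # 12 semitones per octave, with A4 (440 Hz) fixed at MIDI note 69.
    return int(round(69 + 12 * math.log2(frequency_hz / 440.0)))


print(pitch_to_midi_note(440.0, 0.9))  # voiced: mapped to a note number
print(pitch_to_midi_note(440.0, 0.1))  # unvoiced: pitch is not trusted
```

Returning None rather than a stale note lets the caller decide whether to send a note-off or simply hold the previous state.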
