
Implement comparison method of Pfeifenberger et al 2017 #22

Open
mim opened this issue May 22, 2017 · 13 comments


mim commented May 22, 2017

Lukas Pfeifenberger, Matthias Zöhrer, Franz Pernkopf. "DNN-based Speech Mask Estimation for Eigenvector Beamforming." In ICASSP 2017.

Slides from their talk at ICASSP

nateanl self-assigned this May 24, 2017

nateanl commented Jun 6, 2017

I'm confused about "kernelized DNN". For each point in the spectrogram, there is a feature vector. But the kernels for different frequency bins are different. Does this mean I need to build 257 different autoencoder layers and merge the output together to feed into the regression layer?


mim commented Jun 6, 2017

The slide numbered 14 (actually page 29) in the presentation shows a flowchart of the network structure. Does that answer your question?


mim commented Jun 6, 2017

But yes, it looks like there is a separate small DNN for each frequency channel and their outputs are combined by the final regression layer.
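As a rough illustration of that structure, here is a minimal numpy forward pass, not the authors' implementation: the layer sizes, tanh/sigmoid nonlinearities, and random weights are all assumptions, and 257 bins corresponds to a 512-point FFT.

```python
import numpy as np

rng = np.random.default_rng(0)

F = 257     # frequency bins (512-point FFT assumed)
D_IN = 6    # input feature dimension per bin (hypothetical)
H = 8       # hidden units in each small per-bin DNN (hypothetical)

# One small two-layer network per frequency bin, stored as stacked weights.
W1 = rng.standard_normal((F, H, D_IN)) * 0.1
W2 = rng.standard_normal((F, 1, H)) * 0.1

# Final regression layer combining all per-bin outputs into F mask values.
W_reg = rng.standard_normal((F, F)) * 0.1

def forward(x):
    """x: (F, D_IN) features for one time frame -> (F,) mask estimate."""
    h = np.tanh(np.einsum('fhd,fd->fh', W1, x))   # per-bin hidden layer
    z = np.einsum('foh,fh->fo', W2, h)[:, 0]      # per-bin scalar output
    return 1 / (1 + np.exp(-(W_reg @ z)))         # shared regression layer, sigmoid

mask = forward(rng.standard_normal((F, D_IN)))
```

The point is only that the per-bin networks never share weights, while the regression layer sees all bins at once.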


nateanl commented Jun 6, 2017

Ok, I see.


nateanl commented Jul 11, 2017

One question: I tried to compute the PSD matrix of the clean speech. According to CHiME3's documentation, the reference for the simulated set is in tr05_ORG, but it contains only single-channel audio.
In this case, what is the formula for the PSD matrix? Do I just repeat the spectrogram six times to get a 6×1 vector?


mim commented Jul 12, 2017

If the power spectral density is supposed to be a 6x6 matrix per frequency, then you need to use the spatial image of the clean speech, not the original clean speech source signal. The spatial image of the clean speech is in the "reverberated" directory. If you need one power per frequency, then you can just average the speech power in the original clean speech source signals across time.
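Both cases above can be sketched in numpy. The shapes (6 mics, 257 bins) match the discussion, but the random STFT data here is purely illustrative, not CHiME3 audio:

```python
import numpy as np

rng = np.random.default_rng(0)
M, F, T = 6, 257, 100                  # mics, freq bins, frames (illustrative)
X = rng.standard_normal((M, F, T)) + 1j * rng.standard_normal((M, F, T))

# Case 1: 6x6 spatial PSD per frequency from the multichannel spatial image,
# averaged over frames: Phi[f] = (1/T) * sum_t x(f,t) x(f,t)^H
Phi = np.einsum('mft,nft->fmn', X, X.conj()) / T

# Case 2: a single power per frequency from one channel,
# averaged over time (for the single-channel clean source).
power = np.mean(np.abs(X[0]) ** 2, axis=-1)
```

Each `Phi[f]` is Hermitian and positive semidefinite by construction, which is what the eigenvector beamformer relies on.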


nateanl commented Sep 3, 2017

I can't find the "reverberated" directory in /home/data/CHiME3...
Also, in the paper, when the authors compute the ground truth, the formula is:

(formula image from the paper, not reproduced here)

What is the meaning of "Tr"?


mim commented Sep 3, 2017

Huh, that's strange, but I can confirm it is gone. It looks like Felix Las modified the /home/data/CHiME3/data/audio directory at the end of June. I guess just download it again from the CHiME3 website.


mim commented Sep 3, 2017

Tr means trace of the matrix, the sum of the diagonal, which is also the sum of the eigenvalues.
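A quick numerical check of that identity (the matrix here is an arbitrary symmetric example):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

tr = np.trace(A)                    # sum of the diagonal: 2 + 3 = 5.0
eigvals = np.linalg.eigvalsh(A)     # eigenvalues of the symmetric matrix

# The trace equals the sum of the eigenvalues.
assert np.isclose(tr, eigvals.sum())
```

The same holds for the Hermitian PSD matrices in the paper, where the eigenvalues are real and nonnegative.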


nateanl commented Sep 3, 2017

Got it.


mim commented Sep 3, 2017

Wait, there is no reverberated directory for CHiME3, that's just CHiME2. CHiME3 has channel 0 as the reference.


nateanl commented Sep 3, 2017

The PSD matrix of noise can be computed in this way. What about the PSD of speech? Does CHiME3 have 6 channels of speech audio?


mim commented Sep 4, 2017

Yes, it is available for (some of?) the simulated mixtures. They are different between training, dev, and eval, so check each one. Also read the CHiME3 paper.

Equation (15) in the Pfeifenberger paper is just to show that it works in that visualization (figure 1), you don't actually need it for the deployable version of the algorithm.

When the spatial images (6-channel recordings) of the speech and noise are available separately, you can use those directly to compute the PSD of the speech and noise. For an observed mixture, there are several ways to estimate them.
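One common way to estimate the PSDs from an observed mixture is to weight the mixture's per-frame outer products by a time-frequency mask. This is a generic sketch, not specifically the Pfeifenberger et al. estimator; the mask here is random stand-in data:

```python
import numpy as np

rng = np.random.default_rng(0)
M, F, T = 6, 257, 100                  # mics, freq bins, frames (illustrative)
Y = rng.standard_normal((M, F, T)) + 1j * rng.standard_normal((M, F, T))
speech_mask = rng.uniform(size=(F, T))  # hypothetical speech mask in [0, 1]

# Mask-weighted PSD estimate:
# Phi_s[f] ~= sum_t m(f,t) y(f,t) y(f,t)^H / sum_t m(f,t)
num = np.einsum('ft,mft,nft->fmn', speech_mask, Y, Y.conj())
Phi_s = num / speech_mask.sum(axis=-1)[:, None, None]
```

The noise PSD follows the same recipe with `1 - speech_mask` (or a dedicated noise mask) as the weight.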
