Skip to content

Files

Latest commit

ed4fb81 · Nov 23, 2020

History

History

SEGAN

About SEGAN

SEGAN stands for Speech Enhancement Generative Adversarial Network. This approach is a 1D adaptation of pix2pix. A detailed description of the generator, the discriminator and the training process is available here.

The original implementation of SEGAN with tensorflow is available here. The implementation used to reproduce SEGAN in this recipe was adapted from here.

Investigation

In this recipe, we investigate several aspects of SEGAN. The community has already reported that the noise that is added to the compressed representation of the signal is ignored by the generator in the reconstruction phase [1] [2]. We have been able to confirm and reproduce this result. Furthermore, we have investigated the impact of the adversarial loss over the L1 regularization. We found out that removing the adversarial part does not degrade performance and using SNR regularization instead of L1 regularization gives better performance.

Results

All the models were trained on LibriMix train-360 with task enh_single.

Generator loss SI-SNRi(dB) SI-SNR(dB) STOI STOIi PESQ PESQi
ConvTasnet 10.5 13.9 0.92 0.12 2.1 0.91
Adversarial + L1 4.2 7.66 0.83 0.03 1.37 0.21
L1 4.1 7.59 0.83 0.03 1.39 0.23
SNR 4.6 8.0 0.84 0.05 1.5 0.3