Problem

Amplitudes are min-max normalized for each audio example loaded from the dataset.

This is bad for three reasons:

First reason: DC offset. The normalization subtracts the example's minimum and then divides by its (new) maximum. If the positive and negative peaks have different magnitudes, silence no longer sits at the middle value, so the round trip introduces a DC offset into the audio.

Second reason: each example has different peaks, so each example ends up with a different quantization level for silence.

Third reason: dynamics. If part of my dataset is soft, part is loud, and part is the transitions between soft and loud, every example gets normalized up to loud, and SampleRNN will struggle to learn those transitions. If some [8-second] example is nearly silent, it now becomes super loud.

I think the only acceptable amplitude normalization would be over the entire dataset, and you could do that [with ffmpeg] when creating the dataset.
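To make the first two reasons concrete, here is a minimal sketch in plain PyTorch (my own illustration, not the repo's code) of per-example min-max normalization applied to an example whose peaks are not symmetric around zero:

```python
import torch

q_levels = 256

# A made-up example with asymmetric peaks: +0.8 and -0.2, with true silence at 0.0.
samples = torch.tensor([0.0, 0.8, -0.2, 0.0])

# Per-example min-max normalization, written out directly
# (this mirrors the behaviour described above, not the repo's exact code).
normalized = (samples - samples.min()) / (samples.max() - samples.min())
quantized = (normalized * (q_levels - 1)).long()
print(quantized)    # tensor([ 51, 255,   0,  51]) -> silence lands at level 51, not 128,
                    # and an example with different peaks would put silence at yet another level

# Mapping the levels back to [-1, 1] (one common convention):
dequantized = quantized.float() / q_levels * 2 - 1
print(dequantized)  # silence comes back at about -0.60 instead of 0.0 -> a DC offset
```

The snippet below shows the same round trip using the repo's own `linear_quantize` / `linear_dequantize` on a silent example: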
```python
# quantize the wav amplitude into 256 levels
q_levels = 256

# Plot original wav samples
plot(samples)
# samples = tensor([ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000])

# Linearly quantize the samples
lq = linear_quantize(samples, q_levels)
plot(lq)
# lq = tensor([ 133, 133, 133, ..., 133, 133, 133])  # note: silence should be 128

# Unquantize the samples
ldq = linear_dequantize(lq, q_levels)
plot(ldq)
# ldq = tensor([ 0.0391, 0.0391, 0.0391, ..., 0.0391, 0.0391, 0.0391])
# introduction of DC offset; instead, this should be silent: 0.0000, 0.0000, 0.0000, ...
```
The normalization happens in `linear_quantize`; the audio is normalized as it is loaded:

(Example) `linear_dequantize(linear_quantize(samples)) != samples`
Solution
Don't normalize per example inside `linear_quantize`.
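A possible shape for the fix (my own sketch, not a patch from the repo): quantize against the fixed range [-1, 1] rather than per-example statistics, and if any amplitude normalization is wanted at all, do it once over the whole dataset when it is created (e.g. with ffmpeg, as suggested above). With a fixed mapping, silence quantizes to the same level (`q_levels // 2`) in every example, and the round trip introduces no DC offset:

```python
import torch

Q_LEVELS = 256

def linear_quantize_fixed(samples, q_levels=Q_LEVELS):
    # Hypothetical replacement: a mid-tread quantizer over the fixed range [-1, 1].
    # No per-example statistics are used, so silence (0.0) always maps to q_levels // 2.
    levels = torch.round(samples * (q_levels // 2)) + q_levels // 2
    return levels.clamp(0, q_levels - 1).long()

def linear_dequantize_fixed(levels, q_levels=Q_LEVELS):
    # Inverse mapping: level q_levels // 2 comes back as exactly 0.0.
    return (levels.float() - q_levels // 2) / (q_levels // 2)

silence = torch.zeros(5)
print(linear_quantize_fixed(silence))                           # tensor([128, 128, 128, 128, 128])
print(linear_dequantize_fixed(linear_quantize_fixed(silence)))  # tensor([0., 0., 0., 0., 0.])
```

The `_fixed` names are placeholders for illustration; the point is only that no per-example min/max enters the mapping, so the quantization range and the silence level are the same for every example in the dataset.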