Hi, I've been playing with this diffusion model library for a few days. It's great to have a library that lets ordinary users train on audio data with limited resources.
I have a problem regarding the training data and the output. I fed the unconditional model Mozilla's Common Voice dataset, restricted to a single language (about 15k clips). I resampled everything to 44.1 kHz and padded each file to 2^18 samples if it was shorter. The unconditional results were okay: I could at least tell it was human speech, although it was never really intelligible.
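For reference, my preprocessing is roughly the following (a minimal sketch using torchaudio; the mono mixdown and the cropping of longer clips are assumptions on top of what I described, and the target length is just the 2^18 figure mentioned above):

```python
import torch
import torchaudio

TARGET_SR = 44_100
TARGET_LEN = 2 ** 18  # 2**17 for the piano experiment below

def load_clip(path: str) -> torch.Tensor:
    wav, sr = torchaudio.load(path)                   # [channels, samples]
    wav = wav.mean(dim=0, keepdim=True)               # mixdown to mono (assumption)
    if sr != TARGET_SR:
        wav = torchaudio.functional.resample(wav, sr, TARGET_SR)
    if wav.shape[-1] < TARGET_LEN:
        # zero-pad short clips up to the fixed length
        wav = torch.nn.functional.pad(wav, (0, TARGET_LEN - wav.shape[-1]))
    else:
        # crop longer clips (assumption; my post only mentions padding)
        wav = wav[..., :TARGET_LEN]
    return wav
```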
But when I replace the training data with music (mostly solo piano, same sample rate but 2^17 samples per input tensor), the model does not generate outputs that sound like piano; in fact they are mostly noise.
I used the same layer configurations for both models, and I tried lowering the downsampling factors and increasing the number of attention heads, but neither made a significant difference. Any tips on why this happens?
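To give an idea of what I was varying, the hyperparameters looked roughly like this (the names here are illustrative, not the actual keyword arguments of the library):

```python
# Hypothetical config to illustrate the knobs I tried; the real
# arguments of the library may be named differently.
model_config = dict(
    in_channels=1,
    channels=128,
    factors=[4, 4, 4, 2, 2],   # per-stage downsampling factors (I tried lowering these)
    attention_heads=8,         # I tried increasing this
    attention_features=64,
)
```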
Sorry for the sudden question, but I'd like to know about the loss: what was its initial value, and how did it converge over training?
The initial value can depend on many factors, but the loss is supposed to drop like this:
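For context, the quantity being minimized in this kind of diffusion training is essentially a noise-prediction MSE. A rough sketch of it (assuming an ε-prediction objective; the library's exact parametrization may differ) is:

```python
import torch
import torch.nn.functional as F

def diffusion_loss(model, x0, sigmas):
    """x0: clean audio batch [B, C, T]; sigmas: per-sample noise levels [B, 1, 1]."""
    noise = torch.randn_like(x0)
    x_noisy = x0 + sigmas * noise         # corrupt the clean signal
    noise_pred = model(x_noisy, sigmas)   # network predicts the injected noise
    return F.mse_loss(noise_pred, noise)  # this is the value that should drop
```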
Hi, did you manage to generate piano music in the end? I'm also training on speech data and piano music, and I end up producing samples that are close to white noise.