Unconditional Generation generates noise #82

Open
@reachomk

Hi,

I'm training a model on a dataset of songs with this package. After about 10 epochs (of 1000 samples each) the loss seems to converge, but when I sample I get pure noise. My intuition is that even if the model has converged to a local minimum, or I haven't trained for long enough, it should still produce something other than pure noise (garbage in, garbage out should still yield some structure). That leads me to believe there's an issue with the way I'm generating the audio. I've attached my code below.

Any suggestions, or anything more I need to provide?

import torch
import torchaudio


def generate_samples(model, num_samples, sample_rate, audio_length_seconds, device):
    """
    Generate audio samples from the trained diffusion model.

    :param model: The trained diffusion model.
    :param num_samples: The number of audio samples to generate.
    :param sample_rate: The sample rate of the audio.
    :param audio_length_seconds: The length of the audio to generate, in seconds.
    :param device: The device ('cpu' or 'cuda') to run the sampling on.
    :return: A tensor containing the generated audio samples.
    """
    # Tensor sizes must be integers, so cast in case the duration is a float
    audio_length = int(sample_rate * audio_length_seconds)
    # Initialize with random noise, shaped (batch, channels, samples)
    noise = torch.randn(num_samples, 1, audio_length, device=device)

    # Switch to inference mode (disables dropout and similar training-only behavior)
    model.eval()

    # Gradients are not needed for sampling
    with torch.no_grad():
        samples = model.sample(noise, num_steps=100)

    return samples

# Example usage after training the model:
# Load a clip only to read its sample rate (audio_path is defined elsewhere)
waveform, sample_rate = torchaudio.load(audio_path)
num_samples = 1  # Number of samples to generate
# sample_rate = dataset.sample_rate  # Alternative: sample rate taken from the dataset
audio_length_seconds = 20  # Length of the audio to generate, in seconds
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Generate samples
generated_audio = generate_samples(model, num_samples, sample_rate, audio_length_seconds, device)

# Save the generated samples as FLAC files
for i, audio_tensor in enumerate(generated_audio):
    filename = f"generated_sample_{i+1}.flac"
    torchaudio.save(filename, audio_tensor.cpu(), sample_rate)
    print(f"Saved: {filename}")
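
One rough way to put a number on "pure noise", offered as a sketch rather than anything this package provides: measure how strongly each sample correlates with the next one. White noise should score near 0, while a waveform with musical structure usually scores much closer to 1 at typical audio sample rates. The helper name `lag1_autocorrelation` is hypothetical, and the function assumes a flat sequence of floats (for example one channel of the generated tensor converted with `.tolist()`).

```python
import math
import random


def lag1_autocorrelation(signal):
    """Return the lag-1 autocorrelation of a 1-D sequence of floats.

    Values near 0 suggest white noise; values near 1 suggest a smooth,
    strongly correlated waveform. This is a heuristic, not a quality metric.
    """
    n = len(signal)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    variance = sum(c * c for c in centered)
    if variance == 0.0:
        # Constant (e.g. silent) signal: correlation is undefined, report 0
        return 0.0
    # Sum of products of each centered sample with its successor
    covariance = sum(a * b for a, b in zip(centered, centered[1:]))
    return covariance / variance


if __name__ == "__main__":
    rng = random.Random(0)
    # Reference signals: Gaussian white noise and a 440 Hz tone at 44.1 kHz
    white_noise = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
    tone = [math.sin(2 * math.pi * 440 * t / 44_100) for t in range(20_000)]
    print("white noise:", lag1_autocorrelation(white_noise))
    print("440 Hz tone:", lag1_autocorrelation(tone))
```

With the code above this could be called as `lag1_autocorrelation(generated_audio[0, 0].cpu().tolist())`, and the result compared against the same measurement on a training clip. If the generated audio scores near 0 while the training clip does not, the sampler output is statistically close to its noise input.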
