Unconditional Generation generates noise #82

Open
reachomk opened this issue Mar 5, 2024 · 1 comment


reachomk commented Mar 5, 2024

Hi,

I'm training on a dataset of songs with this package. After about 10 epochs (of 1000 samples each) the loss seems to converge; however, when I sample I get pure noise. My intuition is that even if the model has only converged to a local minimum, or I haven't trained long enough, it should still produce some kind of structured output (garbage in, garbage out should still yield something other than pure noise). So I suspect there is an issue with the way I'm generating the audio. I've attached my code below.

Any suggestions, or anything more I need to provide?

import torch
import torchaudio

def generate_samples(model, num_samples, sample_rate, audio_length_seconds, device):
    """
    Generate audio samples from the trained diffusion model.

    :param model: The trained diffusion model.
    :param num_samples: The number of audio samples to generate.
    :param sample_rate: The sample rate of the audio.
    :param audio_length_seconds: The length of the audio to generate, in seconds.
    :param device: The device ('cpu' or 'cuda') to run the sampling on.
    :return: A tensor containing the generated audio samples.
    """
    audio_length = sample_rate * audio_length_seconds
    # Initialize with random noise
    noise = torch.randn(num_samples, 1, audio_length, device=device)

    model.eval()

    with torch.no_grad():
        samples = model.sample(noise, num_steps=100)

    return samples

# Example usage after training the model:
waveform, sample_rate = torchaudio.load(audio_path)  # audio_path: one of the training files (defined elsewhere); reuse its sample rate
num_samples = 1  # Number of samples to generate
#sample_rate = dataset.sample_rate  # Sample rate of the audio
audio_length_seconds = 20  # Length of the audio to generate, in seconds
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Generate samples
generated_audio = generate_samples(model, num_samples, sample_rate, audio_length_seconds, device)

# Save the generated samples as FLAC files
for i, audio_tensor in enumerate(generated_audio):
    filename = f"generated_sample_{i+1}.flac"
    torchaudio.save(filename, audio_tensor.cpu(), sample_rate)
    print(f"Saved: {filename}")
@atharvagasheTAMU

I have the same question. Can anyone please assist?
