In this project, a diffusion model is applied to generate new, artificial images from a given dataset. The diffusion process is inspired by [1] and [2], and the code is based on the implementation of this notebook.
The model consists of a simplified UNet architecture with several convolutional blocks and residual connections. Instead of simulating the full diffusion process for every image, each image is assigned a random time step that is passed to the model together with the image. A positional encoding layer embeds the time step into a vector of sines and cosines (as in the Transformer architecture [3]). During training, a batch of images is transformed into its noisy version, where the amount of noise is determined by the assigned diffusion time step. The model then learns to predict the noise that was added to the input batch: the loss function compares the predicted noise with the true noise, and the gradients are used to minimize this difference. This strategy is computationally more efficient than passing each image through the full diffusion process involving all time steps.

During inference, however, the model is given a random noise tensor that runs through the whole diffusion process in reverse: starting at the last time step, the tensor is passed to the model as an image, and the predicted noise is subtracted from it. This is repeated iteratively until the first time step is reached, which yields a generated, noiseless image. The amount of noise added to (and removed from) the image at each step is predefined by the betas vector.
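To make these two procedures concrete, below is a minimal PyTorch sketch of the time-step embedding and a single training step. It assumes the linear beta schedule from [1] and a hypothetical `model(x, t)` that returns a noise prediction of the same shape as `x`; all names and hyperparameters here are illustrative, not the project's exact implementation.

```python
import math
import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)      # noise schedule (linear, as in [1])
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)  # cumulative product of alphas (alpha_bar_t)

def timestep_embedding(t, dim):
    """Embed integer time steps as sines/cosines (Transformer-style [3]); dim must be even."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, dtype=torch.float32) / half)
    args = t.float()[:, None] * freqs.to(t.device)[None, :]
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)

def training_step(model, x0):
    """Noise a clean batch at random time steps and regress onto the injected noise."""
    b = x0.shape[0]
    t = torch.randint(0, T, (b,), device=x0.device)   # one random time step per image
    eps = torch.randn_like(x0)                        # the true noise
    ab = alpha_bars.to(x0.device)[t].view(b, 1, 1, 1)
    x_t = ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps    # closed-form noisy image x_t
    return F.mse_loss(model(x_t, t), eps)             # compare predicted vs. true noise
```

Sampling reverses this: starting from pure noise at the last time step, the predicted noise is scaled and subtracted step by step (ancestral sampling as described in [1]):

```python
@torch.no_grad()
def sample(model, shape, device="cpu"):
    """Run the reverse diffusion process from t = T-1 down to t = 0."""
    x = torch.randn(shape, device=device)             # start from pure noise
    for i in reversed(range(T)):
        t = torch.full((shape[0],), i, device=device, dtype=torch.long)
        eps_pred = model(x, t)                        # predict the noise at this step
        beta = betas[i].item()
        alpha = alphas[i].item()
        ab = alpha_bars[i].item()
        # posterior mean: remove the scaled predicted noise (Eq. 11 in [1])
        x = (x - beta / math.sqrt(1.0 - ab) * eps_pred) / math.sqrt(alpha)
        if i > 0:
            # sigma_t = sqrt(beta_t), the simple variance choice from [1]
            x = x + math.sqrt(beta) * torch.randn_like(x)
    return x
```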
As an exemplary dataset, an image dataset for crater detection on the Martian and Lunar surfaces is used [4]. The training set consists of
The above images are generated after
[1] J. Ho et al. (2020), "Denoising Diffusion Probabilistic Models", Advances in Neural Information Processing Systems 33 (NeurIPS 2020), Available: https://arxiv.org/abs/2006.11239
[2] P. Dhariwal and A. Nichol (2021), "Diffusion Models Beat GANs on Image Synthesis", Advances in Neural Information Processing Systems 34 (NeurIPS 2021), Available: https://arxiv.org/abs/2105.05233
[3] A. Vaswani et al. (2017), "Attention Is All You Need", Advances in Neural Information Processing Systems 30 (NeurIPS 2017), Available: https://arxiv.org/abs/1706.03762
[4] "Martian/Lunar Crater Detection Dataset", Kaggle, Available: https://www.kaggle.com/datasets/lincolnzh/martianlunar-crater-detection-dataset