In this project we explore U-Nets and diffusion models in order to denoise, enhance, and generate images.
In this part we explore pretrained denoising models and use them to generate entirely new images through diffusion, using a few clever techniques.
These images capture the prompts well, but a few artifacts remain; in particular, eyes and facial features do not always line up. Because human faces are full of small, high-frequency details, it is hard for the model to render a realistic person. In contrast, subjects such as the rocket ship and the mountain village look great, showing that the model does well on smooth objects dominated by lower-frequency features.
An oil painting of a snowy mountain village
A man wearing a hat
A rocket ship
For this project, I use a seed of 180 in PyTorch.
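As a rough sketch, the seeding might look like the following (assuming the Python, NumPy, and PyTorch RNGs all matter for the pipeline; the exact calls used in the project are not shown here):

```python
import random
import numpy as np
import torch

SEED = 180  # project seed

# Seed every RNG the pipeline might touch so results are reproducible
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(SEED)
```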
The forward process is defined by: $q(x_t | x_0) = N(x_t ; \sqrt{\bar\alpha_t} x_0, (1 - \bar\alpha_t)\mathbf{I})$. Given a clean image $x_0$, we can sample the noisy image $x_t$ directly with:
$$ x_t = \sqrt{\bar\alpha_t} x_0 + \sqrt{1 - \bar\alpha_t} \epsilon \quad \text{where}~ \epsilon \sim N(0, \mathbf{I}) $$
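A minimal sketch of this forward (noising) step, assuming a precomputed $\bar\alpha_t$ schedule is available as a tensor named `alphas_cumprod` (a hypothetical name for illustration):

```python
import torch

def forward_noise(x0: torch.Tensor, t: int, alphas_cumprod: torch.Tensor) -> torch.Tensor:
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x0, (1 - abar_t) * I).

    x0:             clean image, shape (B, C, H, W)
    t:              timestep index
    alphas_cumprod: precomputed abar schedule, shape (T,)
    """
    abar_t = alphas_cumprod[t]
    eps = torch.randn_like(x0)  # epsilon ~ N(0, I)
    return torch.sqrt(abar_t) * x0 + torch.sqrt(1 - abar_t) * eps
```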
We denoise iteratively, striding over timesteps rather than visiting every one. Each step from a time $t$ to an earlier time $t'$ is defined by:
$$ x_{t'} = \frac{\sqrt{\bar\alpha_{t'}}\beta_t}{1 - \bar\alpha_t} x_0 + \frac{\sqrt{\alpha_t}(1 - \bar\alpha_{t'})}{1 - \bar\alpha_t} x_t + v_\sigma $$
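A sketch of one such strided denoising step is below. It assumes $\alpha_t = \bar\alpha_t / \bar\alpha_{t'}$, $\beta_t = 1 - \alpha_t$, and that $v_\sigma$ is $\beta_t$-scaled Gaussian noise (skipped at the final step); these conventions may differ from the exact ones used in the project, and `x0_est` stands for the model's estimate of the clean image.

```python
import torch

def denoise_step(xt, x0_est, t, t_prime, alphas_cumprod, add_noise=True):
    """One strided step from timestep t down to an earlier timestep t' < t:
    x_{t'} = (sqrt(abar_t') * beta_t / (1 - abar_t)) * x0_est
           + (sqrt(alpha_t) * (1 - abar_t') / (1 - abar_t)) * xt
           + v_sigma
    """
    abar_t = alphas_cumprod[t]
    abar_tp = alphas_cumprod[t_prime]
    alpha_t = abar_t / abar_tp      # assumed relation between abar_t and abar_t'
    beta_t = 1 - alpha_t

    coeff_x0 = torch.sqrt(abar_tp) * beta_t / (1 - abar_t)
    coeff_xt = torch.sqrt(alpha_t) * (1 - abar_tp) / (1 - abar_t)

    x_tp = coeff_x0 * x0_est + coeff_xt * xt
    if add_noise and t_prime > 0:
        # v_sigma: one common choice is beta_t-scaled Gaussian noise
        x_tp = x_tp + torch.sqrt(beta_t) * torch.randn_like(xt)
    return x_tp
```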