In this project we explore U-Nets and diffusion models in order to denoise, enhance, and generate images.


Project 5A

In Project 5A we explore pretrained denoising diffusion models and use them to generate entirely new images with a few clever sampling techniques.

Part 0: Setup

These images seem to capture their prompts well. However, there are a few artifacts: eyes and other facial features do not quite match up. Because human faces contain many small, fine-grained features, it is hard for the model to create a realistic person. Other subjects, such as the rocket ship and the snowy mountain village, look great and show that the model does well on smoother objects with lower-frequency features.

An oil painting of a snowy mountain village

A man wearing a hat

A rocket ship

For this project, I use a random seed of 180 in PyTorch.
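
A minimal sketch of how the seed is fixed (the CUDA call assumes sampling runs on a GPU):

```python
import torch

# Fix the random seed so that all sampling results are reproducible
SEED = 180
torch.manual_seed(SEED)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(SEED)
```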

Part 1: Sampling Loops

Part 1.1 Implementing the Forward Process

The forward process is defined by $q(x_t \mid x_0) = N(x_t ; \sqrt{\bar\alpha_t}\, x_0, (1 - \bar\alpha_t)\mathbf{I})$. Equivalently, given a clean image $x_0$, we can sample a noisy image $x_t$ directly with:

$$ x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon \quad \text{where}~ \epsilon \sim N(0, \mathbf{I}) $$
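
A minimal sketch of this forward step, assuming `alphas_cumprod` is a 1-D tensor of the cumulative products $\bar\alpha_t$ indexed by timestep (the function and argument names are illustrative):

```python
import torch

def forward(im, t, alphas_cumprod):
    """Sample x_t ~ q(x_t | x_0) by adding scaled Gaussian noise to a clean image.

    im: clean image tensor (x_0)
    t: integer timestep
    alphas_cumprod: 1-D tensor of cumulative alpha products, indexed by timestep
    """
    alpha_bar = alphas_cumprod[t]
    eps = torch.randn_like(im)  # epsilon ~ N(0, I)
    return torch.sqrt(alpha_bar) * im + torch.sqrt(1 - alpha_bar) * eps
```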


Part 1.2 Classical Denoising


Part 1.3 One-Step Denoising


Part 1.4 Iterative Denoising

Rather than denoising one timestep at a time, we denoise iteratively over a strided subset of timesteps. Each update from a timestep $t$ to the next (earlier) timestep $t'$ is given by:

$$ x_{t'} = \frac{\sqrt{\bar\alpha_{t'}}\beta_t}{1 - \bar\alpha_t} x_0 + \frac{\sqrt{\alpha_t}(1 - \bar\alpha_{t'})}{1 - \bar\alpha_t} x_t + v_\sigma $$
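
Here $\alpha_t = \bar\alpha_t / \bar\alpha_{t'}$ and $\beta_t = 1 - \alpha_t$, $x_0$ is the current estimate of the clean image obtained from the noise prediction, and $v_\sigma$ is extra noise added back in. A minimal sketch of one such update, with the $v_\sigma$ term left out and assumed to be added separately (variable and function names are illustrative):

```python
import torch

def iterative_denoise_step(x_t, x0_est, t, t_prime, alphas_cumprod):
    """One strided denoising update from timestep t to an earlier timestep t'.

    x_t: current noisy image
    x0_est: clean-image estimate recovered from the noise prediction at t
    alphas_cumprod: 1-D tensor of cumulative alpha products, indexed by timestep
    """
    alpha_bar_t = alphas_cumprod[t]
    alpha_bar_tp = alphas_cumprod[t_prime]
    alpha_t = alpha_bar_t / alpha_bar_tp   # alpha for this strided step
    beta_t = 1 - alpha_t

    # Coefficients from the update rule above
    coef_x0 = torch.sqrt(alpha_bar_tp) * beta_t / (1 - alpha_bar_t)
    coef_xt = torch.sqrt(alpha_t) * (1 - alpha_bar_tp) / (1 - alpha_bar_t)

    # v_sigma (the added-noise term) is omitted here
    return coef_x0 * x0_est + coef_xt * x_t
```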