UNet: U-Net is a convolutional neural network architecture originally developed for image segmentation. It pairs a downsampling encoder with an upsampling decoder and links matching resolutions through skip connections, so fine spatial detail survives the low-resolution bottleneck. In Stable Diffusion, the UNet is not a generator that maps a noise vector to an image in a single pass; it is the denoising network. At each step it takes the current noisy latent, the step index, and the text-prompt embedding, and predicts the noise present in that latent. Repeating this prediction over many steps gradually turns random noise into a clean latent, which a separate VAE decoder then converts into the final image. The skip connections are a large part of why the generated images retain fine-grained detail.
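The U-shaped data flow described above can be illustrated with a toy sketch. This is not Stable Diffusion's UNet: it has no learned weights, no timestep or text conditioning, and it works on a 1-D list instead of a 2-D latent. The function names (`downsample`, `upsample`, `toy_unet`) are hypothetical, and skip connections are merged by addition for simplicity, whereas real U-Nets usually concatenate feature channels.

```python
def downsample(x):
    """Halve the length by averaging adjacent pairs (stand-in for a strided convolution)."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]


def upsample(x):
    """Double the length by repeating each value (stand-in for an upsampling layer)."""
    return [v for v in x for _ in range(2)]


def toy_unet(x):
    """Two-level U-shaped pass over a 1-D signal whose length is divisible by 4."""
    # Encoder: keep a copy of the features at each resolution for the skip connections.
    skip_full = x
    skip_half = downsample(skip_full)
    bottleneck = downsample(skip_half)

    # Decoder: upsample, then merge with the encoder features of the same resolution.
    up_half = [a + b for a, b in zip(upsample(bottleneck), skip_half)]
    up_full = [a + b for a, b in zip(upsample(up_half), skip_full)]
    return up_full


signal = [1.0, 3.0, 5.0, 7.0, 2.0, 4.0, 6.0, 8.0]
out = toy_unet(signal)
print(out)
```

The output has the same length as the input, and the per-element differences of the input are still visible in it because the full-resolution skip connection bypasses the bottleneck.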
Scheduler: Diffusion models are defined over a sequence of T steps. In the forward process used for training, each step adds a small amount of Gaussian noise to the data, with a variance set by a noise schedule that depends only on the step index; by step T the data is almost pure noise. The Scheduler holds this noise schedule, typically implemented as a function that maps each step to its noise scale, with the scale growing from a small value at the first step to a large value at the last. During image generation the process runs in reverse: starting from pure noise, the Scheduler uses the UNet's noise prediction at each step to compute the slightly less noisy latent for the next step. Changing the noise level gradually keeps each denoising step a small, tractable correction. Different schedulers (for example DDIM or Euler-based samplers) trade off the number of steps against output quality.
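A minimal sketch of a step-to-noise-scale mapping, assuming a linear variance schedule in the style of DDPM. The endpoint values `beta_start=1e-4` and `beta_end=0.02` and the function names are illustrative assumptions, not Stable Diffusion's actual configuration. The noise scale at step t is taken as sqrt(1 - alpha_bar_t), where alpha_bar_t is the running product of (1 - beta) up to step t.

```python
import math


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, rising linearly from beta_start to beta_end (assumed values)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def noise_scales(betas):
    """Cumulative noise scale sqrt(1 - alpha_bar_t) after each forward step."""
    scales = []
    alpha_bar = 1.0  # fraction of the original signal variance still present
    for beta in betas:
        alpha_bar *= 1.0 - beta
        scales.append(math.sqrt(1.0 - alpha_bar))
    return scales


betas = linear_beta_schedule(1000)
scales = noise_scales(betas)
print(scales[0], scales[499], scales[-1])
```

Under these assumptions the scale starts near 0.01 and approaches 1 at the final step, meaning the sample is then almost entirely noise. A sampler walks this same table in reverse when generating an image.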