Latent Space University: Day 4 - Image Generation

Welcome back to Latent Space University! Today, we will dig into image generation and learn how to use the DALL·E API to generate images from user prompts.


Business Case for Images

Image generation AI has progressed dramatically in the last 10 years, and it has become one of the earliest clearly monetizable use cases in the current AI summer:

Lensa making $1m/day selling Dreamboothed personal avatars
Pieter Levels making $1m in 10 months with PhotoAI/InteriorAI/AvatarAI
RoomGPT getting 2m users in 6 months (it’s open source!)
Midjourney bootstrapping to between $33M and $300M in revenue with just 11 full-time staff
- Largest Discord server at over 15M people!

Why Pay for It?

The difference between a hobby and a paid product often boils down to personalization. People want to visualize their ideas and see themselves in the products they use, without having the skills or resources to make it themselves. This desire for personalization has fueled the success of companies like Midjourney, Lexica, PhotoAI, and PlaygroundAI.

DALL·E

OpenAI led the way in the modern image generation era with DALL·E 1 and 2 in 2021-22, impressing people with the iconic avocado chair.

DALL·E is available as an API if you're all-in on the OpenAI stack. However, OpenAI faces much more competition in the image domain than in text, as we'll cover below.
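To make the API concrete, here is a minimal sketch of an image request using only the Python standard library. It assumes the OpenAI image generation endpoint (`/v1/images/generations`), the `dall-e-2` model name, and that model's documented sizes. The helper names `build_image_request` and `generate_image_urls` are hypothetical and exist only for this illustration. Check the current OpenAI API reference before relying on any of these values.

```python
import json
import urllib.request

# Assumed endpoint and limits for DALL·E 2; verify against the OpenAI API reference.
API_URL = "https://api.openai.com/v1/images/generations"
ALLOWED_SIZES = {"256x256", "512x512", "1024x1024"}


def build_image_request(prompt: str, n: int = 1, size: str = "512x512") -> dict:
    """Validate the inputs and build the JSON body (hypothetical helper)."""
    prompt = prompt.strip()
    if not prompt:
        raise ValueError("prompt must not be empty")
    if size not in ALLOWED_SIZES:
        raise ValueError(f"size must be one of {sorted(ALLOWED_SIZES)}")
    if not 1 <= n <= 10:
        raise ValueError("n must be between 1 and 10")
    return {"model": "dall-e-2", "prompt": prompt, "n": n, "size": size}


def generate_image_urls(prompt: str, api_key: str, **options) -> list[str]:
    """POST the request and return the image URLs (hypothetical helper)."""
    body = json.dumps(build_image_request(prompt, **options)).encode("utf-8")
    request = urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        payload = json.load(response)
    # With the default response format, each item in "data" carries a "url".
    return [item["url"] for item in payload["data"]]


if __name__ == "__main__":
    # Print the request body only; no network call is made here.
    print(json.dumps(build_image_request("an armchair in the shape of an avocado")))
```

To actually generate an image, call `generate_image_urls(prompt, api_key)` with your own API key. Keeping validation in a separate function means a bad user prompt or size is rejected before any billable request is sent.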

Midjourney

While Midjourney does not have an API (though there is an open-source clone), it is by far the most advanced and successful platform in this space. Midjourney v5 was responsible for the viral Balenciaga Pope image in March 2023.