Welcome back to Latent Space University! Today we'll dig into image generation and learn how to use the DALL·E API to generate images from user prompts.
Image generation AI has progressed dramatically in the last 10 years, and it has become one of the earliest clearly monetizable use cases in the current AI summer.
The difference between a hobby and a paid product often boils down to personalization. People want to visualize their ideas and see themselves in the products they use, but most lack the skills or resources to make those images themselves. This desire for personalization has fueled the success of companies like Midjourney, Lexica, PhotoAI, and PlaygroundAI.
OpenAI led the way in the modern image generation era with DALL·E 1 and 2 in 2021-22, impressing people with the iconic avocado chair.
DALL·E is available as an API if you’re all-in on the OpenAI stack. However, OpenAI faces much more competition in images than it does in text, as we’ll cover below.
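To make the API route concrete, here is a minimal sketch of a prompt-to-image call using only the Python standard library. The endpoint, the `prompt`/`n`/`size` fields, and the `data[].url` response shape follow OpenAI's public images API as we understand it; the helper names (`build_image_request`, `generate_image_urls`) and the allowed size list are our own assumptions for illustration, so check the current API reference before relying on them.

```python
import json
import os
import urllib.request

# Assumption: endpoint and field names as documented for OpenAI's images API.
IMAGES_URL = "https://api.openai.com/v1/images/generations"
# Assumption: the square sizes offered for DALL-E 2; other models may differ.
ALLOWED_SIZES = {"256x256", "512x512", "1024x1024"}


def build_image_request(prompt, n=1, size="1024x1024"):
    """Validate user input and return the JSON payload (hypothetical helper)."""
    prompt = prompt.strip()
    if not prompt:
        raise ValueError("prompt must not be empty")
    if size not in ALLOWED_SIZES:
        raise ValueError(f"unsupported size: {size}")
    return {"prompt": prompt, "n": n, "size": size}


def generate_image_urls(prompt, api_key=None, **options):
    """POST the payload and return the image URLs found in the response."""
    # Fall back to the OPENAI_API_KEY environment variable if no key is passed.
    api_key = api_key or os.environ["OPENAI_API_KEY"]
    payload = build_image_request(prompt, **options)
    request = urllib.request.Request(
        IMAGES_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    # Each entry in "data" is expected to carry a URL for one generated image.
    return [item["url"] for item in body["data"]]
```

With a valid key in `OPENAI_API_KEY`, calling `generate_image_urls("an armchair in the shape of an avocado", size="512x512")` should return a list of image URLs. Keeping payload construction separate from the network call makes it easy to validate user prompts before spending any API credits.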
Although Midjourney does not have an API (there is an open-source clone), it is by far the most advanced and successful platform in this space. Midjourney v5 was responsible for the viral Balenciaga Pope image in March 2023.