Want to learn Stable Diffusion? This beginner’s guide is for newbies with zero experience with Stable Diffusion or other AI image generators. You will get an overview of Stable Diffusion and some basic useful tips. Try out the Stable Diffusion AI right on this page. It’s the best way to learn!
There are similar text-to-image generation services, such as DALL·E and Midjourney. Why Stable Diffusion? The advantages of Stable Diffusion are:
- Open-source: Many enthusiasts have created free tools and models.
- Designed for low-power computers: It’s free or cheap to run.
Stable Diffusion is free to use when running on your own Windows, Linux or Mac machine. An online service will likely charge a modest fee because someone needs to provide the hardware to run it on. More precisely, Stable Diffusion is free to use with some caveats:
- The base Stable Diffusion model and weights are open source and free to download. This means you can run the model locally on your own GPU/CPU.
- However, most people use paid services like DreamStudio, Neural Blender and Latent Diffusion that have the full model already installed and can generate images for you. These services often have free tiers but also offer paid subscriptions.
- Generating large images requires a GPU, which is expensive. So to get the full benefit of Stable Diffusion, you'll likely have to pay for a dedicated GPU server instance or a service that provides GPUs.
- Stable Diffusion was trained on a massive dataset that included both copyrighted and non-copyrighted images. There are ethical and legal debates around using the model to generate images from copyrighted works.
- While the model itself is open source, much of the research, techniques and datasets that went into Stable Diffusion were developed by for-profit companies.
For absolute beginners, I recommend using the free online generator above or other online services. You can start generating without the hassle of setting things up.
The downside of free online generators is that the functionalities are pretty limited. Use a more advanced GUI (Graphical User Interface) if you’ve outgrown them.
AUTOMATIC1111 is a popular choice. See the Quick Start Guide for setting up the Google Colab cloud server. Running it on your PC is also a good option if you have the right PC. See install guides for Windows and Linux.
The best way to understand Stable Diffusion is to try it out yourself. Try the Stable Diffusion image generator below. Here are the five easy steps.
1. Close your eyes. Imagine an image you want to make.
2. Describe the image in as much detail as possible. (For the best result, cover both the subject and the background, and use plenty of descriptive words.)
3. Write it in the prompt input box below.
4. You can leave the negative prompt [low contrast, underexposed, overexposed, ugly, disfigured, deformed] unchanged.
5. Select your preferred Stable Diffusion checkpoint.
Stable Diffusion turns this prompt into images like the ones below. You can generate as many variations as you want from the same prompt.
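If you later move from the online generator to your own scripts, the same prompt-plus-negative-prompt workflow can be sketched with Hugging Face's diffusers library. This is a sketch, not the generator on this page: the `runwayml/stable-diffusion-v1-5` model ID is one common choice of v1.5 weights, and the generation step is skipped when torch/diffusers or a GPU is unavailable.

```python
# One of the example prompts from this guide, plus the default negative prompt.
prompt = "a cute Siberian cat running on a beach"
negative_prompt = (
    "low contrast, underexposed, overexposed, ugly, disfigured, deformed"
)

try:
    import torch
    from diffusers import StableDiffusionPipeline
    ready = torch.cuda.is_available()  # generation realistically needs a GPU
except ImportError:
    ready = False  # torch/diffusers not installed; skip the generation step

if ready:
    # Download the v1.5 weights (several GB on first run) and move to the GPU.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    # The negative prompt steers the model away from unwanted traits.
    image = pipe(prompt, negative_prompt=negative_prompt).images[0]
    image.save("cat.png")
```

The prompt describes what you want; the negative prompt lists what you don't. Both are passed to the same pipeline call.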
Here’s a list of simple examples of prompts you can try. Switch the model to see the effect.
a cute Siberian cat running on a beach
a cyborg in style of van Gogh
french-bulldog warrior on a field, digital art, attractive, beautiful, intricate details, detailed face, hyper-detailed closed eyes, zorro eye mask, artstation, ambient light
You should always generate multiple images when testing a prompt. I generate 2-4 images at a time when making big changes to a prompt, to speed up the search. I generate 4 at a time when making small changes, to increase the chance of seeing something usable. Some prompts only work half of the time or less, so don't write off a prompt based on one image.
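In a script, the usual way to get several candidates from one prompt is to vary the random seed. A sketch using diffusers follows; the seed bookkeeping is plain Python, and the expensive generation only runs when the libraries and a GPU are present (the model ID is the same assumed v1.5 checkpoint as before).

```python
prompt = "a cyborg in style of van Gogh"
seeds = [101, 102, 103, 104]  # four tries per prompt, as suggested above

try:
    import torch
    from diffusers import StableDiffusionPipeline
    ready = torch.cuda.is_available()
except ImportError:
    ready = False  # torch/diffusers not installed; skip generation

if ready:
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    for seed in seeds:
        # A fixed seed makes one generation reproducible; different
        # seeds give different variations of the same prompt.
        generator = torch.Generator("cuda").manual_seed(seed)
        image = pipe(prompt, generator=generator).images[0]
        image.save(f"cyborg_{seed}.png")
```

Noting down the seed of a good image lets you regenerate it exactly, then tweak the prompt from there.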
You can use Stable Diffusion to generate photo-style realistic people. Let’s see some samples.
In this section, you will learn how to build a high-quality prompt for realistic photo styles step-by-step. Let’s start with a simple prompt of a woman sitting outside of a restaurant. Model: Stable Diffusion v1.5.
Prompt:
photo of young woman, highlight hair, sitting outside restaurant, wearing dress
Negative prompt:
disfigured, ugly, bad, immature, cartoon, anime, 3d, painting, b&w
Animals are popular subjects among Stable Diffusion users. Here is a sample. Model: Dreamshaper v7
Prompt:
National Geographic Wildlife photo of the year, elephant trunk pointing up in new york city, night, dark studio, depth of field, trunk pointing up
Negative prompt:
deformed, disfigured, underexposed, overexposed
Cartoons are also popular subjects among Stable Diffusion users. Here is a sample. Model: Ghostmix
Prompt:
french-bulldog warrior on a field, digital art, attractive, beautiful, intricate details, detailed face, hyper-detailed closed eyes, zorro eye mask, artstation, ambient light
Negative prompt:
deformed, disfigured, underexposed, overexposed
Stable Diffusion’s native resolution is 512×512 pixels for v1 models. You should NOT generate images with a width and height that deviate too much from 512 pixels. Use the following size settings to generate the initial image.
- Landscape image: Set the height to 512 pixels. Set the width higher, e.g. 768 pixels (3:2 aspect ratio)
- Portrait image: Set the width to 512 pixels. Set the height higher, e.g. 768 pixels (2:3 aspect ratio)
If you set the initial width and height too high, you will see duplicate subjects. The next step is to upscale the image. The free AUTOMATIC1111 GUI comes with some popular AI upscalers.
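The sizing rules above are simple enough to encode. The helper below is a small sketch of my own (the function name and the choice of 768 for the long side are mine, not from any library): keep the short side at the native 512 pixels and extend only the long side.

```python
def initial_size(orientation: str) -> tuple:
    """Return (width, height) for a v1-model initial image.

    Keeps the short side at the native 512 pixels and extends the
    long side to 768, giving a 3:2 (landscape) or 2:3 (portrait)
    frame. Going much larger risks duplicate subjects; upscale later.
    """
    if orientation == "landscape":
        return (768, 512)  # width > height
    if orientation == "portrait":
        return (512, 768)  # height > width
    return (512, 512)      # square: the native resolution

print(initial_size("landscape"))  # (768, 512)
```

Generate at one of these sizes first, then let an AI upscaler enlarge the result.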