Stable Diffusion with AMD GPU on Windows
Image Generation with AMD Graphics Cards
Harnessing the Power of Stable Diffusion on AMD GPUs: A Personal Journey and Guide
As a powerful tool in AI-generated art, Stable Diffusion enables artists and enthusiasts alike to transform ideas into visual masterpieces with unprecedented ease. Creating high-quality images on your own computer is becoming increasingly easy, and a decent understanding of how to operate a PC is enough to get it up and running. You will need some capable hardware, though. Let's discuss how it's done!
Here's the system we used to generate the images shown on this site:
AMD Ryzen 9 5950X CPU, 64 GB RAM, AMD Radeon RX 6800 XT GPU, Windows 10.
Why is this guide about AMD GPUs in particular?
While AMD GPUs are known for their high performance and cost-effectiveness, they have historically lacked support for Stable Diffusion, especially on Windows. For example, the AMD Radeon RX 6800 XT is generally considered to have impressive performance and features for AI-driven artistic endeavors.
Thanks to recent developments and community contributions, Stable Diffusion now works on AMD GPUs, especially when using Microsoft Olive and the DirectML API.
As of December 2023, we have so far not found great examples of working LLM text generation on AMD GPUs. This will likely be fixed in the future.
Can I use a CPU instead of a GPU?
Using a CPU is possible; you can run Stable Diffusion in CPU mode. It is very slow, though, and not very practical even with high-end CPUs: on our setup, generation took roughly 10-15 times longer on our relatively fast CPU than on the GPU.
Even though the AMD RX 6800 XT has 16 GB of VRAM, it quickly becomes clear that it has limits. The AMD RX 7900 XTX is faster and has 8 GB more VRAM (24 GB total).
Guide and details
1. First off, we suggest uninstalling any NVIDIA drivers you might have. If you really want to keep your drivers, workarounds may exist via command-line arguments (more on those later), such as:
--backend directml
2. Follow these instructions through step 2 on installing Python, adding it to the system's PATH, installing Git and downloading stable-diffusion-webui: https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-AMD-GPUs
3. Once you've got the 'stable-diffusion-webui-directml' folder created, come back here and skip their step 3.
4. Instead of just double-clicking 'webui-user.bat' as their guide suggests, create your own bat file in the same folder. You can name it, for example, "webui-settings.bat". Open this file using Notepad or a similar text editor, paste the following code inside, and save the file.
The code:
@echo off
set PYTHON=
set GIT=
set VENV_DIR=
set COMMANDLINE_ARGS= --medvram --backend directml --precision full --no-half --no-half-vae --opt-split-attention-v1 --opt-sub-quad-attention --disable-nan-check --disable-model-loading-ram-optimization
call webui.bat
This will ensure that webui.bat is started with the COMMANDLINE_ARGS that you'll probably want to have. We had to test many combinations before it worked, and some of the arguments are only needed for certain models. We'll talk more about models soon.
Further explanations of the command line args above and more
You can find more args here: https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Command-Line-Arguments-and-Settings
Commandline args that we use:
--medvram: This stands for "medium VRAM". It's a setting that helps Stable Diffusion run on a computer with a moderate amount of video memory (VRAM). VRAM is like dedicated memory for graphics and visual tasks. This setting ensures that the program doesn't try to use more memory than your computer can handle.
--backend directml: The 'backend' refers to the part of the software that does the heavy lifting, especially in terms of processing data. 'DirectML' is a type of backend designed to work well with Windows and AMD GPUs. It's like choosing a specific tool that works best with your computer's hardware for the task of generating images.
--precision full: This sets the program to use 'full precision' in its calculations. In simpler terms, it tells Stable Diffusion to be as accurate as possible when creating images, which can lead to better-quality results but might require more processing power.
--no-half: This argument tells the program not to use 'half precision'. Half precision is a way of doing calculations that's less accurate but faster and uses less memory. By saying 'no-half', you're choosing accuracy over speed.
--no-half-vae: VAE stands for Variational Autoencoder, a part of the model that helps in generating images. Like the previous argument, '--no-half-vae' means not to use half precision for this specific part of the model, again prioritizing accuracy.
--opt-split-attention-v1: This is a bit technical, but it's basically an optimization (improvement) for how the AI pays attention to different parts of the image it's generating. It's like fine-tuning the AI's focus to create better images.
--opt-sub-quad-attention: Another optimization setting, this one changes how the AI model processes parts of the image. It's another tweak to make the image generation more efficient.
--disable-nan-check: 'NaN' stands for 'Not a Number'. This argument disables checks for NaN values during calculations. It's a bit like telling the program not to worry about certain types of errors that can occur during the image generation process.
--disable-model-loading-ram-optimization: This turns off certain optimizations related to how the AI model is loaded into your computer's RAM (Random Access Memory). RAM is like short-term memory for your computer. Disabling this optimization might make the program use more RAM, but it could help avoid specific issues or improve performance depending on your system.
Head back to the guide at https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-AMD-GPUs. Whenever it says to use webui.bat, just use your custom bat file instead to keep those COMMANDLINE_ARGS in play.
Now start your webui-settings.bat, if you haven't already, by double-clicking it. The program begins by opening a command prompt. Once loaded, it will also open a web browser window. If it doesn't, the local URL will be shown in the command prompt, something like http://127.0.0.1:7860
Still having issues? This might help you: https://stable-diffusion-art.com/install-windows
Now you need to download models (the .safetensors files) from sites like huggingface.co.
We recommend these models:
dreamlike-photoreal-2-0 huggingface.co/dreamlike-art/dreamlike-photoreal-2.0
dreamshaper_8 huggingface.co/Lykon/dreamshaper-8
icbinp-afterburn (our favorite, since it's a little faster and gives a little better results at, for example, 16-18 steps) huggingface.co/dibdab007/icbinp-afterburn
JuggernautXL civitai.com/models/133005/juggernaut-xl
Place these files inside this folder:
stable-diffusion-webui-directml\models\Stable-diffusion
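If you prefer to script the download instead of using your browser, here is a minimal Python sketch using the huggingface_hub package (an assumption: install it with 'pip install huggingface_hub'; the filename below is illustrative and must match an actual file listed on the model's page):

from huggingface_hub import hf_hub_download

# Download a .safetensors checkpoint straight into the webui's model folder.
# repo_id matches the huggingface.co page; the filename is an example and
# must match a real file in that repository.
path = hf_hub_download(
    repo_id="Lykon/dreamshaper-8",
    filename="dreamshaper_8.safetensors",
    local_dir=r"stable-diffusion-webui-directml\models\Stable-diffusion",
)
print("Saved to", path)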
Go to the webui, choose your model, and start generating. You'll find a lot of settings in the txt2img tab. Here are our recommendations to start with:
We recommend these Sampling methods:
DPM++ 2M Karras
Euler a
Note that some models work best with specific samplers.
Sampling steps:
This depends very much on the model: 16-25 for most models, 1-4 for SDXL Turbo-style models.
Width: 512-1024 (higher if you have a GPU better than the 6800 XT)
Height: 512-1024 (higher if you have a GPU better than the 6800 XT)
Remember that models are often trained at certain sizes and may work better using those.
CFG (Classifier-Free Guidance) scale: 6-7 for most models we have tested. Set to 1 for SDXL Turbo-style models.
Use LDSR when upscaling images for improved image quality. For photorealistic images where you need to fix eyes or faces, we found Lanczos, ESRGAN_4x, and R-ESRGAN 4x+ to be great.
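If you're curious how these settings map onto code, here is a minimal Python sketch using the diffusers library (an assumption; the webui does all of this for you through its interface). In diffusers, "DPM++ 2M Karras" corresponds to DPMSolverMultistepScheduler with Karras sigmas:

import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

# Load one of the checkpoints recommended above.
pipe = StableDiffusionPipeline.from_pretrained(
    "dreamlike-art/dreamlike-photoreal-2.0", torch_dtype=torch.float32
)

# Select the "DPM++ 2M Karras" sampler.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, use_karras_sigmas=True
)

image = pipe(
    "a sunny beach with palm trees",
    num_inference_steps=20,  # "Sampling steps" in the webui
    guidance_scale=7.0,      # "CFG Scale" in the webui
    width=512,
    height=512,
).images[0]
image.save("beach.png")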
Text to Image Keywords Explained, Simplified
Text to Image Generation
This is the core function of Stable Diffusion. You provide a text description (like "a sunny beach with palm trees"), and the model uses this text to create a corresponding image. It's like turning words into pictures, using AI. This process is fascinating because the AI has to understand language, interpret your description creatively, and then use its 'knowledge' (from being trained on millions of images and texts) to generate a new image. It's like having a robotic artist at your command.
Stable Diffusion Checkpoint Model
To understand this, imagine teaching a student (in this case, the AI model) over several weeks. At the end of each week, you assess what the student has learned. Each assessment is like a 'checkpoint'. In AI, these checkpoints are moments when the model's learning is saved. They are crucial because they capture the AI's ability at that point. If the model learns something wrong later, you can go back to a checkpoint where it was doing well. In Stable Diffusion, these checkpoints represent the stages of learning where the model became good at turning text into images.
Prompt
A prompt is simply what you tell the AI to create. It's like giving a painting idea to an artist. You might say, "paint a red apple on a table" - that's a prompt. In Stable Diffusion, your prompts need to be descriptive enough for the AI to understand what you want. The better and clearer your description, the closer the resulting image will be to what you're imagining.
Sampling Method
This concept can be a bit trickier. Think of the AI as starting with a blank canvas full of random noise - just splotches of color everywhere. The sampling method is a set of rules the AI follows to gradually make sense of this chaos, transforming it step-by-step into a clear image that matches your prompt. It's like guiding the AI's brush strokes, starting from a rough sketch and refining it into a detailed painting.
Sampling Method, Sampling Steps, CFG Scale, Batch Count, Batch Size, Seed, and Script
Sampling Method: In Stable Diffusion, the sampling method (or sampler) is the algorithm that performs the step-by-step denoising described above. Different samplers, such as Euler a or DPM++ 2M Karras, take different routes from noise to finished image, and they differ in speed and in how many steps they need for a clean result.
Sampling Steps:
Imagine the AI creating an image is like someone drawing a picture, but instead of one continuous process, they make the drawing in several steps. Each 'step' is a moment where the AI makes decisions on how to refine the image from a random collection of pixels into something that looks more like the final picture. The more steps it takes, the more refined the image becomes. It's like sketching a rough outline first, then adding details, and finally coloring and shading.
CFG Scale in Stable Diffusion:
CFG stands for "Classifier-Free Guidance Scale". This is a setting in the Stable Diffusion model that helps control how closely the image it creates follows your description. If you set this scale higher, the image will closely match what you described. If you set it lower, the AI has more freedom to be creative and might make something that's a bit different from your description but still related. It's like telling the artist (in this case, the AI) how closely to follow your instructions - either to stick closely to your idea or to add their own creative twist.
Finding the Right Balance: The CFG Scale is important because it helps find the right balance between making exactly what you asked for and adding a bit of creativity. There isn't a one-size-fits-all setting; it can change depending on what you want from the image. Some people might prefer an image that sticks closely to their description (a higher CFG Scale), while others might want something more artistic and less literal (a lower CFG Scale).
Batch Count and Batch Size:
These terms control how many images you generate per run. 'Batch size' is the number of images generated in parallel in one go, and 'batch count' is how many of these batches are run one after another. For example, a batch size of 2 with a batch count of 3 produces 6 images. Larger batch sizes finish faster overall but use more VRAM (see the sketch after this glossary).
Seed:
In AI, a seed is a starting point for generating random numbers, which are used in processes like creating initial images. It's like the initial shuffle of a deck of cards. If you use the same seed, you'll get the same sequence of 'shuffles', making your AI's behavior predictable and repeatable. This is helpful when you want to recreate the same conditions or compare different models.
Script:
A script is like a recipe or a set of instructions for the computer. It tells the AI what to do, step by step. For example, a script can instruct the AI on how to start creating an image, how to apply the sampling steps, and what to do with the final output. It's a way of automating the AI's tasks, making it easier and more consistent to use.
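To make the batch and seed ideas concrete, here is a minimal sketch using the diffusers Python library (an assumption; the webui exposes the same controls as sliders and fields):

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("dreamlike-art/dreamlike-photoreal-2.0")

# A fixed seed makes the whole run repeatable: the same seed, prompt,
# and settings will reproduce the same images.
generator = torch.Generator().manual_seed(1234)

# "Batch size" = images generated in parallel per call;
# "batch count" = how many times the call is repeated.
batch_size, batch_count = 2, 3
images = []
for _ in range(batch_count):
    out = pipe(
        "a red apple on a table",
        num_images_per_prompt=batch_size,
        generator=generator,
    )
    images.extend(out.images)
print(len(images), "images generated")  # 6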
Getting started writing prompts
Creating effective prompts for Stable Diffusion, an AI image generation model, involves a mix of specificity, creativity, and understanding of how the AI interprets language. Here's a guide to help you craft better prompts:
Be Specific and Detailed: The more detailed your prompt, the more accurately the AI can generate your desired image. Include specifics about the subject, setting, mood, colors, and style.
Use Descriptive Language: Vivid and descriptive language helps in painting a clear picture. Use adjectives and adverbs to add depth to your prompt.
Consider Styles and Techniques: If you want your image in a particular artistic style, mention it. For instance, "in the style of Impressionism" or "resembling a digital painting".
Balance Between Detailed and Open-Ended: While details are important, leaving some elements open-ended can lead to creative and surprising results.
Understand the Limitations: Stable Diffusion might not accurately render very complex or abstract concepts. Simplify where necessary.
Iterate and Refine: Your first prompt might not produce the perfect result. Use the outputs to refine your prompt, adding or removing details as needed.
Avoid Ambiguity: Ambiguous terms or phrases can lead to unexpected results. Be clear about what you want.
Incorporate Mood and Atmosphere: Describing the mood or atmosphere can significantly impact the outcome. Words like "mystical", "serene", or "chaotic" set a tone.
Use References Wisely: Referring to well-known artworks, historical periods, or famous landmarks can help anchor your image in a specific context.
Negative prompt: Negative prompts are used to exclude certain elements from the generated image. They should be specific and concise to effectively guide the AI; for example, adding the word "blurry" discourages blurry output (see the sketch after this list).
Keep Up with Community Trends: Engage with the Stable Diffusion community to learn from others' experiences and discover what types of prompts yield the best results.
Remember, crafting prompts is an art in itself, and practice will enhance your skill. Experiment with different styles, themes, and descriptions to see how the AI responds.
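As promised, here is a minimal sketch of a negative prompt in code, again assuming the diffusers Python library (in the webui you simply type the words into the 'Negative prompt' box):

from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("dreamlike-art/dreamlike-photoreal-2.0")

# Everything listed in negative_prompt is steered away from during generation.
image = pipe(
    "portrait photo of an old fisherman at sea",
    negative_prompt="blurry, deformed, watermark, text",
    num_inference_steps=20,
    guidance_scale=7.0,
).images[0]
image.save("fisherman.png")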
A great prompt for Stable Diffusion could be:
"Imagine a tranquil Japanese garden at sunset. The garden is filled with blooming cherry blossoms, their pink petals gently falling onto a serene koi pond in the center. The pond reflects the vibrant colors of the setting sun, casting a warm, golden glow. A traditional wooden Japanese bridge arches over the pond, and in the background, a majestic Mount Fuji is silhouetted against a sky painted with hues of orange, purple, and red. The atmosphere is peaceful and magical, capturing the essence of a perfect spring evening in Japan. The image should be rendered in a detailed, realistic style, emphasizing the beauty and harmony of nature."
This prompt is detailed, setting a specific scene and mood, and it incorporates elements like location, time of day, key features, and artistic style to guide the AI in generating a specific and beautiful image.
Models used with this prompt, from left to right: dreamlike-photoreal-2.0, icbinp-afterburn and dreamshaper_8
Vikings didn't have horns on their helmets now, did they? Unfortunately, we forgot to add "horn, horns" to the negative prompt. However, we can still alter the image with "inpaint", as seen below.
Inpaint explained
The inpaint feature is valuable for fixing small defects in images, removing or replacing specific areas in an image, and repairing old photos. When you use the inpaint feature, you first tell the AI which part of the image you want to change. You can do this by marking or 'masking' that area.
Once you've marked the area to be changed, the AI steps in. It looks at the surrounding parts of the image to understand the context: the colors, textures, patterns, and even the 'mood' of the picture. After understanding the context, the AI generates new content that matches the surrounding area. If you're removing something, it fills in the gap seamlessly. If you're altering something, it makes the changes blend in naturally.
The Result: The end result is an image where the edited part doesn't look edited. It's as if the picture was always that way. This is particularly powerful because, unlike traditional editing tools that require manual effort and skill to blend edits naturally, the AI does most of the heavy lifting for you.
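For reference, here is roughly what inpainting looks like in code, a minimal sketch assuming the diffusers Python library and a checkpoint trained for inpainting (in the webui this all happens in the img2img > Inpaint tab, where you paint the mask by hand; the filenames are examples):

from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

# A checkpoint trained specifically for inpainting.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting"
)

image = Image.open("viking.png").convert("RGB")
# In the mask, white marks the area to repaint and black the area to keep.
mask = Image.open("helmet_mask.png").convert("L")

result = pipe(
    prompt="a plain iron viking helmet without horns",
    image=image,
    mask_image=mask,
).images[0]
result.save("viking_fixed.png")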
The prompt with inpaint
When using the inpaint feature in Stable Diffusion, the prompt you write plays a crucial role in guiding the AI to perform the desired edits. Here's how to craft an effective prompt for inpainting and the reasoning behind it:
Specify the Area to Edit: First, you need to indicate which part of the image you want to change. This is usually done by creating a mask or selecting the area in the image itself, rather than through the text prompt. The AI will focus only on this selected area for editing.
Describe the Desired Change: In your text prompt, clearly describe what you want the AI to do with the selected area. Be as specific as possible. For example, if there's an unwanted object, you might write, "Remove the trash can from the scene." If you're altering an element, your prompt could be, "Change the red car to a blue one."
Contextual Details: Include details about the surrounding area or the overall image to help the AI understand the context. For instance, if your image is a beach scene, you could add, "maintaining the sandy beach background."
Intended Outcome: If you have a specific outcome in mind, mention it. For example, "Edit the image as if the object was never there" or "Make the alteration look natural."
Why This Matters:
Precision: By being specific in your prompt, you help the AI understand exactly what changes you want. This increases the likelihood that the output will match your expectations.
Contextual Awareness: Including details about the surrounding area or overall theme of the image ensures that the AI edits blend seamlessly with the rest of the image. The AI uses this information to maintain consistency in style, lighting, and textures.
Creative Control: Detailed prompts give you more control over the creative process, allowing you to guide the AI in generating an image that aligns closely with your vision.
When crafting a prompt for inpainting, focus on clearly specifying the area to be edited, describing the desired change, adding contextual details, and outlining the intended outcome. This approach helps the AI understand your requirements better, leading to more accurate and pleasing results.
Graphics cards recommendations
We recommend the XFX Speedster MERC319 AMD Radeon RX 6800 XT as the budget option.
The RX 6800 XT has dropped significantly in price since its release, which makes it a good budget pick. Its 16 GB of VRAM is very much needed when generating images.
We also recommend the PowerColor Red Devil AMD Radeon RX 7900 XTX Graphics Card for best performance.
The RX 7900 XTX costs more but is overall about 50% faster than the RX 6800 XT and has 8 GB more VRAM. If you want the current best, the RX 7900 XTX is a great choice.
Mostly Viking-Themed Artwork: A Selection of Our Generated Images
One more thing! If you've made a good prompt but forgot how you did it, all information about the generation is embedded in the image file itself: open the PNG with a text editor and you'll find it near the beginning, or use the webui's 'PNG Info' tab.
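You can also read that metadata programmatically; here is a minimal Python sketch using Pillow (the filename is an example):

from PIL import Image

img = Image.open("00001-1234.png")
# The webui stores the prompt and all generation settings in the PNG's
# text metadata under the key "parameters".
print(img.info.get("parameters"))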
Affiliate Disclosure: As an Amazon Associate, we earn from qualifying purchases.