6 November 2022

Stable Diffusion on AMD GPUs using ROCm on Linux

by Mathieu Poliquin

I tested the following steps on my Asus Zephyrus G14 2022

Software/Hardware specs used:

Step 1: Installing ROCm (Ubuntu 22.04 / 22.10)

If you are using Ubuntu 22.10, you might need to skip the kernel-mode driver installation and use the driver that is already installed, by specifying the --no-dkms flag.

As of this writing, ROCm 5.3 is the latest version.

sudo apt-get update
wget https://repo.radeon.com/amdgpu-install/5.3/ubuntu/jammy/amdgpu-install_5.3.50300-1_all.deb
sudo apt-get install ./amdgpu-install_5.3.50300-1_all.deb

sudo amdgpu-install --usecase=rocm,hip,mllib --no-dkms

sudo usermod -a -G video,render $LOGNAME

Note: The usermod command above adds your user to the render and video groups, which is required to access GPU resources.
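
To confirm that the group change took effect, a small check like the one below can help. It is only a sketch: the function names `missing_groups` and `current_group_names` are made up for this example, and it relies on Python's standard `os` and `grp` modules. Group changes only apply to new login sessions, so it will keep reporting missing groups until you log out and back in (or reboot).

```python
import grp
import os

REQUIRED_GROUPS = ("video", "render")  # the groups passed to usermod above


def missing_groups(current, required=REQUIRED_GROUPS):
    """Return the required groups that are absent from `current`, in order."""
    return [name for name in required if name not in current]


def current_group_names():
    """Names of the groups the current process belongs to."""
    names = set()
    for gid in set(os.getgroups()) | {os.getegid()}:
        try:
            names.add(grp.getgrgid(gid).gr_name)
        except KeyError:
            # A GID without an entry in the group database; skip it.
            pass
    return names


if __name__ == "__main__":
    missing = missing_groups(current_group_names())
    if missing:
        print("Not yet in:", ", ".join(missing))
    else:
        print("User is in the video and render groups")
```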

Reboot so the changes take effect.

Step 2: Set up stable-diffusion

First install Conda, which you can get here: https://docs.anaconda.com/anaconda/install/linux/

sudo apt-get install libgl1-mesa-glx libegl1-mesa libxrandr2 libxss1 libxcursor1 libxcomposite1 libasound2 libxi6 libxtst6
chmod +x Anaconda3-2022.10-Linux-x86_64.sh
./Anaconda3-2022.10-Linux-x86_64.sh

Then clone the Stable Diffusion repo and create the conda environment. This should download all the necessary dependencies except the ones related to ROCm, as this repo assumes you have an NVIDIA card.

git clone https://github.com/CompVis/stable-diffusion.git
cd stable-diffusion/
conda env create -f environment.yaml

You need to register an account on the Hugging Face website to get the model. Once registered, get version 1.4 here (you can get other versions on the site as well): https://huggingface.co/CompVis/stable-diffusion-v-1-4-original/resolve/main/sd-v1-4.ckpt

mkdir -p models/ldm/stable-diffusion-v1/
ln -s [PATH TO MODEL YOU DOWNLOADED]/sd-v1-4.ckpt models/ldm/stable-diffusion-v1/model.ckpt

You might have noticed that the conda environment installs PyTorch, but it is the build for CUDA devices. In the section below we will install PyTorch for ROCm.

Step 3: Install PyTorch for ROCm

Don’t forget to activate the conda environment first

conda activate ldm

You can get the install command at https://pytorch.org

As of this writing, the latest PyTorch build targets ROCm 5.2 (it is compatible with ROCm 5.3).

pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/rocm5.2/
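
Before moving on, it can be worth checking that the ROCm build of PyTorch is the one actually being imported. A minimal sketch, assuming it is run inside the ldm environment; the function name `rocm_status` is made up for this example. ROCm builds of PyTorch expose the GPU through the `torch.cuda` API, and `torch.version.hip` is `None` on CUDA or CPU-only builds.

```python
import importlib.util


def rocm_status():
    """Report whether PyTorch is installed, built for ROCm, and sees a GPU."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False, "hip_version": None, "gpu_available": False}

    import torch  # imported lazily so the check also works without PyTorch

    return {
        "torch_installed": True,
        # A version string on ROCm builds, None on CUDA or CPU-only builds.
        "hip_version": getattr(torch.version, "hip", None),
        # ROCm devices are reported through the torch.cuda API.
        "gpu_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in rocm_status().items():
        print(f"{key}: {value}")
```

If `hip_version` comes back as `None`, the CUDA build from the conda environment is probably still the one in use.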

Step 4: Test

Try:

python3 scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms

The generated images should be in stable-diffusion/outputs/txt2img-samples/

Performance / errors

For unsupported cards, you might need to override the GPU version by setting this environment variable. It is needed for the RX 6700S because that card is not directly supported, but another card with the same architecture is:

export HSA_OVERRIDE_GFX_VERSION=10.3.0
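
If you would rather not export the variable for the whole shell session, the override can be scoped to a single run. The sketch below builds a copy of the environment with the variable set; the helper name `env_with_gfx_override` is hypothetical, and 10.3.0 is simply the value used above (other cards may need a different one).

```python
import os


def env_with_gfx_override(version="10.3.0", base=None):
    """Return a copy of the environment with HSA_OVERRIDE_GFX_VERSION set."""
    # Copy so that neither os.environ nor the caller's mapping is modified.
    env = dict(os.environ if base is None else base)
    env["HSA_OVERRIDE_GFX_VERSION"] = version
    return env
```

Pass the result as the `env` argument of `subprocess.run` when launching scripts/txt2img.py; the calling shell is left untouched.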

If you run into out-of-memory (OOM) errors, you can reduce the resolution and the number of samples:

--H 256 --W 256
--n_samples=1

Some GPUs don’t support 16-bit precision well, or at all, so you might need to set this flag to enable full precision. On cards that do support fp16, you will get lower performance with this flag:

--precision full
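
The memory and precision flags above can be combined in one call. As a sketch, the helper below assembles the argument list for scripts/txt2img.py; the name `build_txt2img_command` and its parameters are made up for this example, and only flags that appear in this guide are used.

```python
import shlex


def build_txt2img_command(prompt, height=512, width=512, n_samples=None,
                          full_precision=False):
    """Build the argument list for scripts/txt2img.py."""
    cmd = ["python3", "scripts/txt2img.py", "--prompt", prompt, "--plms",
           "--H", str(height), "--W", str(width)]
    if n_samples is not None:
        # Fewer samples per batch lowers peak VRAM usage.
        cmd += ["--n_samples", str(n_samples)]
    if full_precision:
        # For cards that handle 16-bit precision poorly or not at all.
        cmd += ["--precision", "full"]
    return cmd


if __name__ == "__main__":
    low_memory = build_txt2img_command(
        "a photograph of an astronaut riding a horse",
        height=256, width=256, n_samples=1, full_precision=True)
    # Print the command in a form that can be pasted into a shell.
    print(shlex.join(low_memory))
```

The returned list can also be handed directly to `subprocess.run` from inside the stable-diffusion folder.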

Note: If you have trouble producing good images of people, remember that the optimal size for SD is 512x512. Different output resolutions affect the nature of the image beyond just the scale: 256x256, 512x512 and 1024x768 will generate vastly different images.

If you don’t have enough VRAM for 512x512, try the steps below.

Optimized branch of stable-diffusion (try this to reduce VRAM usage)

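Clone this fork, which has a couple of optimisations that reduce VRAM usage. Both repos are named stable-diffusion, so run the clone from a different directory than the one holding the vanilla repo: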

git clone https://github.com/basujindal/stable-diffusion.git

If you cloned the vanilla stable-diffusion repo (as in the previous steps), you just need to copy the optimizedSD folder from the fork into the vanilla stable-diffusion folder (drag and drop works) and run this command from the vanilla folder:

python3 optimizedSD/optimized_txt2img.py --prompt "A photograph of an astronaut riding a horse" --H 512 --W 512 --seed 1 --n_iter 2 --n_samples 1 --ddim_steps 50

You should see your VRAM usage greatly reduced (~2.6 GB)
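
The drag-and-drop step above can also be scripted. A minimal sketch, assuming the fork and the vanilla repo were cloned into separate folders; the function name `copy_optimized_folder` and the paths in the usage comment are hypothetical.

```python
import shutil
from pathlib import Path


def copy_optimized_folder(fork_dir, vanilla_dir, name="optimizedSD"):
    """Copy the optimizedSD folder from the fork into the vanilla repo."""
    source = Path(fork_dir) / name
    if not source.is_dir():
        raise FileNotFoundError(f"{source} not found")
    target = Path(vanilla_dir) / name
    # dirs_exist_ok (Python 3.8+) lets the copy be re-run over an earlier one.
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target


# Hypothetical usage, with placeholder paths:
# copy_optimized_folder("/path/to/fork/stable-diffusion", "/path/to/stable-diffusion")
```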

RESULTS:

(Two example images, originally embedded from imgur.com.)

tags: Stable Diffusion - ROCm - AMD GPU - Machine Learning - RX 6700s