
Install ROCm and TensorFlow for Machine Learning on AMD GPUs

EDIT 2022: For ROCm 5.X (tested with the RX 6700s card) I started to make a guide here: https://www.videogames.ai/2022/09/01/RX-6700s-Machine-Learning-ROCm.html

If you have an RX 580 or an older card and you have trouble with ROCm 5.X, I recommend you try an older version of ROCm.

For ROCm version < 5.0

This is a condensed version of AMD’s ROCm 3.0 install instructions, plus some extra details. You can find their guide here: ROCm official install guide

Requirements

Setup used for this guide:

Step 1 - Update system

First make sure your system is up to date, then install libnuma-dev:

sudo apt install libnuma-dev

Step 2 - Configure repo

wget -qO - http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | sudo apt-key add -
echo 'deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main' | sudo tee /etc/apt/sources.list.d/rocm.list

Step 3 - Install ROCm

sudo apt update
sudo apt install rocm-dkms

Step 4 - Set up permissions and environment variables

Your user needs permission to access the GPU, so add it to the video group:

sudo usermod -a -G video $LOGNAME 
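A group change made with usermod only applies to new login sessions, so it is worth checking that it has taken effect before going further. Here is a minimal sketch of such a check. The has_group helper is a name I made up for illustration, and it relies on `id -nG` printing the group names of the current session.

```shell
# has_group GROUP_LIST GROUP: succeed if GROUP is one of the
# space-separated names in GROUP_LIST. (Hypothetical helper, not part of ROCm.)
has_group() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# id -nG prints the group names of the current session on one line.
if has_group "$(id -nG)" video; then
    echo "video group: ok"
else
    echo "video group: missing (log out and back in after running usermod)"
fi
```

If it reports the group as missing right after the usermod command, logging out and back in (or rebooting) is usually enough.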

Add the ROCm binaries, such as rocm-smi, to your $PATH:

echo 'export PATH=$PATH:/opt/rocm/bin:/opt/rocm/profiler/bin:/opt/rocm/opencl/bin/x86_64' | sudo tee -a /etc/profile.d/rocm.sh
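Scripts in /etc/profile.d are read at login, so the new $PATH only shows up in a fresh login shell. The sketch below checks whether the three ROCm directories from the command above are on the current $PATH. The path_contains helper is a hypothetical name used only for this example.

```shell
# path_contains DIR PATH_STRING: succeed if DIR is an entry of the
# colon-separated PATH_STRING. (Hypothetical helper for illustration.)
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check each directory that the rocm.sh profile script is meant to add.
for dir in /opt/rocm/bin /opt/rocm/profiler/bin /opt/rocm/opencl/bin/x86_64; do
    if path_contains "$dir" "$PATH"; then
        echo "on PATH: $dir"
    else
        echo "not on PATH: $dir"
    fi
done
```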

Step 5 - Test your setup

You should be able to see your GPUs in the output of these commands

/opt/rocm/bin/rocminfo 
/opt/rocm/opencl/bin/x86_64/clinfo 
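The rocminfo output is long, so a quick count of the GPU agents can be handy. This is only a sketch: it assumes that rocminfo prints a "Device Type: GPU" line for each GPU agent, which may differ between ROCm versions, and count_gpu_agents is a hypothetical helper name.

```shell
# count_gpu_agents: read rocminfo output on stdin and print how many
# agents are reported as GPUs.
# Assumption: each GPU agent has a line of the form "Device Type: GPU".
count_gpu_agents() {
    # grep -c exits non-zero when the count is 0, so ignore its status.
    grep -c 'Device Type:[[:space:]]*GPU' || true
}

if [ -x /opt/rocm/bin/rocminfo ]; then
    echo "GPU agents: $(/opt/rocm/bin/rocminfo | count_gpu_agents)"
else
    echo "rocminfo not found in /opt/rocm/bin"
fi
```

A count of 0 on a machine with a supported card usually points back to the permissions in Step 4 or to the driver install in Step 3.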

Install TensorFlow

First install these requirements

sudo apt install rocm-libs miopen-hip cxlactivitylogger

This will install TF-rocm 2.0

pip3 install tensorflow-rocm

Note that OpenAI baselines only supports TensorFlow up to version 1.14, so to use it you need to pin the version:

pip3 install tensorflow-rocm==1.14.5
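Once the install finishes, you can check that TensorFlow actually sees the GPU. Here is a minimal sketch using tf.test.is_gpu_available(), which exists in both the 1.14 and 2.0 releases. The check_tf_gpu function and its return codes are my own convention for this example, not part of tensorflow-rocm.

```shell
# check_tf_gpu PYTHON: report whether TensorFlow, run with the given
# interpreter, can see a GPU. (Hypothetical helper for illustration.)
# Returns 0 if a GPU is visible, 1 if not, 2 if TensorFlow cannot be imported.
check_tf_gpu() {
    py=${1:-python3}
    if ! "$py" -c 'import tensorflow' >/dev/null 2>&1; then
        echo "tensorflow is not importable with $py"
        return 2
    fi
    # Exit status of the Python process carries the result back to the shell.
    if "$py" -c 'import sys, tensorflow as tf; sys.exit(0 if tf.test.is_gpu_available() else 1)' 2>/dev/null; then
        echo "TensorFlow sees a GPU"
    else
        echo "TensorFlow does not see a GPU"
        return 1
    fi
}

check_tf_gpu python3 || true
```

If the import works but no GPU is found, go back to Step 5 and make sure rocminfo lists your card first.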

Issues

Install RCCL

For some reason, the RCCL libs can be missing, in which case you get an error message when using TF. You can install them with:

sudo apt install rccl

If you get this error

python3: Relink `/lib/x86_64-linux-gnu/libsystemd.so.0' with `/lib/x86_64-linux-gnu/librt.so.1' for IFUNC symbol `clock_gettime'
python3: Relink `/lib/x86_64-linux-gnu/libudev.so.1' with `/lib/x86_64-linux-gnu/librt.so.1' for IFUNC symbol `clock_gettime'

Install this:

sudo apt install libtinfo5

Tricks

Set which GPU device is visible to ROCm

For example, GPU 0:

export ROCR_VISIBLE_DEVICES=0
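The export above applies to everything started from that shell. If you only want to restrict a single command, you can set the variable for that one process instead. Here is a small sketch of that idea. The run_on_gpu wrapper is a hypothetical helper, and the `sh -c` command just stands in for your real training script.

```shell
# run_on_gpu INDEX COMMAND [ARGS...]: run COMMAND with only the GPU at
# INDEX exposed through ROCR_VISIBLE_DEVICES. (Hypothetical helper.)
run_on_gpu() {
    gpu_index=$1
    shift
    # env sets the variable for the child process only.
    env ROCR_VISIBLE_DEVICES="$gpu_index" "$@"
}

# Stand-in command that prints what the child process sees.
run_on_gpu 0 sh -c 'echo "child sees ROCR_VISIBLE_DEVICES=$ROCR_VISIBLE_DEVICES"'
```

This makes it easy to start one job per GPU from the same terminal without the jobs stepping on each other.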

Equivalent of the nvidia-smi tool

rocm-smi

If you want to refresh it in a loop, like “nvidia-smi -l 1”:

watch -n 1 rocm-smi

Test PCIe bandwidth

sudo apt-get install rocm-bandwidth-test
rocm-bandwidth-test

tags: rocm - AMD - gpu - machine learning - ubuntu - 18.04 - 18.1