Indie Machine Learning and Video Game Dev

11 June 2020

Using AMD and Intel GPUs on Windows with TensorFlow DirectML

by Mathieu Poliquin

Tensorflow DirectML

Recently, Microsoft released a preview of its DirectML backend for TensorFlow. This backend enables support for most DirectX 12 devices on Windows, including AMD GPUs and Intel integrated GPUs.

This is very good news because the default CUDA-based backend is locked to NVIDIA cards, and ROCm (for AMD cards) only works on Linux and doesn’t support all AMD cards. So up until now, lots of users could not leverage their GPUs with TensorFlow.

links:

As Microsoft mentioned, this is a preview and not all ops are supported, which means low performance on certain benchmarks and use cases.

Hardware setup

Hardware specs:

Software:

Installation

First, install Python 3.7: download here
Next, install tensorflow-directml:

pip install tensorflow-directml

If you are using Windows Subsystem for Linux, you need to install the AMD preview drivers first: download here
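
Once the package is installed, a quick sanity check can confirm that the interpreter matches and that TensorFlow can see your GPU. The snippet below is only a minimal sketch: the helper names `is_supported_python` and `list_devices` are made up for illustration, and the expectation that DirectML adapters are listed with a `DML` device type is an assumption you should verify against your own output.

```python
import sys


def is_supported_python(version_info=sys.version_info):
    """True when running Python 3.7, the version installed in this post."""
    return tuple(version_info[:2]) == (3, 7)


def list_devices():
    """Return (name, device_type) pairs, or None if TensorFlow is not installed."""
    try:
        # device_lib is part of the TensorFlow 1.x code base
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    return [(d.name, d.device_type) for d in device_lib.list_local_devices()]


if __name__ == "__main__":
    if not is_supported_python():
        print("Warning: this post assumes Python 3.7, found", sys.version.split()[0])
    devices = list_devices()
    if devices is None:
        print("TensorFlow is not installed; run: pip install tensorflow-directml")
    else:
        for name, device_type in devices:
            # Assumption: DirectML adapters show up with device_type "DML"
            print(name, device_type)
```

If only a CPU device is listed, the DirectX 12 driver for your GPU is the first thing to check.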

Benchmarks

RX 580 tests

From a Windows shell (or you can use the GitHub Desktop app to sync the repo):

git clone git@github.com:tensorflow/benchmarks.git
cd benchmarks/scripts/tf_cnn_benchmarks
python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=32 --model=resnet50 --variable_update=parameter_server

As you can see in the screenshots below, the performance is quite low compared to ROCm 3.3, where you get around 88 images/s. One of the reasons for this is the lack of support for some ops.

[Screenshot: RX 580 benchmark]
[Screenshot: RX 580 usage]
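
If you prefer numbers over screenshots, you can save the benchmark output to a file and extract the throughput from it. This is a rough sketch, assuming the script ends a run with a line of the form `total images/sec: <number>`. The function names are hypothetical, and the sample log in the usage part is a placeholder, not a measurement.

```python
import re

# Assumption: tf_cnn_benchmarks ends a run with a line like "total images/sec: 12.34"
_TOTAL_RE = re.compile(r"total images/sec:\s*([0-9]+(?:\.[0-9]+)?)")


def parse_total_images_per_sec(log_text):
    """Return the last reported total throughput as a float, or None if absent."""
    matches = _TOTAL_RE.findall(log_text)
    return float(matches[-1]) if matches else None


def relative_throughput(measured, reference):
    """Fraction of a reference throughput that a measured run achieves."""
    if reference <= 0:
        raise ValueError("reference throughput must be positive")
    return measured / reference


if __name__ == "__main__":
    # Placeholder log text, NOT a real result from the RX 580
    sample_log = "step 10 ...\ntotal images/sec: 10.00\n"
    measured = parse_total_images_per_sec(sample_log)
    # 88.0 is the approximate ROCm 3.3 figure quoted in this post
    print("fraction of ROCm throughput:", relative_throughput(measured, 88.0))
```

Replace `sample_log` with the contents of your own captured output to get a figure you can include when reporting results.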

Intel UHD Graphics 620 - Integrated GPU

Same steps as for the RX 580, but with “--batch_size=16” so that it fits into memory.

As you can see, performance is also quite low. In comparison, the CPU version (Intel i7-8550U, without the use of AVX2 instructions) runs at 2.21 images/s.

[Screenshot: Intel UHD Graphics 620 benchmark]
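
Since the only difference between the two runs is the batch size, the command can be built from a small helper. The following is a sketch under stated assumptions: `build_benchmark_command` and `run_benchmark` are illustrative names, and the default `cwd` assumes the repo was cloned into the current directory as in the steps above.

```python
import subprocess
import sys


def build_benchmark_command(batch_size, model="resnet50", script="tf_cnn_benchmarks.py"):
    """Build the argument list for the benchmark command used in this post."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return [
        sys.executable, script,
        "--num_gpus=1",
        "--batch_size={}".format(batch_size),
        "--model={}".format(model),
        "--variable_update=parameter_server",
    ]


def run_benchmark(batch_size, cwd="benchmarks/scripts/tf_cnn_benchmarks"):
    """Run the benchmark from the cloned repo and return its combined output as text."""
    result = subprocess.run(
        build_benchmark_command(batch_size),
        cwd=cwd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout
```

For example, `run_benchmark(32)` corresponds to the RX 580 run and `run_benchmark(16)` to the Intel UHD Graphics 620 run.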

If you want to see the YOLO sample provided by Microsoft in action:

Conclusion

As you may have noticed from the screenshots, there is this error:

Op type not registered '_CopyFromGpuToHost'

I reported the error and the RX 580 performance results on DirectML’s issues page.

They state that there are still some ops that are not implemented yet, which explains the low performance.
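
If you want to include the missing ops in your own report, you can pull them out of a saved log instead of reading the console by eye. This is a small sketch that assumes the message format matches the error shown above. The function name `find_unregistered_ops` is made up for this example.

```python
import re

# Matches messages such as: Op type not registered '_CopyFromGpuToHost'
_UNREGISTERED_RE = re.compile(r"Op type not registered '([^']+)'")


def find_unregistered_ops(log_text):
    """Return the unique op names reported as not registered, in first-seen order."""
    seen = []
    for name in _UNREGISTERED_RE.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


if __name__ == "__main__":
    log = "Op type not registered '_CopyFromGpuToHost'\n"
    print(find_unregistered_ops(log))
```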

Although there is still a lot of optimization work to be done, the DirectML backend for TensorFlow is a very useful initiative from Microsoft, as it will enable many more users to leverage their GPUs for machine learning who would otherwise have very few alternatives. I recommend you test DirectML and report the results for your GPU on their issues page.

tags: DirectML - AMD - Intel - Windows - tensorflow - machine learning