by Mathieu Poliquin
On the second-hand market there are lots of cheap GPUs and motherboards intended for mining, but they come with some challenges:
CUDA-Z PCIe bandwidth test:

| PCIE slot | GPU | HtoD bandwidth |
| --- | --- | --- |
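To put the CUDA-Z numbers in context, the theoretical per-lane PCIe payload bandwidth can be computed from the transfer rate and the line coding; mining boards typically wire their extra slots (and risers) as x1. This is a back-of-the-envelope sketch, not the CUDA-Z methodology:

```python
# Theoretical payload bandwidth per PCIe lane, before protocol overhead.
# Gen 1/2 use 8b/10b line coding, Gen 3 uses 128b/130b.
ENCODING = {1: 8 / 10, 2: 8 / 10, 3: 128 / 130}   # usable fraction
GIGATRANSFERS = {1: 2.5, 2: 5.0, 3: 8.0}          # GT/s per lane

def pcie_bandwidth_mb_s(gen: int, lanes: int = 1) -> float:
    """Theoretical payload bandwidth in MB/s for a given generation/width."""
    usable_gbit = GIGATRANSFERS[gen] * ENCODING[gen]  # Gb/s per lane
    return usable_gbit * lanes * 1000 / 8             # -> MB/s

for gen in (1, 2, 3):
    print(f"PCIe {gen}.0 x1: ~{pcie_bandwidth_mb_s(gen):.0f} MB/s")
# PCIe 1.0 x1: ~250 MB/s
# PCIe 2.0 x1: ~500 MB/s
# PCIe 3.0 x1: ~985 MB/s
```

Real HtoD numbers come in below these ceilings once protocol and driver overheads are paid, which is what CUDA-Z measures.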
Let’s do a basic OpenAI Baselines test on the 4th PCIe slot (P106-100):
python3 -m baselines.run --alg=a2c --env=PongNoFrameskip-v4 --num_timesteps=2e7 --num_env=6
Results: ~ 500 fps
As you can see from the profiling screenshot above, the bottlenecks are clearly the host-to-device transfers and the CPU. Interestingly, these bottlenecks are similar to those seen on high-end GPUs, as discussed in NVIDIA Research’s paper on CuLE.
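A rough sketch of what the x1 link costs per training step helps explain why. All the numbers below are illustrative assumptions (frame size, link speed, per-copy latency), not measurements from the run above; in practice each step also ships actions, rewards, and training batches, and the CPU-side emulation compounds the slowdown:

```python
# Back-of-the-envelope: cost of copying one batch of observations
# host-to-device over a x1 riser each environment step.
NUM_ENVS = 6                  # matches --num_env=6 above
OBS_BYTES = 84 * 84 * 4       # assumed preprocessed, 4-frame-stacked uint8 obs
X1_BANDWIDTH = 250e6          # assume PCIe 1.x x1: ~250 MB/s
COPY_LATENCY = 20e-6          # assumed fixed overhead per small copy (s)

batch_bytes = NUM_ENVS * OBS_BYTES
copy_time = COPY_LATENCY + batch_bytes / X1_BANDWIDTH  # seconds per step
ceiling_fps = NUM_ENVS / copy_time  # ceiling from observation copies alone

print(f"{batch_bytes / 1024:.0f} KiB per step, "
      f"copy-limited ceiling ~{ceiling_fps:.0f} fps")
```

Even this single copy per step puts the ceiling only in the high thousands of fps on a x1 link; stack the remaining transfers and the CPU emulation on top and the measured ~500 fps is not surprising.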
So the question is: can CuLE also be used to leverage mining hardware?
Let’s find out!
Using the recommended parameters from CuLE’s GitHub README:
python vtrace_main.py --env-name PongNoFrameskip-v4 --normalize --use-cuda-env --num-ales 1200 --num-steps 20 --num-steps-per-update 1 --num-minibatches 20 --t-max 8000000 --evaluation-interval 200000
Much better than the ~500 fps we got without CuLE.
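The reason the speedup is possible is that with CuLE the emulators live on the GPU, so throughput scales with the number of parallel emulator instances rather than with PCIe traffic. A sketch of that scaling arithmetic, where the per-instance step rate is purely an assumption for illustration:

```python
# Aggregate throughput of GPU-resident environments: each emulator
# instance is slow on its own, but thousands run in parallel.
num_ales = 1200            # matches --num-ales 1200 above
steps_per_instance = 2.0   # assumed emulated steps/s per instance (illustrative)

total_fps = num_ales * steps_per_instance
print(f"~{total_fps:.0f} fps aggregate from {num_ales} on-GPU environments")
```

The key point is that `num_ales` multiplies a per-instance rate that never has to cross the x1 link, which is exactly the bottleneck the CPU-side setup could not avoid.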
If you want to see these tests in action, you can watch the video below. I also made one with all 4 GPUs, where a total of nearly 10,000 fps is reached:
Using NVIDIA CuLE, it’s certainly possible to leverage cheap mining hardware for machine learning. That said, only the Atari 2600 emulator has a 100% GPU port, so you would be stuck with that for now. Moreover, even if other emulators get ported, the performance might not be as advantageous, since the more complex (branch-heavy) the code, the worse GPUs perform. Even among Atari games there are large performance differences, as some are more complex to emulate: for example, the CuLE paper mentions that Riverraid runs at 134K fps while Boxing runs at 34K fps on their GPUs.
Obviously, NVIDIA CuLE uses CUDA, so this system doesn’t work on AMD cards, and to my knowledge AMD research doesn’t offer an alternative for now.

tags: machine learning - mining - p106-100 - p106-090 - p104-100