24 September 2021

Use stable-baselines to train an AI model to beat WWF Wrestlemania: The Arcade Game

by Mathieu Poliquin

WORK IN PROGRESS, meanwhile you can check the video form of this blog post

Stable-Baselines is a fork OpenAI’s baselines but with cleaner and more modular code that is for the most part easier to work with.

In this blog post we are going to use stable-baselines to train an AI model to beat WWF Wrestlemania: The Arcade Game (Genesis port).

Setup

Requires:

Tensorflow 1.X, 1.14 recommended
stable-baselines 2.10 (fork of baselines)
stable-retro (fork of gym-retro)

sudo apt-get update
sudo apt-get install cmake python3 python3-pip git zlib1g-dev libopenmpi-dev ffmpeg

pip3 install opencv-python anyrl gym joblib atari-py tensorflow-gpu==1.14 baselines stable-baselines pygame

git clone https://github.com/Farama-Foundation/stable-retro.git
cd stable-retro
pip3 install -e .

You need to provide your own roms

In your rom directory exec this command, it will import the roms into stable-retro

python3 -m retro.import .

Lite refresher on Reinforcement Learning and Neural Nets

This post assumes you have some basics on RL and NNs but here is a refresher.

Image from OpenAI’s spinning it up:

As opposed to explicitly coding the behavior of the agent like when using state machines for example, Reinforcement Learning instead lets the agent figure out their own behavior by giving the agent positive and negative rewards for the outcome of their actions.

stable-baselines

Bare bones example:

import retro
from stable_baselines import PPO2
from stable_baselines.common.policies import CnnPolicy
from stable_baselines.common.atari_wrappers import WarpFrame, ClipRewardEnv, FrameStack

GAME_ENV = 'Airstriker-Genesis'
STATE = 'Level1'
POLICY = 'CnnPolicy'
TIMESTEPS = 10000

def main():
    # Create Env
    env = retro.make(game=GAME_ENV, state=STATE) # Creates the env that contains the genesis emulator
    env = WarpFrame(env)                         # Downsamples the game frame buffer to 84x84 greyscale pixel
    env = FrameStack(env, 4)                     # Creates a stack of the last 4 frames to encode velocity
    env = ClipRewardEnv(env)                     # Make sure returned reward from env is not out of bounds

    # Create model that will be trained with PPO2 algo
    model = PPO2(policy=POLICY, env=env, verbose=True)

    # Train model on env for X timesteps
    model.learn(total_timesteps=TIMESTEPS)

    # Test the trained model
    state = env.reset()

    while True:
        env.render()

        # model takes as input a stack of 4 x 84x84 frames
        # returns which buttons on the Genesis gamepad was pressed (an array of 12 bools)
        actions = model.predict(state)

        # pass those actions to the environement (emulator) so it can generate the next frame
        # return:
        # state = next stack of image
        # reward outcome of the environement
        # done: if the game is over
        # info: variables used to create the reward and done functions (for debugging)
        state, reward, done, info = env.step(actions[0])

        if done:
            env.reset()

if __name__ == '__main__':
    main()

tags: stable-baselines - train ai model - openai - machine learning - CNN - WWF