25 September 2021

Make two models fight each other in a PvP retro game

by Mathieu Poliquin

There is something quite fun about seeing two AI models battle it out in PvP games. Please note that this post is a follow-up to a previous post about using stable-baselines 2.10 to beat a retro game. That post covers the basics, so be sure to check it out before reading this one.

If you want to see a 1.7M-parameter CNN vs. a 3.6M-parameter MLP model in action:

2 player example with stable-baselines

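This is a bare-bones but complete example of how to support 2 players. If you have read the previous blog post, you will notice the main differences in the code: two models are trained instead of one, a second env is created with players=2 and a 2-player state, and the actions of both models are combined into a single array before being passed to the env.

That's it for the differences. You can copy-paste this and try it: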

import retro
import numpy as np
from stable_baselines import PPO2
from stable_baselines.common.policies import CnnPolicy
from stable_baselines.common.atari_wrappers import WarpFrame, ClipRewardEnv, FrameStack

GAME_ENV = 'Pong-Atari2600'
STATE_1P = 'Start'
STATE_2P = 'Start.2P'
POLICY = 'CnnPolicy'

def apply_wrappers(env):
    env = WarpFrame(env)                         # Downsamples the game frame buffer to 84x84 greyscale pixels
    env = FrameStack(env, 4)                     # Creates a stack of the last 4 frames to encode velocity
    env = ClipRewardEnv(env)                     # Clips the reward returned by the env to its sign (-1, 0 or +1)

    return env

def main():
    # Placeholder training budget (the "X" below), adjust it to your needs
    timesteps = 10000

    # Create Env
    env = retro.make(game=GAME_ENV, state=STATE_1P) # Creates the env that contains the emulator
    env = apply_wrappers(env)

    # Create p1 model that will be trained with PPO2 algo
    p1_model = PPO2(policy=POLICY, env=env, verbose=True)
    # Train p1 model on env for X timesteps
    p1_model.learn(total_timesteps=timesteps)

    # Create p2 model that will be trained with PPO2 algo
    p2_model = PPO2(policy=POLICY, env=env, verbose=True)
    # Train p2 model on env for X timesteps
    p2_model.learn(total_timesteps=timesteps)

    # Close previous env since we cannot have more than one in this same process
    env.close()

    # Create 2 player env, with the same wrappers so both models see the kind of frames they were trained on
    env_2p = retro.make(game=GAME_ENV, state=STATE_2P, players=2) # Creates the env that contains the emulator
    env_2p = apply_wrappers(env_2p)

    # Test the trained model
    state = env_2p.reset()

    while True:

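        # each model takes as input a stack of 4 x 84x84 frames
        # and returns which gamepad buttons to press (an array of bools, e.g. 12 for a Genesis gamepad)
        # predict() returns a tuple (actions, hidden states), so only the first element is used below
        p1_actions = p1_model.predict(state)
        p2_actions = p2_model.predict(state)
        #actions = env_2p.unwrapped.action_space.sample()
        # player 1's buttons come first, followed by player 2's
        actions = np.append(p1_actions[0], p2_actions[0])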

        # pass those actions to the environment (emulator) so it can generate the next frame
        # returns:
        # state: next stack of frames
        # reward: outcome of the environment
        # done: if the game is over
        # info: variables used to create the reward and done functions (for debugging)
        state, reward, done, info = env_2p.step(actions)

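        # Draw the current frame so you can watch the match
        env_2p.render()

        if done:
            # The match is over, start a new one
            state = env_2p.reset()

if __name__ == '__main__':
    main()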
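
The key step in the loop above is how the two predictions become a single action for the 2-player env. Here is a minimal sketch of that step on its own, using plain NumPy and no emulator. It assumes each model outputs one 0/1 value per button and that player 1's buttons come first, which is what the np.append call in the example does. The helper names and the count of 8 buttons per player are made up for illustration, so use whatever your env's action space reports.

```python
import numpy as np

# Assumption for illustration only: 8 buttons per player.
BUTTONS_PER_PLAYER = 8


def combine_actions(p1_action, p2_action):
    """Concatenate both players' button arrays: player 1 first, then player 2."""
    return np.append(p1_action, p2_action)


def split_actions(actions, n_players=2):
    """Inverse of combine_actions: cut the flat array into one chunk per player."""
    return np.split(np.asarray(actions), n_players)


# Hypothetical button presses standing in for the first element of model.predict(state)
p1_action = np.array([1, 0, 0, 0, 0, 1, 0, 0])
p2_action = np.array([0, 0, 0, 0, 1, 0, 0, 0])

actions = combine_actions(p1_action, p2_action)
print(actions.shape)  # (16,)

# Splitting the combined array gives back each player's buttons
p1_back, p2_back = split_actions(actions)
```

Splitting the array back is handy for debugging, for example to check which model pressed which button on a given frame.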


If you want a more complete, ready-to-use example, you can find it in the retro-scripts project on my GitHub, which I used to make the above video.

Samurai Shodown PvP example

If you are a fan of Samurai Shodown on Genesis, check out this video I have made, in which I walk through the code above.

tags: stable-baselines - model vs model - openai - machine learning - CNN - MLP