A work-in-progress project to recreate all of Pokemon Diamond's game logic, procedural world generation, and an unstoppable AI from scratch in Python.
(current progress: ~200 hours)
Game-playing adversarial artificial intelligence is a form of machine learning generally characterized as a time- and compute-limited model for simulating competitive behavior. In the context of Pokemon, that intelligence can be defined more precisely as the ability of a turn-based agent to choose actions that, combined with unknown player actions, yield the maximum payoff for the agent.
Given this setup, my goal is to create a `PokemonAgent` that may be given a game state, including the agent's `PokemonTrainer`, the adversary's `PokemonTrainer`, and their associated `Pokemon` objects, and effectively infer the optimal action in constant time relative to the number of available actions.
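Concretely, the interface I'm aiming for looks roughly like the sketch below; the constructor arguments and method names here are illustrative assumptions, not the engine's final API:

```python
class PokemonAgent:
    """Sketch of the target agent interface (names are illustrative)."""

    def __init__(self, agent_trainer, adversary_trainer):
        # Both trainers are PokemonTrainer objects carrying their Pokemon.
        self.agent_trainer = agent_trainer
        self.adversary_trainer = adversary_trainer

    def action(self):
        """Infer the (near-)optimal action for the current game state."""
        raise NotImplementedError
```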
The Berkeleymon engine runs a Pokemon game by parsing `Location` objects as two-dimensional arrays of tiles, processed by a `GameBoard` engine that updates the player's location and sprite batches and executes overlays like `TextBox` objects and `Encounter` (battle) objects. The overworld is a 1:1 sprite copy of Pokemon Diamond.
*The player sprinting in the overworld, with the reachable area rendered at 40 Hz by the `GameBoard`.*
By using a tile system and offset tracking, the player can freely navigate this 2D world. Upon triggering an encounter (such as finding a wild Pokemon in the grass), rendering and I/O control are ceded to the `Encounter` object and a new `PokemonAgent` is initialized.
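A minimal sketch of how tile-plus-offset navigation and encounter triggering might work; the tile names, fields, and `encounter_rate` parameter below are my assumptions for illustration, not the engine's actual code:

```python
import random

WALKABLE = {"grass", "path", "sand"}  # assumed tile types

class GameBoardSketch:
    """Toy 2D tile board with player-position tracking."""

    def __init__(self, tiles, start_row, start_col):
        self.tiles = tiles                      # 2D list of tile-type strings
        self.row, self.col = start_row, start_col

    def step(self, d_row, d_col, encounter_rate=0.1):
        """Move one tile if walkable; grass tiles may trigger an encounter."""
        new_row, new_col = self.row + d_row, self.col + d_col
        in_bounds = (0 <= new_row < len(self.tiles)
                     and 0 <= new_col < len(self.tiles[0]))
        if not (in_bounds and self.tiles[new_row][new_col] in WALKABLE):
            return "blocked"
        self.row, self.col = new_row, new_col
        if self.tiles[self.row][self.col] == "grass" and random.random() < encounter_rate:
            return "encounter"  # the real engine cedes control to an Encounter object
        return "moved"
```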
*Main battle menu, showcasing all player actions.*
*Move battle menu, with moves colored by type.*
As the images above show, the player's actions consist of attacking with one of four moves, switching to one of up to five other Pokemon, or using one of roughly 100 items. The player may also choose to run from a wild encounter, an option the agent does not have.
Thus, the state space of a Pokemon battle has size at least $110^{2n}$, where $n$ is the number of turns before the battle is decided, since both players independently make one of up to 110 choices $n$ times. To illustrate how enormous this state space is, a battle lasting only 3 turns may proceed in up to 1,771,561,000,000 different ways (almost 2 trillion).
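That figure is just the formula evaluated at $n = 3$:

```python
>>> 110 ** (2 * 3)  # up to 110 choices per player, per turn, for 3 turns
1771561000000
```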
Assuming a player will take no more than 5-10 seconds to decide on an action, exploring the entire state space in that time is infeasible. Additionally, because the adversary's (player's) probability distribution over actions is unknown, it is also infeasible to fix a strategy before the battle begins. Thus, the models I explore in this paper all revolve around using rollouts or probabilistic tree search to explore as much of the relevant state space as possible.
The `RandomLegalAgent` is the template for all future models; it chooses a legal action uniformly at random given the game state. While it can query the `GameBoard` for information about its own and its adversary's available actions, it simply picks randomly from among the agent's legal actions.
```python
import numpy as np


class RandomLegalAgent:
    """An AI agent to do battle with a Pokemon team."""

    def __init__(self, board):
        ...

    ...

    def get_valid_actions(self, curr_mon=None, curr_trainer=None):
        """Return all valid actions the agent may take."""
        ...
        return all_actions

    def get_valid_adversary_actions(self, curr_mon=None, trainer=None):
        """Return all valid actions the player may take."""
        ...
        return all_actions

    def action(self):
        """Take a random legal action."""
        all_actions = self.get_valid_actions()
        return np.random.choice(all_actions)
```
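Using it is then just (assuming a constructed `GameBoard` named `board`):

```python
agent = RandomLegalAgent(board)
move = agent.action()  # one uniformly random legal action
```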
Unsurprisingly, this agent performs poorly: it cannot assign any value to its moves without reasoning about the adversary's state.
Moving on from the `RandomLegalAgent`, my next model is intended to perform a Monte Carlo Tree Search-style series of simulations with as little asymptotic complexity as possible.
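The core building block there is the random rollout: from the current state, play the battle to completion with random actions many times per candidate action, and keep the action whose rollouts win most often. A minimal sketch, where `clone_state`, `apply_action`, and `simulate_random_battle` are hypothetical helpers rather than Berkeleymon functions:

```python
import numpy as np

def rollout_value(state, action, n_rollouts=50):
    """Estimate an action's value as its win rate over random playouts."""
    wins = 0
    for _ in range(n_rollouts):
        sim = clone_state(state)             # hypothetical deep copy of the battle
        apply_action(sim, action)            # hypothetical: commit the candidate action
        wins += simulate_random_battle(sim)  # hypothetical: 1 if the agent wins, else 0
    return wins / n_rollouts

def best_action(state, actions):
    """Pick the candidate action with the highest estimated rollout value."""
    values = [rollout_value(state, a) for a in actions]
    return actions[int(np.argmax(values))]
```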