
Category: artificialintelligence

Last week, a couple of friends and I had an idea: let’s use PySC2 (a Python wrapper for the StarCraft II API) to build a reinforcement learning agent that can teach itself how to play StarCraft II.

This is no easy task, and many people have attempted it.

But we thought it would be an interesting way to get into the world of Reinforcement Learning.

Our agent is an A2C network that, for now, converges to outputting only 0s when trained.

Below is our first version of the AI, taking random actions. The map is called CollectMineralShards; the aim is to move your marines (the two green circles) to as many Mineral Shards (blue circles) as possible in the allotted time.

## Getting it to run

If you want to try out our project yourself, head over to https://github.com/deepmind/pysc2 and follow the instructions there to set up the StarCraft II environment.

For our algorithm we used PyTorch, so make sure you have that installed and running.

After installing PyTorch, clone our repository from GitHub and run it according to the instructions: https://github.com/Tzeny/deepstellar

## PySC2 – StarCraft II Learning Environment

PySC2 is DeepMind’s Python component of the StarCraft II Learning Environment (SC2LE). It exposes Blizzard Entertainment’s StarCraft II Machine Learning API as a Python RL Environment.

It can run one or two agents per game, and many games in parallel.

Agents receive an observation of the game state every N in-game steps. N is configurable; we used N = 16, which corresponds to roughly 90 APM (actions per minute).
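PySC2 calls N `step_mul`, and converting it to APM is simple arithmetic: APM = 60 × (game loops per second) / N. The loop rate is the assumption here. At the "Faster" game speed StarCraft II advances about 22.4 game loops per real second, which gives about 84 APM for N = 16; a round figure of 90 corresponds to assuming 24 loops per second. The helper below is a hypothetical illustration, not part of PySC2.

```python
def equivalent_apm(step_mul, game_loops_per_second=22.4):
    """Actions per minute for an agent that acts once every `step_mul` game loops.

    The default loop rate is an assumption: StarCraft II's "Faster" game speed.
    """
    return 60.0 * game_loops_per_second / step_mul


# Compare two common step multipliers under both loop-rate assumptions.
for n in (8, 16):
    print(f"N = {n}: ~{equivalent_apm(n):.0f} APM at 22.4 loops/s, "
          f"~{equivalent_apm(n, 24):.0f} APM at 24 loops/s")
```

Halving N doubles the APM, so N trades reaction speed against the number of decisions the network has to make per episode.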

Below is the code for an agent that takes a random action at each time step.

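```python
import numpy
from pysc2.agents import base_agent
from pysc2.lib import actions

class RandomAgent(base_agent.BaseAgent):
    """A random agent for starcraft."""

    def step(self, obs):
        super(RandomAgent, self).step(obs)
        # Pick one of the actions that are currently available.
        function_id = numpy.random.choice(obs.observation.available_actions)
        # Give each argument of that action a random value within its bounds.
        args = [[numpy.random.randint(0, size) for size in arg.sizes]
                for arg in self.action_spec.functions[function_id].args]
        return actions.FunctionCall(function_id, args)
```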

The obs object contains valuable information about the game state:

• Screen (you can select any combination of the two items below)
  • Features (shown in Figure 1 above)
  • RGB pixels
• Minimap (you can select any combination of the two items below)
  • Features (shown in Figure 1 above)
  • RGB pixels
• Player information
• Control groups
• Single select
• Multi select
• Cargo
• Build queue
• Alerts
• Available actions
• Last actions (successful actions only)
• Action result
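The available actions list matters for learning agents too. The random agent above samples only from it, and a learned policy usually has to respect it as well. One common approach, sketched here in NumPy as an assumption rather than as what the deepstellar code does, is to mask the policy's raw scores (logits) so that unavailable actions get exactly zero probability. The function name `masked_policy` and the toy logits are hypothetical.

```python
import numpy as np


def masked_policy(logits, available_actions):
    """Softmax over action logits, restricted to the available action ids.

    Unavailable actions get probability exactly 0.
    """
    logits = np.asarray(logits, dtype=float)
    masked = np.full_like(logits, -np.inf)
    masked[available_actions] = logits[available_actions]
    masked -= masked.max()          # numerical stability; the max is finite
    probs = np.exp(masked)          # exp(-inf) == 0 for unavailable actions
    return probs / probs.sum()


rng = np.random.default_rng(0)
logits = rng.normal(size=10)        # stand-in for a policy network's raw scores
available = [0, 3, 7]               # stand-in for obs.observation.available_actions
probs = masked_policy(logits, available)
action = rng.choice(len(probs), p=probs)  # always one of 0, 3 or 7
```

Masking before the softmax keeps the relative preferences between valid actions unchanged while guaranteeing the agent never issues an action the game would reject.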

## Reinforcement Learning – A2C

Our agent has to look at the observations for the current step, choose an action that would best further its goals, and predict a value for the current state.
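In an actor-critic model such as A2C, both outputs usually come from one network with a shared body and two heads: a policy head (a probability for each action) and a value head (a single number estimating the state's value). The NumPy sketch below illustrates only that layout. The class name `TinyActorCritic`, the single hidden layer, the sizes and the flat action space are illustrative assumptions, not the PyTorch model in the repository; a real PySC2 agent also has to output action arguments such as screen coordinates.

```python
import numpy as np


class TinyActorCritic:
    """Hypothetical two-headed model: a shared hidden layer feeding a
    policy head (action probabilities) and a value head (a scalar V(s))."""

    def __init__(self, obs_size, num_actions, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.w_shared = rng.normal(0.0, 0.1, (obs_size, hidden))
        self.w_policy = rng.normal(0.0, 0.1, (hidden, num_actions))
        self.w_value = rng.normal(0.0, 0.1, (hidden, 1))

    def forward(self, obs):
        h = np.maximum(0.0, obs @ self.w_shared)   # shared ReLU features
        logits = h @ self.w_policy                  # policy head
        logits = logits - logits.max()              # numerical stability
        probs = np.exp(logits) / np.exp(logits).sum()  # softmax over actions
        value = float((h @ self.w_value)[0])        # value head
        return probs, value

    def act(self, obs, rng):
        """Sample an action from the policy and return it with V(s)."""
        probs, value = self.forward(obs)
        action = rng.choice(len(probs), p=probs)
        return int(action), value


model = TinyActorCritic(obs_size=4, num_actions=3)
action, value = model.act(np.ones(4), np.random.default_rng(1))
```

Sharing the body lets the value estimate and the policy learn from the same features, which is the usual design choice in A2C implementations.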

A state’s value is the expected sum of all the rewards the agent would collect if it started in that state and kept playing from there.
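In practice the sum is usually discounted by a factor γ between 0 and 1, so nearer rewards count more: R_t = r_t + γ r_{t+1} + γ² r_{t+2} + …, which can be computed backwards as R_t = r_t + γ R_{t+1}. A2C trains the value head towards this return and uses the difference R_t − V(s_t), the advantage, to weight the policy update. The sketch below computes both for a short rollout, bootstrapping from the value estimate of the state after the last step, as n-step A2C implementations typically do. The function names and the example γ are assumptions.

```python
import numpy as np


def discounted_returns(rewards, gamma, bootstrap_value=0.0):
    """Compute R_t = r_t + gamma * R_{t+1}, working backwards from the value
    estimate of the state after the last step (0 if the episode ended)."""
    returns = np.zeros(len(rewards))
    running = bootstrap_value
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def advantages(returns, values):
    """A2C advantage: how much better the observed return was than V(s_t)."""
    return np.asarray(returns, dtype=float) - np.asarray(values, dtype=float)


# A 3-step rollout that collects a shard on the first and last step.
rets = discounted_returns([1.0, 0.0, 1.0], gamma=0.5)
adv = advantages(rets, values=[1.0, 1.0, 1.0])
```

A positive advantage makes the chosen action more likely; a negative one makes it less likely.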