Introduction
I love Tic-Tac-Toe. Besides bringing back fond childhood memories, it offers a simple playground for a noob like me to learn more about Computer Science, Algorithms and AI. In this blog post , I will use the Mini-Max algorithm to solve the game of Tic Tac Toe.
To solve the game means, we will be able to discover a strategy that ties the game against an optimal opponent and wins against any non-optimal opponent.
The Game
If you are from Mars, and do not know how the game works, here are the rules courtesy of Wikipedia
“Tic-tac-toe (American English), noughts and crosses (Commonwealth English and British English), or Xs and Os/“X’y O’sies” (Ireland), is a paper-and-pencil game for two players, X and O, who take turns marking the spaces in a 3×3 grid. The player who succeeds in placing three of their marks in a diagonal, horizontal, or vertical row is the winner. ”
The Minimax Algorithm
There are already plenty of great resources to learn about the minimax algorithm like this one which I will liberally borrow from. I only briefly explain the algorithm here.
The Mini-Max algorithm is perfect for 2-player (X vs Y) games like Tic-Tac-Toe where player X is trying to maximize his chances of winning while player Y is trying to minimize X’s chances of winning.
Consider a small imaginary game where each player can make one of two moves (Left or Right). Player X moves first followed by player Y at which point the score of the game is known.
The score of the game ranges from 1-4. Player X wants to maximize this score while player Y tries to minimize it.To determine what is the optimal action player X should take , we should work backward from the end of the game.
A,B,C,D,E,F and G are various states of the game where the latter 4 are terminal states.
Move 2
When making Move 2, Player Y can be in states B or C.
In State B, Player Y can choose L to yield a score of 4 or R to yield score of E. Given, he wants to minimize the overall score , he will choose R.
Similarly in State C, Player Y will choose R to minimize overall score.
The value of States B and C are now 3 and 1 respectively.
Move 1
For the first move, Player X is in state A. He can choose L which lands him in state B with a score of 3 or choose R which lands him in state C with a score of 1.
Given he wants to maximize the score, he chooses L.
Algorithm
This suggests an algorithm to choose the optimal move from any given state for a maximizing player.
At any given state, enumerate the possible child states. Now determine the value of these child states by invoking the minimizing player i.e. Ask the minimizing player what he would do in each of these states and return the value he gets. Choose the action that leads to the highest value state. This is implemented in the maximize
function below.
Similarly for for a minimizing player, enumerate the child states and determine the value of each state by invoking the maximizing player. Choose the action that leads to the lowest value state. This is implemented in the minimize
function below.
If at any stage, the child state is a terminal state, the value of the terminal state is simply returned.
Let us use code to solve the simple toy problem above.
import numpy as np
= {'a':['b','c'],'b':['d','e'],'c':['f','g']} #mapping from state to child states
children = {'d':4,'e':3,'f':2,'g':1} # value of terminal states
value = {'d':'L','e':'R','f':'L','g':'R','b':'L','c':'R'} # mapping from state to the action that produces #that state action
We will also keep a cache of the optimal next state for both maximizing and minimizing players, to make the computation a little faster. This hardly matters for this toy problem but can help for larger problems.
= {}
max_optimal_next_states = {} min_optimal_next_states
We define three functions defined below to implement the algorithm.
def produce_children(state):
"Function to produce children of a state"
return children.get(state,None)
def is_terminal(state):
"Function to check if a state is terminal and return value of the state"
if state in children:
return False,0
return True,value[state]
def get_action(next_state,current_state=None):
"Function to return action that moves player from current state to next state"
#current state is redundant for this example as there is only one way to get to a state
return action[next_state]
Now we define the key maximizing and minimizing functions that implements the core logic of the algorithm.
import random
def maximize(state):
if state in max_optimal_next_states:
return max_optimal_next_states[state]
= is_terminal(state)
terminal_status,reward if terminal_status:
return state,reward # No further state so return same state
= None,-np.Inf
max_state, max_score = []
max_states = produce_children(state)
children for child in children:
= minimize(child)
_,score if score > max_score:
= child,score
max_state,max_score = [max_state]
max_states elif score == max_score:
max_states.append(child)
# If multiple actions are optimal, break ties randomly
= random.choice(max_states)
max_state = (max_state,max_score)
max_optimal_next_states[state]
return max_state,max_score
def minimize(state):
if state in min_optimal_next_states:
return min_optimal_next_states[state]
= is_terminal(state)
terminal_status,reward if terminal_status:
return state,reward # No further state so return same state
= None,np.Inf
min_state, min_score = []
min_states = produce_children(state)
children for child in children:
= maximize(child)
_,score if score < min_score:
= child,score
min_state,min_score = [min_state]
min_states elif score == min_score:
min_states.append(child)
= random.choice(min_states)
min_state = (min_state,min_score)
min_optimal_next_states[state]
return min_state,min_score
def optimal_decision(state,player = 'Maximizer'):
if player == 'Maximizer':
= maximize(state)
max_state,_ return get_action(max_state,state)
else:
= minimize(state)
min_state,_ return get_action(min_state,state)
Let us confirm this returns the expected decisions in the image above.
'a',player = 'Maximizer') optimal_decision(
## 'L'
'b',player = 'Minimizer') optimal_decision(
## 'R'
'c',player = 'Minimizer') optimal_decision(
## 'R'
Now what if the roles were reverse with Player X being a minimizer and Player Y being a maximizer. We expect the decisions to flip.
'a',player = 'Minimizer') optimal_decision(
## 'R'
'b',player = 'Maximizer') optimal_decision(
## 'L'
'c',player = 'Maximizer') optimal_decision(
## 'L'
Minimax in Limited Tic Tac Toe
The toy problem above can be easily mapped to a limited tic-tac-toe game as shown in the image below.
Here, of the three possible moves available to player X in the starting state, the move to place ‘X’ in the center square is the only action that leads to a winning outcome. This can be figured out by recursively applying the minimax algorithm as illustrated above.
Note that Player X makes Move 1, Player Y makes Move 2 and Player X makes Move 3. This mini-game can end as early as after Move 1.
Solving TicTacToe with Minimax
Let us denote player ‘X’ who plays first using 1 and player ‘O’ who plays second using 2. An empty square will be represented with 0.
= {0:'__',1:'X',2:'O'} s_to_b
We will denote the state of a game using a tuple of length 9.For example:
= (1,0,0,2,0,0,0,0,0) state
The state can be converted to a board using the below function.
def state_to_board(state):
"Function to convert a a state(tuple) to a board(numpy array)"
= np.array([s_to_b[position] for position in state])
board return board.reshape(3,3)
1,0,0,2,0,0,0,0,0)) state_to_board((
## array([['X', '__', '__'],
## ['O', '__', '__'],
## ['__', '__', '__']], dtype='<U2')
A player wins if he or she gets a sequence of 3 ’X’s or 3 ’O’s.
= (1,1,1)
max_player_wins = (2,2,2) min_player_wins
Now all we have to do is redefine the functions is_terminal
, produce_children
and get_action
for the new tic-tac-toe problem
A terminal state is reached if one of the players wins or if the board is fully occupied in which case the game is tied.
We will set the score of the game as follows:
X wins: + 10
O wins: -10
Draw: 0
def is_terminal(state):
if state[slice(0,3)] == max_player_wins:
return True,10
elif state[slice(0,3)] == min_player_wins:
return True,-10
elif state[slice(3,6)] == max_player_wins:
return True,10
elif state[slice(3,6)] == min_player_wins:
return True,-10
elif state[slice(6,9)] == max_player_wins:
return True,10
elif state[slice(6,9)] == min_player_wins:
return True,-10
elif state[slice(0,7,3)] == max_player_wins:
return True,10
elif state[slice(0,7,3)] == min_player_wins:
return True,-10
elif state[slice(1,8,3)] == max_player_wins:
return True,10
elif state[slice(1,8,3)] == min_player_wins:
return True,-10
elif state[slice(2,9,3)] == max_player_wins:
return True,10
elif state[slice(2,9,3)] == min_player_wins:
return True,-10
elif state[slice(0,9,4)] == max_player_wins:
return True,10
elif state[slice(0,9,4)] == min_player_wins:
return True,-10
elif state[slice(2,7,2)] == max_player_wins:
return True,10
elif state[slice(2,7,2)] == min_player_wins:
return True,-10
elif state.count(0) == 0:
return True,0
else:
return False,0
from copy import deepcopy
def produce_children(state):
= list(state)
l = []
children = [i for i,v in enumerate(state) if v == 0]
vacant_slots if state.count(0) % 2 == 1: #If number of vacant spaces is odd , then it is max_player's turn
for slot in vacant_slots:
= deepcopy(l)
child = 1
child[slot] tuple(child))
children.append(else: #if number of vacant spaces is even then it is min_player's turn.
for slot in vacant_slots:
= deepcopy(l)
child = 2
child[slot] tuple(child))
children.append(
return children
= produce_children(state)
children children
## [(1, 1, 0, 2, 0, 0, 0, 0, 0), (1, 0, 1, 2, 0, 0, 0, 0, 0), (1, 0, 0, 2, 1, 0, 0, 0, 0), (1, 0, 0, 2, 0, 1, 0, 0, 0), (1, 0, 0, 2, 0, 0, 1, 0, 0), (1, 0, 0, 2, 0, 0, 0, 1, 0), (1, 0, 0, 2, 0, 0, 0, 0, 1)]
These correspond to the following states.
for child in children:
print(state_to_board(child),'\n')
## [['X' 'X' '__']
## ['O' '__' '__']
## ['__' '__' '__']]
##
## [['X' '__' 'X']
## ['O' '__' '__']
## ['__' '__' '__']]
##
## [['X' '__' '__']
## ['O' 'X' '__']
## ['__' '__' '__']]
##
## [['X' '__' '__']
## ['O' '__' 'X']
## ['__' '__' '__']]
##
## [['X' '__' '__']
## ['O' '__' '__']
## ['X' '__' '__']]
##
## [['X' '__' '__']
## ['O' '__' '__']
## ['__' 'X' '__']]
##
## [['X' '__' '__']
## ['O' '__' '__']
## ['__' '__' 'X']]
Finally we define a function to return the action that leads from the current state to the next state.
def difference(tuple1,tuple2):
"Helper function to get the index where first difference between two tuples is observed"
assert len(tuple1) == len(tuple2)
for i,value in enumerate(tuple1):
if value != tuple2[i]:
return i
return None
def get_action(next_state,current_state):
"Function to return action that moves player from current state to next state"
return difference(next_state,current_state),next_state
We can now evaluate whether we can solve the mini tic-tac-toe problem above with the tools at hand.The starting state of the game is given by
= (1,2,1,0,0,2,0,2,1)
state state_to_board(state)
## array([['X', 'O', 'X'],
## ['__', '__', 'O'],
## ['__', 'O', 'X']], dtype='<U2')
= optimal_decision(state,player = 'Maximizer')
optimal_action,optimal_next_state optimal_action,state_to_board(optimal_next_state)
## (4, array([['X', 'O', 'X'],
## ['__', 'X', 'O'],
## ['__', 'O', 'X']], dtype='<U2'))
As expected , the algorithm determines the optimal action for player ‘X’ is to occupy the central square in the board.
Let us consider one more path in the game where player X plays a non-optimal move resulting in the following state.
= (1,2,1,1,0,2,0,2,1)
state state_to_board(state)
## array([['X', 'O', 'X'],
## ['X', '__', 'O'],
## ['__', 'O', 'X']], dtype='<U2')
The optimal state of the minimizing player is given by
= optimal_decision(state,player = 'Minimizer')
optimal_action,optimal_next_state optimal_action,state_to_board(optimal_next_state)
## (4, array([['X', 'O', 'X'],
## ['X', 'O', 'O'],
## ['__', 'O', 'X']], dtype='<U2'))
The minimizing player also picks the optimal action as expected.
Putting it all together
Now we will analyze the following games
- Random X vs Random O
- Optimal X vs Random O
- Random X vs Optimal O
- Optimal X vs Optimal O
The following function represents a random player who chooses an available slot at random. The maximizer is player X while the minimizer is player O
import random
def random_decision(state,player = 'Maximizer'):
= [i for i,v in enumerate(state) if v == 0]
vacant_slots = random.choice(vacant_slots)
action = list(state)
state_as_list #Update state
if player == 'Maximizer':
= 1
state_as_list[action] else:
= 2
state_as_list[action]
return action,tuple(state_as_list)
Now we will define a function to play N games and record the results for analysis
from collections import defaultdict
from tqdm import tqdm
def play_games(n_games:int,X_strategy,O_strategy):
'''
n_games: Number of games to be player
X_strategy: function describing decision making strategy for player X
O_strategy: function describing decision making strategy for player Y
'''
= defaultdict(int)
win_stats #Dictionary for holding no of wins for games started with a particular move
= defaultdict(int)
move_wins_X = defaultdict(int)
move_wins_O #Dictionary for holding no of games started with a particular move
= defaultdict(lambda:-1)
move_X = defaultdict(lambda:-1)
move_O
for i in tqdm(range(n_games)):
random.seed(i)= (0,0,0,0,0,0,0,0,0)
state = False
terminal_status
= True # Flag identifying first move of player X
first_move_flag_X = True # Flag identifying first move of player O
first_move_flag_O
while not terminal_status:
#Player X plays;
= X_strategy(state,player='Maximizer')
player_x_action,next_state = is_terminal(next_state)
terminal_status,score
if first_move_flag_X:
= player_x_action
first_move_X += 1
move_X[first_move_X] = False
first_move_flag_X
#If player X plays last move
if terminal_status:
if score == 10: #player X wins
'X_win'] +=1
win_stats[+= 1 #record player's first move
move_wins_X[first_move_X] else:
'Draw'] += 1
win_stats[break
= next_state
state #Player O plays next
= O_strategy(state,player='Minimizer')
player_o_action,next_state = is_terminal(next_state)
terminal_status,score
if first_move_flag_O:
= player_o_action
first_move_O += 1
move_O[first_move_O] = False
first_move_flag_O
#If player O plays last move
if terminal_status:
if score == -10: #player O wins
'O_win'] +=1
win_stats[+= 1 #record player's first move
move_wins_O[first_move_O] else:
'Draw'] += 1
win_stats[break
= next_state
state
return win_stats,move_wins_X,move_wins_O,move_X,move_O
We also create a helper function to visualize the results.
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
def plot_results(results):
= results
win_stats,move_wins_X,move_wins_O,move_X,move_O = pd.DataFrame({'Category':list(win_stats.keys()),'Count':list(win_stats.values())})
win_stats_df = {i:move_wins_X[i]/move_X[i] for i in range(9)}
move_X_win_rate = {i:move_wins_O[i]/move_O[i] for i in range(9)}
move_O_win_rate
= np.array([move_X_win_rate[x] for x in range(9)]).reshape(3,3)
move_X_win_rate_array = np.array([move_O_win_rate[x] for x in range(9)]).reshape(3,3)
move_O_win_rate_array
set(font_scale=2)
sns.= plt.subplots(ncols=3,figsize=(30,10))
fig, axs = sns.barplot(x="Category",y="Count",data=win_stats_df,ax=axs[0])
splot for p in splot.patches:
format(p.get_height(), '.0f'),
splot.annotate(+ p.get_width() / 2., p.get_height()),
(p.get_x() = 'center', va = 'center',
ha = (0, -12),
xytext = 'offset points')
textcoords 'Distribution of Wins,Losses and Ties')
splot.set_title(
=True,ax = axs[1]).set_title('Player X:% of wins for first move')
sns.heatmap(move_X_win_rate_array,annot=True,ax = axs[2]).set_title('Player Y:% of wins for first move')
sns.heatmap(move_O_win_rate_array,annot
plt.show()
Random X vs Random O
= play_games(1000,X_strategy=random_decision,O_strategy=random_decision) results1
##
0%| | 0/1000 [00:00<?, ?it/s]
100%|##########| 1000/1000 [00:00<00:00, 20874.97it/s]
When both players follow a random strategy, player X has a first mover advantage and wins the majority of the games.
plot_results(results1)
We also see that given both players use random strategies: player X can play the first move in any of the squares without it affecting the win rate significantly.
For player O on the other hand, playing the first move in the center or corner square results in a significantly higher win rate.
Optimal X vs Random O
= play_games(1000,X_strategy=optimal_decision,O_strategy=random_decision) results2
##
0%| | 0/1000 [00:00<?, ?it/s]
0%| | 1/1000 [00:00<02:18, 7.21it/s]
100%|##########| 1000/1000 [00:00<00:00, 5633.05it/s]
plot_results(results2)
When player X uses the optimal minimax strategy,it wins almost all the games.
It consistently picks the bottom right hand corner in the first move in every game.
Random X vs Optimal O
= play_games(1000,X_strategy=random_decision,O_strategy=optimal_decision) results3
##
0%| | 0/1000 [00:00<?, ?it/s]
100%|##########| 1000/1000 [00:00<00:00, 22279.67it/s]
plot_results(results3)
In this case, O has a lower win rate given it does not have a fist mover advantage.
The win rate is highest when Player O gets to occupy the central square in the first move.
Optimal X vs Optimal O
= play_games(100,X_strategy=optimal_decision,O_strategy=optimal_decision) results4
##
0%| | 0/100 [00:00<?, ?it/s]
100%|##########| 100/100 [00:00<00:00, 20218.39it/s]
plot_results(results4)
When both players play the optimal strategy, all games end in ties.
You can find the jupyter notebook for the blog post here