The 3D version of Tic Tac Toe is implemented as an OpenAI Gym environment. The `learning` folder includes several Jupyter notebooks with the deep neural network models used to implement a computer player.
The traditional (2D) Tic Tac Toe has a very small game space (3^9 states). In comparison, the 3D version in this repo has a much larger space, on the order of 3^27, or about 7.6 trillion states. This makes computer players that rely on searching and pruning the game space prohibitively expensive.
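As a quick check on those numbers, the bound is simply 3 raised to the number of cells, since each cell is either empty, x, or o:

```python
# Each cell holds one of 3 values: empty, x, or o.
print(3 ** (3 * 3))      # 19683 states for the classic 3x3 board
print(3 ** (3 * 3 * 3))  # 7625597484987, about 7.6 trillion, for the 3x3x3 cube
```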
Instead, the current learning models are based on policy gradients and deep Q-learning. The DQN model has produced very promising results. Feel free to experiment on your own and contribute if interested. The PG-based model needs more work :)
The repo is also open to pull requests and collaboration, in both game development and learning.
- Base dependency: `gym`.
- Plot-rendering dependencies: `numpy`, `matplotlib`.
- DQN learning dependencies: `tensorflow`, `numpy`.
To install, run:

```sh
# In your virtual environment
pip install gym-tictactoe
```
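The plot rendering mode and the learning notebooks need the optional dependencies listed above; if they are not already in your environment, install them separately (assuming they are not pulled in automatically by the package):

```sh
pip install numpy matplotlib   # plot rendering
pip install tensorflow         # DQN notebooks
```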
Currently, two environments with different rendering modes are supported.

To use textual rendering, create the environment as `tictactoe-v0`, like so:
```python
import gym
import gym_tictactoe

def play_game(actions, step_fn=input):
    env = gym.make('tictactoe-v0')
    env.reset()

    # Play actions in action profile
    for action in actions:
        print(env.step(action))
        env.render()
        if step_fn:
            step_fn()
    return env

actions = ['1021', '2111', '1221', '2222', '1121']
_ = play_game(actions, None)
```
The output produced is:
```
Step 1:
- - - - - - - - -
- - x - - - - - -
- - - - - - - - -
Step 2:
- - - - - - - - -
- - x - o - - - -
- - - - - - - - -
Step 3:
- - - - - - - - -
- - x - o - - - x
- - - - - - - - -
Step 4:
- - - - - - - - -
- - x - o - - - x
- - - - - - - - o
Step 5:
- - - - - - - - -
- - X - o X - - X
- - - - - - - - o
```
The winning sequence after gameplay: (0,2,1), (1,2,1), (2,2,1).
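From the example above, each action appears to be a 4-digit string: the first digit is the player (1 or 2) and the remaining digits are the cell coordinates. Below is a minimal sketch of decoding it, with `decode_action` being an illustrative helper rather than part of the package:

```python
def decode_action(action):
    """Split a 4-digit action string into (player, (x, y, z))."""
    player = int(action[0])
    coords = tuple(int(c) for c in action[1:])
    return player, coords

print(decode_action('1021'))  # (1, (0, 2, 1)): player 1 marks cell (0, 2, 1)
```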
To use plot rendering (matplotlib), create the environment as `tictactoe-plt-v0`, like so:
```python
import gym
import gym_tictactoe

def play_game(actions, step_fn=input):
    env = gym.make('tictactoe-plt-v0')
    env.reset()

    # Play actions in action profile
    for action in actions:
        print(env.step(action))
        env.render()
        if step_fn:
            step_fn()
    return env

actions = ['1021', '2111', '1221', '2222', '1121']
_ = play_game(actions, None)
```
This produces the following gameplay:
(Step 1 through Step 5 are rendered as matplotlib figures.)

The current models are under the `learning` folder. See the Jupyter notebook for DQN learning with a 2-layer neural network and an actor-critic technique.
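For a rough sense of what such a model looks like, here is a minimal 2-layer Q-network sketch over the flattened 3x3x3 board; the layer sizes and Keras usage are assumptions for illustration, not the notebook's exact architecture:

```python
import tensorflow as tf

# Hypothetical sketch: a small 2-layer Q-network for the 3x3x3 board.
# Input: 27 cell values; output: one Q-value per candidate cell.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(27,)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(27),
])
model.compile(optimizer='adam', loss='mse')
```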
Sample game plays produced by the trained model (the winning sequence is (0,0,0), (1,0,0), (2,0,0)):