About repository

A learning repository. The goal is to implement Q-learning in an environment other than gym-ai.

The goal has been achieved with both Q-learning and deep Q-learning.

Here I learned how to use models with more than one input and how to define custom models and connections. I used 2 training techniques:

  • epoch: the model is trained once per epoch
  • step: the model is trained after every step it makes

The step method is more efficient: the model learns more quickly, and it can be trained on all the samples gathered from the last move.
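
The difference between the two schedules can be sketched as follows. This is a minimal illustration, not the repository's training loop: `run_episode`, `train` and the shape of `moves` are hypothetical names chosen for the example, and `train` stands in for whatever fits the model on a list of samples.

```python
def run_episode(moves, train, mode="step"):
    """Feed samples to `train` using the chosen schedule.

    moves: list where each item is the list of samples gathered on one move
    train: hypothetical callback that fits the model on a list of samples
    mode:  "step" trains after every move, "epoch" trains once at the end
    Returns the number of training calls made.
    """
    calls = 0
    collected = []
    for samples in moves:
        if mode == "step":
            # Step method: train right away on the samples from the last move.
            train(samples)
            calls += 1
        else:
            # Epoch method: only store the samples for now.
            collected.extend(samples)
    if mode == "epoch" and collected:
        # Epoch method: a single training call on everything gathered.
        train(collected)
        calls += 1
    return calls
```

With three moves, the step schedule calls `train` three times with small batches, while the epoch schedule calls it once with all samples.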

Adding convolutional layers would likely improve the deep Q-learning results considerably.

Table of Contents

  1. Environment
    • python version
    • objectives
  2. Snake inputs
    • Smell (food position)
    • Vision
  3. Q-learning results
    • Results
    • Videos
  4. Deep Q-learning results
    • Model
    • Results

Environment

Python Version

Python==3.7.6

Todo:

  • Write Game environment

  • Teach AI to play

    • Implement reinforcement learning
    • Train
    • Get highest score possible
    • Try different inputs:
      • current direction + relative food position + field around head(3,3)
      • relative food position + field around head(3,3) and 4 actions
  • Use deep q-learning

    • input whole area ahead

Snake inputs

Smell

  • The snake knows the position of the food relative to its head, as a combination of x and y offsets.

  • In deep Q-learning, the position is a float in the range <-1, 1>, scaled down by the arena width and height.
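
A minimal sketch of that scaling, assuming positions are `(x, y)` cell coordinates inside the arena. The function name and argument layout are hypothetical and may differ from the repository's code.

```python
def relative_food_position(head, food, arena_width, arena_height):
    """Return the food offset from the head, scaled into the range <-1, 1>.

    head, food: (x, y) cell coordinates (assumed layout)
    """
    dx = (food[0] - head[0]) / arena_width   # horizontal offset, scaled by width
    dy = (food[1] - head[1]) / arena_height  # vertical offset, scaled by height
    return dx, dy
```

For a 10 x 10 arena with the head at `(2, 3)` and food at `(7, 1)`, this gives `(0.5, -0.2)`.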

Food position

View area

In deep Q-learning the view area is larger, and the snake can also see food inside it.

Snake vision
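
One way to extract such a view area from the arena is sketched below. The cell encoding (`EMPTY`, `OBSTACLE`, `FOOD`), the `(row, col)` head layout and the choice to report cells outside the arena as obstacles are all assumptions made for this example.

```python
EMPTY, OBSTACLE, FOOD = 0, 1, 2  # hypothetical cell encoding

def view_area(grid, head, radius=1):
    """Return the square of cells centred on the head.

    grid:   list of rows, each a list of cell codes
    head:   (row, col) of the snake's head (assumed layout)
    radius: 1 gives the (3, 3) field around the head
    Cells outside the arena are reported as obstacles (walls).
    """
    rows, cols = len(grid), len(grid[0])
    head_row, head_col = head
    window = []
    for r in range(head_row - radius, head_row + radius + 1):
        line = []
        for c in range(head_col - radius, head_col + radius + 1):
            inside = 0 <= r < rows and 0 <= c < cols
            line.append(grid[r][c] if inside else OBSTACLE)
        window.append(line)
    return window
```

A larger `radius` gives the bigger area used for deep Q-learning, where food cells stay visible in the window.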

Q-learning results

Score

Score is calculated as follows:

  • 50 pts per food consumed
  • -1 per move
  • -100 once per run, as the penalty for dying
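
The scoring rules above can be written as a short function. The function and parameter names are hypothetical; only the point values come from the list.

```python
FOOD_REWARD = 50     # points per food consumed
MOVE_PENALTY = 1     # points lost per move
DEATH_PENALTY = 100  # points lost once per run

def run_score(food_eaten, moves):
    """Score of a single run under the rules listed above."""
    return FOOD_REWARD * food_eaten - MOVE_PENALTY * moves - DEATH_PENALTY
```

For example, a run with 3 food items eaten in 40 moves scores 150 - 40 - 100 = 10.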

Snake 1

Snake 2

Snake 3

Snake 4 uses a higher epsilon early on, which drops to 0 at episode 5000.

Snake 4
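
A schedule of that kind could look like the sketch below. The shape of the decay and the starting value are not stated here, so the linear decay and `start=0.5` are assumptions made purely for illustration; only the zero point at episode 5000 comes from the text.

```python
def epsilon_at(episode, start=0.5, zero_at=5000):
    """Exploration rate that decays linearly to 0 at episode `zero_at`.

    start:   assumed initial epsilon (illustrative value)
    zero_at: episode from which the agent acts fully greedily
    """
    if episode >= zero_at:
        return 0.0
    return start * (1 - episode / zero_at)
```

With these values, exploration is at 0.5 in episode 0, 0.25 in episode 2500, and 0 from episode 5000 onwards.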

Videos:

Snake 1 - default ml parameters

Snake 1

Snake 2 - faster learning parameters

Snake 2

Deep Q-learning results

Model with 2 inputs

The first input is the view area: the obstacles and food that the snake can see.

The second input is the food position relative to the head, in the range <-1, 1>.
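
The data flow of a two-input model can be sketched without any framework, as below. This is not the repository's network: the layer sizes, the random weights, the 4 output actions and every function name are assumptions, and a real implementation would normally use a deep learning library. The sketch only shows each input passing through its own branch before the branches are merged.

```python
import random

def make_layer(n_in, n_out, rng):
    """Random weights and zero biases for one dense layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def dense(inputs, layer, relu=True):
    """Apply one dense layer, optionally followed by ReLU."""
    weights, biases = layer
    outputs = []
    for row, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(max(0.0, total) if relu else total)
    return outputs

def make_model(view_cells, n_actions=4, hidden=8, seed=0):
    """Build the three layers: one per input branch plus the merged head."""
    rng = random.Random(seed)
    return {
        "view": make_layer(view_cells, hidden, rng),     # branch for input 1
        "food": make_layer(2, hidden, rng),              # branch for input 2
        "head": make_layer(2 * hidden, n_actions, rng),  # merged output layer
    }

def q_values(model, view, food):
    """Return one value per action for a view area and a food position."""
    flat_view = [float(cell) for row in view for cell in row]
    view_features = dense(flat_view, model["view"])
    food_features = dense(list(food), model["food"])
    merged = view_features + food_features  # concatenate both branches
    return dense(merged, model["head"], relu=False)
```

Calling `q_values(make_model(9), view, (0.5, -0.2))` with a 3 x 3 `view` returns a list of 4 floats, one per action.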

Model

Results from training up to 9000 episodes. Each episode has 10 agents playing.

Effectiveness is calculated as food-eaten / moves-done. Both the amount of food eaten and the effectiveness rise as training progresses.
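
As a small sketch, the metric can be computed as below. The function name is hypothetical, and returning 0.0 when no moves were made is an assumption added to avoid division by zero.

```python
def effectiveness(food_eaten, moves_done):
    """Food eaten per move; 0.0 when no moves were made (assumed convention)."""
    if moves_done == 0:
        return 0.0
    return food_eaten / moves_done
```

For example, 5 food items eaten in 200 moves gives an effectiveness of 0.025.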

Snake

Video / gif

< Will be posted here >