Skip to content

imagry/Gym-Torcs-DQN

Folders and files

NameName
Last commit message
Last commit date

Latest commit

author
root
Oct 1, 2017
d5b1e1b · Oct 1, 2017

History

9 Commits
Jul 24, 2017
Jul 26, 2017
Jul 24, 2017
Jul 24, 2017
Jul 24, 2017
Jul 24, 2017
Jul 26, 2017
Jul 24, 2017
Jul 24, 2017
Oct 1, 2017
Jul 24, 2017
Jul 24, 2017
Jul 25, 2017
Jul 24, 2017
Jul 24, 2017
Jul 24, 2017

Repository files navigation

Video

Installation:

  1. Install Caffe with opencv2 (opencv is also neeeded for this version of gym-torcs) https://github.com/BVLC/caffe

  2. install Torcs dependencies 1.0 remove old gym-torc if necessary:

    • make distclean

    1.1 xautomation (http://linux.die.net/man/7/xautomation)

    • apt-get install xautomation

    1.2 OpenAI-Gym (https://github.com/openai/gym)

    • apt-get install -y python-numpy python-dev cmake zlib1g-dev libjpeg-dev xvfb libav-tools xorg-dev python-opengl libboost-all-dev libsdl2-dev swig
    • pip install gym
  3. Install gym-torcs 2.1 install vtorcs-RL-color dependencies

    • sudo apt-get install libglib2.0-dev libgl1-mesa-dev libglu1-mesa-dev freeglut3-dev libplib-dev libopenal-dev libalut-dev libxi-dev libxmu-dev libxrender-dev libxrandr-dev libpng12-dev

    2.2 install gym-torcs

    • cd /gym-torcs/ command:
    • SRC_DIR=proto DST_DIR=. CPP_DST_DIR=vtorcs-RL-color/src/drivers/scr_server/; protoc -I=$SRC_DIR --cpp_out=$CPP_DST_DIR --python_out=$DST_DIR $SRC_DIR/car_state.proto

    2.2.1 install vtorcs-RL-color

    • cd /gym-torcs/vtorcs-RL-color
    • ./configure
    • make -j8
    • if "no ossspec.h" or "no tgfclien.h" errors just restart make
    • make install
    • make datainstall

    2.3 Initialize race

    • sudo torcs Configure driver: Race --> Quick Race --> Configure Race-->(Select track)Accept -->On the left column Deselect "Damned" driver

      Quit all menues

      To temporary increase resolution if font is too small before configuring driver Option-->Display-->choose resolution and affter configuring driver Option-->Display-->choose Screen resolution 256x192

  4. export Caffe and CUDA paths

  • export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-8.0/lib64:/usr/local/lib
  • export PYTHONPATH=/home/sergey/repo/caffe/python:$PYTHONPATH
  1. Download models from this link and put them into working directory specified in current_dir in dqn.py
  • r18nb.caffemodel is original resnet-18 model trained on the imagenet. Batch normalization layers merged into convolutions
  • q_solver.caffemodel is dqn current training model
  • qq_target.caffemodel is dqn taret model, running avergae of training model

Model archiecture:

Protofiles for models are in the directory resnet_torcs

  • critic_batch_dqn.prototxt for qq_target.caffemodel
  • dqn_critic_solver.prototxt for q_solver.caffemodel
  • critic_deploy_dqn.prototxt is protofile for driving agent
  • resnet18_nbn.prototxt is not used in project, it's prtofile for resnet-18 with merged BN

Training:

For general description check this blog post How to start:

  • in dqn.py you should modify:

    • you current directory, this is the directory where your initial models are in the line 36: current_dir = '/home/sergey/repo/gym-torcs/'
    • chose replay or training in the line 35, False for training play = False
    • chose to start form imagenet renet-18 model (r18nb.caffemodel), line 509 first_run = True
    • or from saved dqn network first_run = False
    • delete and reinitialzie replay buffer, line 510: reset_buffer = True
    • save model and replay buffer in nnn iterations, line 533 save_in_iters = nnn
    • max number of steps per track, line 532 (each step take at least 0.4 second) max_steps = 5000
  • in ../resnet_torcs/dqn_critic_solver.prototxt replace net: parameter with address of critic_dqn.prototxt, for example net: /home/user/public_oss/Gym-Torcs-DQN/resnet_torcs/critic_dqn.prototxt

  • run python dqn.py

  • Important parameters

    • size of replay buffer, line 21 BUFFER_SIZE = 70000
    • gamma and tau, line 514,515 (explained in blog post) GAMMA =0.99 TAU = .0001
    • steering action amplitude, line 26 (require retraining): DELTA_A = .85
    • exploration noise, line 518-520 initial value of noise (for starting training recommended 0.5, for continuation of training recommended 0.05) act_noise_init = .5 minimal value of noise act_noise_final = .01 inverse scale factor act_noise_interval = 100000
    • action smoother and limiter, line 528-529 (explained in blog post) action_smoother = .33 action_limiter = .33
    • avergae speed and speed variance, line 560-563 c_speed = 35 delta_speed = 5
  • Useful parameters

    • switch speed randomly after swith_count_min(1 + uniform(0,1)) steps swith_count_min = 50
    • n-step dqn (see blog post) with n = max_n_batches*BATCH_SIZE. max_n_batches = 16
    • n_step ratio - use n-step with n_step_ratio probaility and normal dqn for the rest. For normal dqn use n_step_ratio = 0 n_step_ratio = 0.75
    • priority sampling with k-try (see blog post) k_priority_try = 2
    • stop episod after max_errors max_errors = 50
  • Risky parameters

    • Q function spatial TV-L2 regularization, may make action smoother. Should be very small. lambda_spatial_q = 0
    • parameter for TD-Lambda line 437 (see blog post), lesser alpha is only for very small exploration noise alpha = 1

**** Have fun! ****

Acknowledgement: Caffe is developed by Berkeley AI Research (BAIR)/The Berkeley Vision and Learning Center (BVLC) and community contributors Naoto Yoshida is the author of the gym torcs. Implementation of replay buffer is based on Ben Lau work.