
PillarsZhang/q-learning-cliff-walking


Q Learning Cliff Walking (Q table and DQN)

This project adds random traps to the classic cliff walking environment. Because the map changes from episode to episode, a Q-table is no longer practical for that variant, so DQN is used alongside the Q-table solution for the standard map. Implementing a Q-table and a DQN is not very difficult; this project also includes a complete analysis of the results and fairly elaborate visualizations. Due to time constraints, the code is not as concise or well-structured as it could be, but I am very satisfied with the results.
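A rough count shows why random traps push the problem beyond a Q-table. The sketch below assumes that the 10 traps, the start and the goal are distinct cells drawn from the 48 cells of the 4x12 grid, and it ignores the connectivity requirement, so the result is only an upper bound on the number of distinct maps:

```python
from math import comb

ROWS, COLS, N_TRAPS = 4, 12, 10
cells = ROWS * COLS  # 48 grid cells

# Fixed map (standard cliff walking): one Q-table row per agent position.
standard_rows = cells

# Random map: choose the trap cells, then a distinct start and goal
# from the remaining cells (connectivity is ignored, so this over-counts).
layouts = comb(cells, N_TRAPS) * (cells - N_TRAPS) * (cells - N_TRAPS - 1)

print(standard_rows, f"{layouts:.2e}")
```

That is roughly 9.2e12 possible maps, each of which would need its own set of Q-table rows, compared with 48 rows for the fixed map. A network that takes the map as part of its input can instead generalize across layouts.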


Results

  • Standard Cliff Walking [4x12] (Solution based on Q-Table)
    • The optimal path is always found after 2,500 episodes of training.
  • Advanced Cliff Walking [4x12] (Solution based on DQN)
    • 95.7% success rate over 1,000 tests, after 170,000 episodes of training.
    • Training time: 1:02:10 (2e5 episodes); benchmark time: 0:54:54 (200 checkpoints).
  • Advanced Cliff Walking [12x12] [trap:32-64] (Solution based on DQN)
    • 82.2% success rate over 1,000 tests, after 365,000 episodes of training.
    • Training time: 4:39:12 (6e5 episodes); benchmark time: 2:06:51 (120 checkpoints).

Environment

Python 3.10 is preferred, since the code uses some newer language features introduced by recent PEPs. See requirements.txt for other dependencies.

Usage

VSCode is recommended for debugging; launch configurations are preset in .vscode/launch.json.

I also prepared a bash script, run.sh, that runs the pipeline and executes only the processes selected by its flags.

# Run all processes
./run.sh --device cuda:0 --train --bench --demo --test

You can also run the processes separately.

# Run train and bench (this takes a long time)
./run.sh --device cuda:0 --train --bench
# Run demo and test, and log the output to a file
./run.sh --device cuda:0 --demo --test | tee saved/run.log

The training and demos of the following three environments are independent of each other, so you can run them at the same time.

Standard Cliff Walking [4x12] (Solution based on Q-Table)

It should behave the same as CliffWalking-v0 from Gym, although some details may differ. For example, after falling off the cliff the agent is not sent back to the starting point, which I think is more reasonable.

# train
python standard_qtable.py
# demo
python demo_standard_qtable.py --run
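For reference, tabular Q-learning on this map can be sketched as follows. This is not the code in standard_qtable.py: the reward scheme (-1 per step, -100 for the cliff), the hyperparameters, and the choice to end the episode when the agent falls (one reading of the note above that it is not sent back to the start) are all assumptions.

```python
import random

ROWS, COLS = 4, 12
START, GOAL = (3, 0), (3, 11)
CLIFF = {(3, c) for c in range(1, 11)}  # bottom row between start and goal
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Move one cell (walls clip the move); return (next_state, reward, done)."""
    r = min(max(state[0] + ACTIONS[action][0], 0), ROWS - 1)
    c = min(max(state[1] + ACTIONS[action][1], 0), COLS - 1)
    nxt = (r, c)
    if nxt in CLIFF:
        return nxt, -100.0, True  # assumption: falling off ends the episode
    return nxt, -1.0, nxt == GOAL


def train(episodes=2500, alpha=0.5, gamma=1.0, epsilon=0.1, seed=0, max_steps=1000):
    """Epsilon-greedy Q-learning with a dict-based Q-table initialized to 0."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * len(ACTIONS) for r in range(ROWS) for c in range(COLS)}
    for _ in range(episodes):
        s = START
        for _ in range(max_steps):
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:  # greedy action, ties broken at random
                best = max(q[s])
                a = rng.choice([i for i, v in enumerate(q[s]) if v == best])
            s2, reward, done = step(s, a)
            # Off-policy target: bootstrap from the best next action.
            target = reward if done else reward + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if done:
                break
    return q


def greedy_path(q, max_len=100):
    """Follow the greedy policy from START until the episode ends."""
    s, path = START, [START]
    for _ in range(max_len):
        a = max(range(len(ACTIONS)), key=lambda i: q[s][i])
        s, _, done = step(s, a)
        path.append(s)
        if done:
            break
    return path


q_table = train()
path = greedy_path(q_table)
print(len(path) - 1, path[-1])
```

Because every Q-value starts at 0 and every reward is negative, the zero initialization is optimistic, which drives exploration early on; the greedy path should end up as the 13-step route along the cliff edge.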

Advanced Cliff Walking [4x12] (Solution based on DQN)

It uses the same 4x12 map as CliffWalking-v0, but randomly generates 10 cliffs (perhaps better called traps) together with start and end points, ensuring that the end is reachable from the start.
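The README does not describe how the DQN observes a randomly generated map. One common choice, shown here as an assumption rather than the project's actual input format, is to stack binary planes for the agent, the goal and the traps, so the map itself becomes part of the network input. `encode_observation` is a hypothetical name:

```python
def encode_observation(agent, goal, traps, rows=4, cols=12):
    """Return three rows x cols planes of 0.0/1.0 values.

    Channel 0 marks the agent, channel 1 the goal, channel 2 the traps.
    """
    planes = [[[0.0] * cols for _ in range(rows)] for _ in range(3)]
    planes[0][agent[0]][agent[1]] = 1.0
    planes[1][goal[0]][goal[1]] = 1.0
    for r, c in traps:
        planes[2][r][c] = 1.0
    return planes


# Example: the classic layout, treated as one possible random map.
obs = encode_observation((3, 0), (3, 11), {(3, c) for c in range(1, 11)})
flat = [v for plane in obs for row in plane for v in row]  # input for an MLP
print(len(flat), sum(flat))
```

The planes can be fed to a small CNN as-is, or flattened (3 x 4 x 12 = 144 values here) for a fully connected network.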

device="cuda:0"
# train
python advanced_dqn.py --device $device
# bench
python bench_advanced_dqn.py --device $device
# demo
python demo_advanced_dqn.py --device $device --run
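DQN replaces the table with a network, but the update still regresses toward a one-step Bellman target. The README does not spell out the loss used in advanced_dqn.py; the sketch below shows the target of standard DQN, computed from a target network's Q-values for the next states, with a hypothetical function name, illustrative reward values, and plain lists standing in for tensors:

```python
def dqn_targets(rewards, next_q_values, dones, gamma=0.9):
    """Compute y = r + gamma * max_a' Q_target(s', a') for a batch.

    next_q_values holds the target network's Q-values for each next state;
    terminal transitions (done=True) are not bootstrapped.
    """
    targets = []
    for reward, q_next, done in zip(rewards, next_q_values, dones):
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(reward + bootstrap)
    return targets


# One ordinary step and one fall into a trap (terminal).
ys = dqn_targets(
    rewards=[-1.0, -100.0],
    next_q_values=[[-5.0, -3.0, -4.0, -6.0], [0.0, 0.0, 0.0, 0.0]],
    dones=[False, True],
)
print(ys)
```

The online network's Q-value for the action actually taken is then pulled toward these targets, for example with an MSE or Huber loss, while the target network is synced only periodically to keep the targets stable.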

Advanced Cliff Walking [12x12] [trap:32-64] (Solution based on DQN)

It uses a 12x12 map and randomly generates 32 to 64 cliffs. A BFS check ensures that a feasible path from start to end always exists.
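The README states only that BFS guarantees a feasible path. A minimal sketch of one way to do this, assuming 4-connected moves and simple rejection sampling (redrawing the whole map until start and goal are connected), follows; `reachable` and `random_map` are hypothetical names, not functions from the repository:

```python
import random
from collections import deque


def reachable(start, goal, traps, rows, cols):
    """Breadth-first search over 4-connected, non-trap cells."""
    seen = {start}
    frontier = deque([start])
    while frontier:
        r, c = frontier.popleft()
        if (r, c) == goal:
            return True
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nxt = (r + dr, c + dc)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and nxt not in traps and nxt not in seen):
                seen.add(nxt)
                frontier.append(nxt)
    return False


def random_map(rows=12, cols=12, min_traps=32, max_traps=64, rng=None):
    """Rejection-sample traps, start and goal until BFS finds a path."""
    rng = rng or random.Random()
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    while True:
        n_traps = rng.randint(min_traps, max_traps)  # inclusive bounds
        chosen = rng.sample(cells, n_traps + 2)  # distinct cells
        start, goal, traps = chosen[0], chosen[1], set(chosen[2:])
        if reachable(start, goal, traps, rows, cols):
            return start, goal, traps


start, goal, traps = random_map(rng=random.Random(0))
print(start, goal, len(traps))
```

The same function covers the 4x12 variant with rows=4, cols=12 and min_traps=max_traps=10. Rejection sampling is simple but can waste draws at high trap densities; repairing a blocked map instead would be an alternative design.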

device="cuda:0"
# train
python advanced_dqn.py --device $device --rand --large
# bench
python bench_advanced_dqn.py --device $device --rand --large
# demo
python demo_advanced_dqn.py --device $device --run --rand --large

Other diagrams and performance data (optional)

python test_env_check.py
python demo_other.py
python test_onnx_export.py
