A repository of Reinforcement Learning (RL) algorithms tested in different simulation environments.
To install the latest stable version:
$ pip install rl-algorithms
For a specific version:
$ pip install rl-algorithms==0.0.1
To install the latest version available on GitHub:
$ pip install git+https://github.com/blurry-mood/RL-algorithms
- Install the desired environment, for example Robotic Warehouse:
$ cd environments
$ sh warehouse_bot.sh
- Run a simulation, for instance one using a Q-Learning agent:
$ cd simulations/qlearning
$ python qlearning.py
- On-Policy Monte Carlo
- Q-Learning
- SARSA
- n-step SARSA
- Deep Q-Learning (DQN)
- Deep Q-Learning with SARSA update rule (DSN)
- REINFORCE
- Actor-Critic
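To make the difference between two of these concrete, here is an illustrative sketch of the standard tabular Q-Learning and SARSA update rules (textbook formulas, not this package's internals):

from collections import defaultdict

# Q[s][a] is the estimated value of taking action a in state s.
def make_q_table(actions):
    return defaultdict(lambda: {a: 0.0 for a in actions})

def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    # Off-policy: bootstrap from the greedy (max-valued) next action.
    target = r + gamma * max(Q[s_next].values())
    Q[s][a] += alpha * (target - Q[s][a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    # On-policy: bootstrap from the action the agent actually takes next.
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])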
Every algorithm is implemented as a subclass of Agent. Each subclass must implement a handful of methods, namely save, load, take_action, update, and decode_state.
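As a sketch of that interface (method names from above; the signatures are inferred from the usage script further down, so treat them as assumptions rather than the package's exact API):

# Sketch of the Agent interface; signatures may differ from the package's exact API.
class AgentSketch:
    def save(self, path):             # persist the agent's parameters to disk
        ...
    def load(self, path):             # restore parameters saved earlier
        ...
    def take_action(self, state):     # map the (decoded) state to an action
        ...
    def update(self, state, reward):  # learn from the observed transition
        ...
    def decode_state(self, state):    # environment-specific; user-implemented
        ...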
Each algorithm re-implements all of these methods except decode_state, which is left for the user to implement based on the target environment. The take_action method provides a description of what decode_state should look like (its inputs and outputs). In other words, for each environment, a class extending the desired algorithm class must reimplement decode_state.
Here's a concrete example of how to use the package:

from rl_algorithms import QLearning

class MyAgent(QLearning):
    def decode_state(self, state):
        # Convert the raw environment state into a hashable key
        # that the tabular agent can index on.
        return tuple(state)

# 10 discrete actions; alpha, gamma, and eps are the usual learning rate,
# discount factor, and exploration rate.
qlearning = MyAgent(actions=list(range(10)), alpha=1e-2, gamma=0.85, eps=0.2)
For more examples, check the files inside the simulations/ folder.
The base agents (algorithms) are implemented so they are ready to use off the shelf; the same methods are called in the same order regardless of the algorithm.
Here's a script that illustrates the idea:
"""
After defining the agent class & instance (for e.g. named qlearning)
"""
qlearning.load('qlearning_minihack') # load agent
for episode in range(10):
state = env.reset()
env.render(state)
n = 0
done = False
qlearning.start_episode() # initialize agent for a new episode
while not done:
n += 1
action = qlearning.take_action(state) # take action based on state
state, reward, done, info = env.step(action)
qlearning.update(state, reward) # learn from reward
env.render(state)
qlearning.end_episode() # update agent's internal logic
qlearning.save('qlearning_minihack') # save agent in Hard drive
The only thing that changes from one algorithm to another is the inputs and outputs of each method.
Also, some algorithms don't require calling every method; for Q-Learning, start_episode and end_episode can safely be skipped.
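For instance, a stripped-down Q-Learning training loop using only the remaining methods might look like this (same assumed env and qlearning instance as above):

for episode in range(10):
    state = env.reset()
    done = False
    while not done:
        action = qlearning.take_action(state)         # epsilon-greedy action selection
        state, reward, done, info = env.step(action)
        qlearning.update(state, reward)               # Q-Learning needs no episode hooks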
RL-Algorithms is a growing package. If you encounter a bug or would like to request a feature, please feel free to open an issue on GitHub.