Gridworld with Reinforcement Learning

This project implements a simple gridworld environment and an agent that learns to navigate it using Q-learning. We set up the environment, define the agent, and then train the agent.

1. The Gridworld Environment

Grid Size: A 10x10 grid.

Start Position: The agent starts at the top-left corner.

Goals: The grid has multiple goals with different rewards rather than a single goal.

Obstacles: The grid contains a few obstacles the agent must navigate around.
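The environment described above could be sketched roughly as follows. The README does not give goal positions, reward values, obstacle positions, or a per-step cost, so every number below, and the GridWorld class itself, is a hypothetical placeholder rather than the notebook's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class GridWorld:
    """Minimal gridworld layout; all positions and rewards are assumed values."""
    size: int = 10
    start: tuple = (0, 0)  # top-left corner as (row, col)
    # Several goals with different rewards (hypothetical placement).
    goals: dict = field(default_factory=lambda: {(9, 9): 10.0, (0, 9): 3.0})
    # A few blocked cells (hypothetical placement).
    obstacles: set = field(default_factory=lambda: {(3, 3), (3, 4), (6, 7)})
    step_reward: float = -0.1  # assumed small cost for every non-goal step

    def is_free(self, cell):
        """True if the cell lies inside the grid and is not an obstacle."""
        r, c = cell
        inside = 0 <= r < self.size and 0 <= c < self.size
        return inside and cell not in self.obstacles

    def reward(self, cell):
        """Goal reward if the cell is a goal, otherwise the step cost."""
        return self.goals.get(cell, self.step_reward)

    def is_terminal(self, cell):
        """An episode ends when any goal is reached (an assumption)."""
        return cell in self.goals


env = GridWorld()
```

Storing goals as a mapping from cell to reward keeps the "multiple goals with varying rewards" idea in one place, so adding or re-weighting a goal does not touch the rest of the class.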

2. The Agent

Movements: The agent can move up, down, left, or right.

Learning Method: The agent uses Q-learning to learn an optimal policy.
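For reference, a single tabular Q-learning update can be sketched as below. The learning rate alpha, the discount factor gamma, the Q-table shape, and the function name q_update are assumptions chosen for illustration; the README does not state the values or names used in the notebook.

```python
import numpy as np

N_ACTIONS = 4  # up, down, left, right


def q_update(q, state, action, reward, next_state, done, alpha=0.1, gamma=0.95):
    """Apply one tabular Q-learning update in place and return the TD error."""
    r, c = state
    nr, nc = next_state
    # Terminal transitions have no future value to bootstrap from.
    best_next = 0.0 if done else np.max(q[nr, nc])
    td_error = reward + gamma * best_next - q[r, c, action]
    q[r, c, action] += alpha * td_error
    return td_error


# Q-table indexed by (row, col, action) for a 10x10 grid.
q = np.zeros((10, 10, N_ACTIONS))
q_update(q, (0, 0), 3, 1.0, (0, 1), done=False)
```

The update moves the current estimate toward the observed reward plus the discounted value of the best action in the next state, which is what makes Q-learning an off-policy method.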

Stochastic Environment: The agent's chosen action does not always produce the intended movement. For instance, if the agent chooses to move right, there is a small probability that it moves up instead. The probability of such a slip is set by slip_prob.
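One plausible way to model the slip is shown below. The name slip_prob comes from the README, but how the replacement action is chosen is not specified there; this sketch assumes a slip picks uniformly among the other three actions and that moves off the grid leave the agent clamped at the edge. The helper names apply_slip and move are hypothetical.

```python
import random

# (row delta, col delta) for each action; row 0 is the top of the grid.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def apply_slip(action, slip_prob, rng=random):
    """Return the action actually executed after a possible slip."""
    if rng.random() < slip_prob:
        # Assumption: a slip replaces the action with one of the others.
        others = [a for a in ACTIONS if a != action]
        return rng.choice(others)
    return action


def move(state, action, size=10):
    """Apply an action to a (row, col) state, clamping to the grid bounds."""
    dr, dc = ACTIONS[action]
    r = min(max(state[0] + dr, 0), size - 1)
    c = min(max(state[1] + dc, 0), size - 1)
    return (r, c)


executed = apply_slip("right", slip_prob=0.1)
next_state = move((0, 0), executed)
```

Because the slip happens inside the environment step, the agent only ever observes the resulting state and reward, and the Q-values it learns reflect the expected outcome of each action under that noise.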

Exploration vs. Exploitation: As training progresses, the agent explores less and relies more on what it has already learned. How quickly the exploration rate (epsilon) shrinks is controlled by decay_epsilon.
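A common way to implement this trade-off is epsilon-greedy action selection with a decaying epsilon, sketched below. The name decay_epsilon comes from the README; treating it as a per-episode multiplicative factor, the value 0.995, the floor min_epsilon, and the helper names choose_action and decayed are all assumptions made for illustration.

```python
import random


def choose_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # random action index
    # Greedy action: index of the largest Q-value (first one on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])


def decayed(epsilon, decay_epsilon=0.995, min_epsilon=0.01):
    """Shrink epsilon multiplicatively, never going below min_epsilon."""
    return max(min_epsilon, epsilon * decay_epsilon)


epsilon = 1.0  # start fully exploratory
for episode in range(1000):
    # ... run one training episode using choose_action(..., epsilon) ...
    epsilon = decayed(epsilon)
```

Keeping a small floor on epsilon means the agent still tries an occasional random action late in training, which can help in a stochastic grid where early estimates may be misleading.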


omerozerr/Gridworld_with_Reinforcement_learning
