A collection of reinforcement learning and intelligent agents projects showcasing implementations of key algorithms and their comparative analysis. These projects were developed as part of the CS747: Foundations of Intelligent and Learning Agents course at IIT Bombay.
Implements and compares classical multi-armed bandit algorithms: ε-greedy, UCB, KL-UCB, and Thompson Sampling, along with a custom variant, Thompson Sampling with a hint. The algorithms are compared on cumulative regret across different horizons and random seeds.
Key Findings:
- Thompson Sampling generally outperforms others in regret minimization.
- KL-UCB improves over UCB by using a tighter confidence bound computed via binary search (see the sketch below).
- ε-Greedy performs best at ε ≈ 0.02, striking a balance between exploration and exploitation.
- The "hinted" Thompson Sampling variant leverages knowledge of the true means, improving early performance through a custom Beta-distribution-based selector.
Includes regret plots over multiple seeds and horizons, as well as parameter studies.
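For concreteness, here is a minimal sketch of the binary-search index computation that KL-UCB relies on, assuming Bernoulli rewards. The function names (`kl_bernoulli`, `kl_ucb_index`) are illustrative and are not taken from this repository's code.

```python
import math

def kl_bernoulli(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(emp_mean, pulls, t, tol=1e-6):
    """Largest q in [emp_mean, 1] with pulls * KL(emp_mean, q) <= log(t),
    found by binary search (the 'tighter confidence bound' of KL-UCB)."""
    if pulls == 0:
        return 1.0  # force every arm to be pulled at least once
    target = math.log(max(t, 2)) / pulls
    lo, hi = emp_mean, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if kl_bernoulli(emp_mean, mid) <= target:
            lo = mid
        else:
            hi = mid
    return lo
```

At each step the arm with the highest index is pulled. Because the KL-based bound adapts to the empirical mean, it is tighter than UCB's fixed exploration bonus, which matches the regret ordering observed above.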
Solves mazes by modeling them as Markov Decision Processes using:
- Value Iteration
- Linear Programming (via PuLP)
- Howard’s Policy Iteration
Pipeline:
- `encoder.py` transforms grid mazes into MDPs.
- `solver.py` computes the optimal policy (a minimal value-iteration sketch follows this list).
- `decoder.py` reconstructs the shortest path from the policy.
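As a rough picture of what the solver step computes, the sketch below runs value iteration on an MDP stored as a dictionary mapping `(state, action)` to lists of `(next_state, probability, reward)` triples. This representation is assumed for illustration and is not the repository's actual encoding format.

```python
def value_iteration(num_states, num_actions, transitions, gamma, tol=1e-9):
    """transitions[(s, a)] -> list of (next_state, prob, reward); assumed format."""
    V = [0.0] * num_states
    while True:
        delta = 0.0
        for s in range(num_states):
            # Bellman optimality backup for state s
            best = max(
                sum(p * (r + gamma * V[s2]) for s2, p, r in transitions.get((s, a), []))
                for a in range(num_actions)
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy extraction from the converged value function
    policy = [
        max(range(num_actions),
            key=lambda a: sum(p * (r + gamma * V[s2])
                              for s2, p, r in transitions.get((s, a), [])))
        for s in range(num_states)
    ]
    return V, policy
```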
Insights:
- LP (via PuLP) is consistently the fastest for large mazes; a minimal formulation sketch follows below.
- Howard's Policy Iteration performs well on small problems but becomes costly as maze complexity grows.
- Visual comparisons confirm that solved mazes follow intuitive paths with minimal steps.
Benchmarks for runtime across methods and visualizations for grid navigation are included.
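The LP approach referenced above is the standard linear-programming formulation of MDP planning: minimize the sum of V(s) over states, subject to V(s) ≥ Σ_{s'} T(s, a, s')·[R(s, a, s') + γ·V(s')] for every state-action pair. A minimal PuLP sketch of that formulation, using the same assumed dictionary representation as the value-iteration example (not the repository's exact solver code), is shown below.

```python
import pulp

def solve_mdp_lp(num_states, num_actions, transitions, gamma):
    """transitions[(s, a)] -> list of (next_state, prob, reward); assumed format."""
    prob = pulp.LpProblem("mdp_planning", pulp.LpMinimize)
    V = [pulp.LpVariable(f"V_{s}") for s in range(num_states)]
    # Objective: minimize the sum of state values (any positive weights work)
    prob += pulp.lpSum(V)
    # One constraint per (state, action): V(s) >= expected one-step return
    for s in range(num_states):
        for a in range(num_actions):
            outcomes = transitions.get((s, a), [])
            if outcomes:
                prob += V[s] >= pulp.lpSum(
                    p * (r + gamma * V[s2]) for s2, p, r in outcomes
                )
    prob.solve()
    return [pulp.value(v) for v in V]
```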
Adapts the Sutton & Barto Windy Gridworld challenge with multiple RL approaches:
- Sarsa (normal and King’s moves)
- Sarsa with stochastic wind
- Q-Learning
- Expected Sarsa
Key Results:
- Sarsa with King’s Moves converges fastest due to shorter episodes.
- Q-Learning and Expected Sarsa outperform standard Sarsa in stability and convergence (the three update rules are sketched below).
- The stochastic wind variant adds realistic randomness but slows convergence.
- Paths from all agents are visualized for both the deterministic and stochastic wind settings.
The gridworld is defined as an episodic MDP with reward shaping and per-step convergence plots.
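To make the comparison of the three agents concrete, the sketch below shows the tabular update rules side by side. `Q` is assumed to be a `defaultdict(float)` keyed by `(state, action)` pairs, and the helper names are hypothetical rather than taken from the repository.

```python
import random

def epsilon_greedy(Q, state, actions, epsilon):
    # Explore with probability epsilon, otherwise act greedily
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def sarsa_update(Q, s, a, r, s2, a2, alpha, gamma):
    # On-policy: bootstrap from the action actually taken next
    Q[(s, a)] += alpha * (r + gamma * Q[(s2, a2)] - Q[(s, a)])

def q_learning_update(Q, s, a, r, s2, actions, alpha, gamma):
    # Off-policy: bootstrap from the greedy next action
    best_next = max(Q[(s2, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def expected_sarsa_update(Q, s, a, r, s2, actions, alpha, gamma, epsilon):
    # Bootstrap from the expectation under the epsilon-greedy policy
    greedy = max(actions, key=lambda a2: Q[(s2, a2)])
    expected = sum(
        (1 - epsilon + epsilon / len(actions) if a2 == greedy else epsilon / len(actions))
        * Q[(s2, a2)]
        for a2 in actions
    )
    Q[(s, a)] += alpha * (r + gamma * expected - Q[(s, a)])
```

Expected Sarsa replaces Sarsa's sampled next action with its expectation under the ε-greedy behavior policy, removing that source of variance, which is consistent with the stability advantage noted above.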
Use the `run.py` script to run all experiments. It acts as a unified launcher for the bandit, MDP solving, verification, visualization, and Windy Gridworld tasks. Enable `--verbose` to view subprocess outputs and logs in real time.
python run.py --verbose bandits \
--instance data/bandits/instances/i-1.txt \
--algorithm thompson-sampling \
--rseed 42 \
--epsilon 0.1 \
--horizon 1000
python run.py --verbose solve_mdp \
--grid data/mdp/grids/grid10.txt \
--algorithm pi
To create a synthetic MDP file:
python run.py --verbose generate_mdp \
--num_states 10 \
--num_actions 5 \
--gamma 0.95 \
--mdptype episodic \
--rseed 42 \
--output_file src/mdp/tmp/generated_mdp.txt
To verify all default grids (10 through 100):
python run.py --verbose verify_mdp --algorithm vi
To verify specific grids:
python run.py --verbose verify_mdp \
--algorithm lp \
--grid data/mdp/grids/grid40.txt data/mdp/grids/grid50.txt
To visualize a grid:
python run.py --verbose visualize_mdp \
--grid_file data/mdp/grids/grid10.txt \
--output_file plots/mdp/grid10_unsolved.png
To visualize a solved grid:
python run.py --verbose visualize_mdp \
--grid_file data/mdp/grids/grid10.txt \
--path_file data/mdp/paths/path10.txt \
--output_file plots/mdp/grid10_solved.png
python run.py --verbose windy \
--episodes 200 \
--epsilon 0.15 \
--discount 0.99 \
--learning-rate 0.5
Command | Description
---|---
`bandits` | Run multi-armed bandit experiments
`windy` | Run Windy Gridworld RL agents
`generate_mdp` | Generate synthetic MDP instance files
`solve_mdp` | Solve a maze-based MDP using vi/pi/lp
`verify_mdp` | Verify path optimality for maze solvers
`visualize_mdp` | Create visual output of MDP grids/paths
./references/mdp_references.txt
./references/bandits_references.txt
./references/windy_gridworld_references.txt
This project is licensed under the MIT License. See the LICENSE file for details.