feat: implement the sliding tile puzzle env #189

Merged
54 commits
- 74b4a8f feat: implement the sliding tile puzzle env (ElshadaiK, Jul 18, 2023)
- 2883f60 exp: sliding tiles with a2c first run (ElshadaiK, Jul 18, 2023)
- 9c41355 fix: bug fix (ElshadaiK, Jul 18, 2023)
- 233b0ea Update jumanji/environments/logic/sliding_tile_puzzle/generator.py (ElshadaiK, Jul 24, 2023)
- 02e812c Update jumanji/environments/logic/sliding_tile_puzzle/generator.py (ElshadaiK, Jul 24, 2023)
- 9e12bd7 Update jumanji/environments/logic/sliding_tile_puzzle/env_test.py (ElshadaiK, Jul 24, 2023)
- 165f897 fix: address PR reviews on var naming & unused method (ElshadaiK, Jul 24, 2023)
- 8b58328 Update jumanji/environments/logic/sliding_tile_puzzle/env.py (ElshadaiK, Jul 31, 2023)
- d16187e fix: address issues on doc (ElshadaiK, Jul 31, 2023)
- d0f0fb5 fix: make reward external (ElshadaiK, Jul 31, 2023)
- cd2c047 Update jumanji/environments/logic/sliding_tile_puzzle/env.py (ElshadaiK, Jan 16, 2024)
- f8b65b9 Update jumanji/environments/logic/sliding_tile_puzzle/env.py (ElshadaiK, Jan 16, 2024)
- cb2a1c3 Update jumanji/environments/logic/sliding_tile_puzzle/env.py (ElshadaiK, Jan 16, 2024)
- 532b3a0 Update jumanji/environments/logic/sliding_tile_puzzle/env.py (ElshadaiK, Jan 16, 2024)
- eb39a11 Update jumanji/environments/logic/sliding_tile_puzzle/env.py (ElshadaiK, Jan 16, 2024)
- ad03db4 Update jumanji/environments/logic/sliding_tile_puzzle/env.py (ElshadaiK, Jan 16, 2024)
- 518b2f0 Update jumanji/environments/logic/sliding_tile_puzzle/env.py (ElshadaiK, Jan 16, 2024)
- e03c65e fix: Add constant to constant file (ElshadaiK, Jan 27, 2024)
- 7783022 Update jumanji/environments/logic/sliding_tile_puzzle/env.py (ElshadaiK, Jan 27, 2024)
- fc52de6 fix: bug fix for observation spec (ElshadaiK, Jan 27, 2024)
- 7e0dddc Update jumanji/environments/logic/sliding_tile_puzzle/env.py (ElshadaiK, Jan 27, 2024)
- 8da2d8a fix: run pre-commit to fix linter (ElshadaiK, Jan 27, 2024)
- 96f5f25 fix: env did not have a timelimit (sash-a, Mar 5, 2024)
- dc023ad fix: tests now working (sash-a, Mar 5, 2024)
- 6fc066b fix: training config and conv padding (sash-a, Mar 5, 2024)
- 0bb46ad feat: change dense reward to be positive rather than negative (sash-a, Mar 5, 2024)
- 0ad41b5 chore: many small changes and testing distance based reward (sash-a, Mar 6, 2024)
- 8a6435f fix: update moves order (ElshadaiK, Mar 7, 2024)
- dfce924 Update jumanji/environments/logic/sliding_tile_puzzle/reward.py (ElshadaiK, Mar 7, 2024)
- 0f06c2c fix: use constant EMPTY_TILE in generator (ElshadaiK, Mar 7, 2024)
- 3e27e83 fix: remove unused line (ElshadaiK, Mar 7, 2024)
- 00af887 fix: Add Current Timestep to Observation (ElshadaiK, Mar 7, 2024)
- 8d38985 fix: make `solvableSTPgenerator` solvable (ElshadaiK, Mar 7, 2024)
- e765f63 fix: fix bug in the SolvableSTPGenerator (ElshadaiK, Mar 7, 2024)
- 7e5cbcb fix: develop solvable STP generator (ElshadaiK, Mar 8, 2024)
- 7b4ce1c fix: make _make_random_move fn jittable (ElshadaiK, Mar 8, 2024)
- 7d6da94 fix: update how solved puzzle is computed (ElshadaiK, Mar 8, 2024)
- a8c404d merge (clement-bonnet, Mar 8, 2024)
- a1c220c feat: a2c works on 3x3 (clement-bonnet, Mar 8, 2024)
- dc1ecf8 fix: init import (clement-bonnet, Mar 10, 2024)
- 616ef6f refactor: clem clean PR (clement-bonnet, Mar 10, 2024)
- 0defaee chore: update stp grid size (sash-a, Mar 11, 2024)
- 84a4fe8 feat: deafult behavior wrappers (clement-bonnet, Mar 8, 2024)
- c425c59 fix: linter (clement-bonnet, Mar 8, 2024)
- 37f8d40 fix: linters (clement-bonnet, Mar 12, 2024)
- 75480a2 chore: update train config (sash-a, Mar 12, 2024)
- 8750260 docs: add step count to obs (clement-bonnet, Mar 12, 2024)
- 1bec548 fix: viewer (sash-a, Mar 13, 2024)
- 00d6980 feat: sliding tile puzzle gif (sash-a, Mar 13, 2024)
- e84f6ab docs: mkdocs (clement-bonnet, Mar 13, 2024)
- bd4f0bf Apply suggestions from code review (clement-bonnet, Mar 13, 2024)
- fb58be4 fix: isort (clement-bonnet, Mar 13, 2024)
- b492701 fix: tests (clement-bonnet, Mar 13, 2024)
- 56975ec fix: sokoban generator fixture (clement-bonnet, Mar 14, 2024)
8 changes: 8 additions & 0 deletions docs/api/environments/sliding_tile_puzzle.md
@@ -0,0 +1,8 @@
::: jumanji.environments.logic.sliding_tile_puzzle.env.SlidingTilePuzzle
    selection:
        members:
            - __init__
            - reset
            - step
            - observation_spec
            - action_spec
Binary file added docs/env_anim/sliding_tile_puzzle.gif
Binary file added docs/env_img/sliding_tile_puzzle.png
52 changes: 52 additions & 0 deletions docs/environments/sliding_tile_puzzle.md
@@ -0,0 +1,52 @@
# Sliding Tile Puzzle Environment

<p align="center">
<img src="../env_anim/sliding_tile_puzzle.gif" width="500"/>
</p>

This is a JAX JIT-able implementation of the classic [Sliding Tile Puzzle game](https://en.wikipedia.org/wiki/Sliding_puzzle).

The Sliding Tile Puzzle game is a classic puzzle that challenges a player to slide (typically flat) pieces along certain routes (usually on a board) to establish a certain end-configuration. The pieces to be moved may consist of simple shapes, or they may be imprinted with colors, patterns, sections of a larger picture (like a jigsaw puzzle), numbers, or letters.

The puzzle is often 3×3, 4×4, or 5×5 in size and consists of square tiles on a square board with one tile missing, which leaves a single empty space. A move slides a tile adjacent to the empty space into it, so the empty space moves to the position that tile vacated. The sliding puzzle is mechanical and requires no other equipment or tools.
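This page does not spell out the target end-configuration. The minimal JAX sketch below assumes the conventional goal: tiles `1` to `N*N - 1` in row-major order with the empty tile `0` in the bottom-right cell. That assumption is consistent with the `state` fixture in `conftest.py`, which is four moves away from such a board. `make_solved_puzzle` and `is_solved` are hypothetical helpers, not part of the environment's API.

```python
import jax.numpy as jnp

EMPTY_TILE = 0  # mirrors EMPTY_TILE in constants.py


def make_solved_puzzle(grid_size: int) -> jnp.ndarray:
    """Assumed goal state: 1..N*N-1 in row-major order, empty tile last."""
    tiles = jnp.arange(1, grid_size * grid_size + 1, dtype=jnp.int32)
    # The last value (N*N) is replaced by the empty tile.
    tiles = tiles.at[-1].set(EMPTY_TILE)
    return tiles.reshape(grid_size, grid_size)


def is_solved(puzzle: jnp.ndarray) -> jnp.ndarray:
    """Boolean scalar; jittable because it uses no Python control flow."""
    return jnp.all(puzzle == make_solved_puzzle(puzzle.shape[0]))


print(make_solved_puzzle(3).tolist())  # [[1, 2, 3], [4, 5, 6], [7, 8, 0]]
```

Because `grid_size` comes from the array's static shape, `is_solved` can be wrapped in `jax.jit` without marking any argument static.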

## Observation

The observation in the Sliding Tile Puzzle game includes the puzzle grid, the position of the empty tile, the action mask, and the current step count.

- `puzzle`: jax array (int32) of shape `(grid_size, grid_size)`, representing the current game state. Each element in the array corresponds to a puzzle tile. The tile represented by 0 is the empty tile.

- Here is an example of a random observation of the game board:

```
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10  0 12]
 [13 14 15 11]]
```
- In this array, the empty tile (0) is at row 2, column 2 (zero-indexed).

- `empty_tile_position`: jax array (int32) of shape `(2,)` representing the row and column of the empty tile in the grid. For example, (2, 2) represents the third row and the third column of a zero-indexed grid.

- `action_mask`: jax array (bool) of shape `(4,)`, indicating which actions are valid in the current state of the environment. The actions include moving the empty tile up, right, down, or left. For example, an action mask `[True, False, True, False]` means that the valid actions are to move the empty tile upward or downward.

- `step_count`: jax array (int32) of shape `()`, current number of steps in the episode.
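Both `empty_tile_position` and `action_mask` can be derived from `puzzle` alone. A minimal JAX sketch, assuming the move order up, right, down, left used by `MOVES` in `constants.py`, a square grid, and that an action is valid exactly when the empty tile stays on the board; `find_empty_tile` and `compute_action_mask` are hypothetical names, not the environment's own functions.

```python
import jax.numpy as jnp

EMPTY_TILE = 0
# Row/column offsets for up (0), right (1), down (2), left (3).
MOVES = jnp.array([[-1, 0], [0, 1], [1, 0], [0, -1]])


def find_empty_tile(puzzle: jnp.ndarray) -> jnp.ndarray:
    """Returns the (row, col) of the empty tile as an int32 array of shape (2,)."""
    is_empty = (puzzle.ravel() == EMPTY_TILE).astype(jnp.int32)
    flat_index = jnp.argmax(is_empty)
    return jnp.stack(jnp.unravel_index(flat_index, puzzle.shape)).astype(jnp.int32)


def compute_action_mask(puzzle: jnp.ndarray, empty_pos: jnp.ndarray) -> jnp.ndarray:
    """True for each direction that keeps the empty tile inside the grid."""
    new_positions = empty_pos + MOVES  # shape (4, 2), one row per action
    grid_size = puzzle.shape[0]
    return jnp.all((new_positions >= 0) & (new_positions < grid_size), axis=-1)


puzzle = jnp.array([[0, 1, 3], [4, 2, 5], [7, 8, 6]])
pos = find_empty_tile(puzzle)
print(pos.tolist(), compute_action_mask(puzzle, pos).tolist())
```

With the empty tile in the top-left corner, only right (1) and down (2) are valid.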

## Action

The action space is a `DiscreteArray` of integer values in `[0, 1, 2, 3]`. Specifically, these four actions correspond to moving the empty tile: up (0), right (1), down (2), or left (3).
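Moving the empty tile in a direction amounts to swapping it with its neighbor in that direction. A minimal sketch, assuming the action indices above map onto the `MOVES` offsets in `constants.py` and assuming an invalid action leaves the board unchanged (this page does not say how the environment handles invalid actions); `apply_action` is a hypothetical name.

```python
import jax
import jax.numpy as jnp

# Offsets for up (0), right (1), down (2), left (3).
MOVES = jnp.array([[-1, 0], [0, 1], [1, 0], [0, -1]])


@jax.jit
def apply_action(puzzle, empty_pos, action):
    """Swaps the empty tile with its neighbor in the chosen direction.

    Returns (new_puzzle, new_empty_pos). If the move would leave the grid,
    both are returned unchanged (an assumption of this sketch).
    """
    grid_size = puzzle.shape[0]
    target = empty_pos + MOVES[action]
    is_valid = jnp.all((target >= 0) & (target < grid_size))
    # Clip so that indexing stays in bounds even for invalid actions.
    target = jnp.clip(target, 0, grid_size - 1)
    tile = puzzle[target[0], target[1]]
    swapped = puzzle.at[empty_pos[0], empty_pos[1]].set(tile)
    swapped = swapped.at[target[0], target[1]].set(0)
    new_puzzle = jnp.where(is_valid, swapped, puzzle)
    new_pos = jnp.where(is_valid, target, empty_pos)
    return new_puzzle, new_pos


puzzle = jnp.array([[0, 1, 3], [4, 2, 5], [7, 8, 6]])
pos = jnp.array([0, 0])
for action in [1, 2, 1, 2]:  # right, down, right, down
    puzzle, pos = apply_action(puzzle, pos, action)
print(puzzle.tolist())  # [[1, 2, 3], [4, 5, 6], [7, 8, 0]]
```

Selecting with `jnp.where` instead of a Python `if` keeps the function traceable under `jax.jit`.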

## Reward

The reward function can be one of the following:

- **DenseRewardFn**: This reward function provides a dense reward based on the change in the number of correctly placed tiles between the current state and the next state. The reward is positive for each tile that becomes correctly placed and negative for each tile that becomes misplaced.

- **SparseRewardFn**: This reward function provides a sparse reward: 1 if the puzzle is solved, and 0 otherwise.

In both cases, the goal is to solve the puzzle while maximizing the reward.
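The two reward functions can be sketched in a few lines of JAX. This is a minimal illustration, not the PR's `reward.py`. It assumes the row-major goal with the empty tile in the bottom-right cell, counts only non-empty tiles as "correctly placed", and uses a magnitude of 1 per tile for the dense reward. All function names are hypothetical.

```python
import jax.numpy as jnp

EMPTY_TILE = 0


def solved_puzzle(grid_size: int) -> jnp.ndarray:
    # Assumed goal: 1..N*N-1 in row-major order, empty tile in the last cell.
    tiles = jnp.arange(1, grid_size * grid_size + 1, dtype=jnp.int32)
    return tiles.at[-1].set(EMPTY_TILE).reshape(grid_size, grid_size)


def num_correct_tiles(puzzle: jnp.ndarray) -> jnp.ndarray:
    """Number of non-empty tiles already sitting in their goal cell."""
    goal = solved_puzzle(puzzle.shape[0])
    return jnp.sum((puzzle == goal) & (puzzle != EMPTY_TILE))


def dense_reward(puzzle: jnp.ndarray, next_puzzle: jnp.ndarray) -> jnp.ndarray:
    # +1 per tile that becomes correctly placed, -1 per tile that becomes misplaced.
    delta = num_correct_tiles(next_puzzle) - num_correct_tiles(puzzle)
    return delta.astype(jnp.float32)


def sparse_reward(next_puzzle: jnp.ndarray) -> jnp.ndarray:
    # 1.0 only when the resulting board is the goal, 0.0 otherwise.
    goal = solved_puzzle(next_puzzle.shape[0])
    return jnp.all(next_puzzle == goal).astype(jnp.float32)
```

For example, sliding tile 1 left in the `state` fixture (`[[0, 1, 3], ...]` to `[[1, 0, 3], ...]`) increases the correct count from 4 to 5, so this sketch's dense reward is 1.0, while the sparse reward stays 0.0 until the final move.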

## Registered Versions 📖

- `SlidingTilePuzzle-v0`, the Sliding Tile Puzzle with a grid size of 5x5.
4 changes: 4 additions & 0 deletions jumanji/__init__.py
@@ -134,3 +134,7 @@
 register(id="Sokoban-v0", entry_point="jumanji.environments:Sokoban")
 # Pacman - minimal version of Atari Pacman game
 register(id="PacMan-v0", entry_point="jumanji.environments:PacMan")
+# SlidingTilePuzzle - A sliding tile puzzle environment with the default grid size of 5x5.
+register(
+    id="SlidingTilePuzzle-v0", entry_point="jumanji.environments:SlidingTilePuzzle"
+)
18 changes: 13 additions & 5 deletions jumanji/environments/__init__.py
@@ -14,12 +14,20 @@
 
 import sys
 
-from jumanji.environments.logic import game_2048, minesweeper, rubiks_cube
+from jumanji.environments.logic import (
+    game_2048,
+    graph_coloring,
+    minesweeper,
+    rubiks_cube,
+    sliding_tile_puzzle,
+    sudoku,
+)
 from jumanji.environments.logic.game_2048.env import Game2048
 from jumanji.environments.logic.graph_coloring.env import GraphColoring
-from jumanji.environments.logic.minesweeper import Minesweeper
-from jumanji.environments.logic.rubiks_cube import RubiksCube
-from jumanji.environments.logic.sudoku import Sudoku
+from jumanji.environments.logic.minesweeper.env import Minesweeper
+from jumanji.environments.logic.rubiks_cube.env import RubiksCube
+from jumanji.environments.logic.sliding_tile_puzzle.env import SlidingTilePuzzle
+from jumanji.environments.logic.sudoku.env import Sudoku
 from jumanji.environments.packing import bin_pack, flat_pack, job_shop, knapsack, tetris
 from jumanji.environments.packing.bin_pack.env import BinPack
 from jumanji.environments.packing.flat_pack.env import FlatPack
@@ -44,7 +52,7 @@
 from jumanji.environments.routing.cvrp.env import CVRP
 from jumanji.environments.routing.maze.env import Maze
 from jumanji.environments.routing.mmst.env import MMST
-from jumanji.environments.routing.multi_cvrp import MultiCVRP
+from jumanji.environments.routing.multi_cvrp.env import MultiCVRP
 from jumanji.environments.routing.pac_man.env import PacMan
 from jumanji.environments.routing.robot_warehouse.env import RobotWarehouse
 from jumanji.environments.routing.snake.env import Snake
16 changes: 16 additions & 0 deletions jumanji/environments/logic/sliding_tile_puzzle/__init__.py
@@ -0,0 +1,16 @@
# Copyright 2022 InstaDeep Ltd. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from jumanji.environments.logic.sliding_tile_puzzle.env import SlidingTilePuzzle
from jumanji.environments.logic.sliding_tile_puzzle.types import Observation, State
42 changes: 42 additions & 0 deletions jumanji/environments/logic/sliding_tile_puzzle/conftest.py
@@ -0,0 +1,42 @@
# Copyright 2022 InstaDeep Ltd. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import jax
import jax.numpy as jnp
import pytest

from jumanji.environments.logic.sliding_tile_puzzle import SlidingTilePuzzle
from jumanji.environments.logic.sliding_tile_puzzle.generator import RandomWalkGenerator
from jumanji.environments.logic.sliding_tile_puzzle.types import State


@pytest.fixture
def sliding_tile_puzzle() -> SlidingTilePuzzle:
    """Instantiates a default SlidingTilePuzzle environment."""
    generator = RandomWalkGenerator(grid_size=3)
    return SlidingTilePuzzle(generator=generator)


@pytest.fixture
def state() -> State:
    key = jax.random.PRNGKey(0)
    empty_pos = jnp.array([0, 0])
    puzzle = jnp.array(
        [
            [0, 1, 3],
            [4, 2, 5],
            [7, 8, 6],
        ]
    )
    return State(puzzle=puzzle, empty_tile_position=empty_pos, key=key, step_count=0)
24 changes: 24 additions & 0 deletions jumanji/environments/logic/sliding_tile_puzzle/constants.py
@@ -0,0 +1,24 @@
# Copyright 2022 InstaDeep Ltd. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import jax.numpy as jnp

EMPTY_TILE = 0
INITIAL_STEP_COUNT = 0

UP = [-1, 0]
RIGHT = [0, 1]
DOWN = [1, 0]
LEFT = [0, -1]

MOVES = jnp.array([UP, RIGHT, DOWN, LEFT])
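`MOVES` also makes it straightforward to scramble a solved board with a random walk, which is what the `RandomWalkGenerator` used in `conftest.py` appears to do, judging by its name. The function below is a hypothetical, jittable illustration under that assumption, not the PR's generator. It starts from an assumed row-major solved board, samples a direction at each step, and treats moves that would leave the grid as no-ops. Every step is a legal slide, so the result is always solvable.

```python
import jax
import jax.numpy as jnp

EMPTY_TILE = 0
MOVES = jnp.array([[-1, 0], [0, 1], [1, 0], [0, -1]])  # up, right, down, left


def random_walk_puzzle(key: jax.Array, grid_size: int, num_steps: int):
    """Returns (puzzle, empty_tile_position) after `num_steps` random slides."""
    tiles = jnp.arange(1, grid_size * grid_size + 1, dtype=jnp.int32)
    puzzle = tiles.at[-1].set(EMPTY_TILE).reshape(grid_size, grid_size)
    empty_pos = jnp.array([grid_size - 1, grid_size - 1], dtype=jnp.int32)

    def step(i, carry):
        puzzle, empty_pos = carry
        action = jax.random.randint(jax.random.fold_in(key, i), (), 0, 4)
        target = empty_pos + MOVES[action]
        is_valid = jnp.all((target >= 0) & (target < grid_size))
        # An invalid move keeps the empty tile where it is.
        target = jnp.where(is_valid, target, empty_pos)
        # Swap the empty tile with the tile at `target` (a no-op when target == empty_pos).
        tile = puzzle[target[0], target[1]]
        puzzle = puzzle.at[empty_pos[0], empty_pos[1]].set(tile)
        puzzle = puzzle.at[target[0], target[1]].set(EMPTY_TILE)
        return puzzle, target

    return jax.lax.fori_loop(0, num_steps, step, (puzzle, empty_pos))
```

To compile it, mark the shape-determining arguments as static, for example `jax.jit(random_walk_puzzle, static_argnums=(1, 2))`.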