-
Notifications
You must be signed in to change notification settings - Fork 88
Add Env for playing Pokémon using poke-env #131
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Added observations on popular choices and practices in Pokemon-RL projects.
|
✅ Deployment succeeded for
Nice work! Wait for a code review and we're ready to go. You can iterate locally or validate fixes by running |
Add observations section to Readme
|
✅ Deployment succeeded for
Nice work! Wait for a code review and we're ready to go. You can iterate locally or validate fixes by running |
- Introduced a comprehensive tutorial for using the OpenEnv Pokemon Battle Environment. - Included sections on setup, battle mechanics, observations, available moves, and actions. - Demonstrated a complete battle simulation and a simple strategy agent. - Ensured proper connection to the environment and handling of battle states.
- Created a Jupyter notebook tutorial for the Pokemon Battle Environment, covering setup, battle mechanics, and agent strategies. - Added a shell script to test the Pokemon environment Docker image, ensuring services are running and performing basic environment interactions.
Updated Pokemon_Env & added a test notebook
|
✅ Deployment succeeded for
Nice work! Wait for a code review and we're ready to go. You can iterate locally or validate fixes by running |
Added system requirements, hardware recommendations, setup steps, and common issues with fixes for the Basic Pokemon RL Training environment.
…sicSetupCheckList.md
Updated the checklist for setting up the Pokémon RL training environment, including system requirements, hardware recommendations, setup instructions, and known issues.
Removed unnecessary comments and cleaned up formatting.
Added instructions for forking the Pokemon-Showdown repository and noted compatibility issues with random battle generation.
Organised Dockerfile
fix: ensure Gen 8 Random Battles run correctly instead of Gen 9
|
✅ Deployment succeeded for
Nice work! Wait for a code review and we're ready to go. You can iterate locally or validate fixes by running |
1 similar comment
|
✅ Deployment succeeded for
Nice work! Wait for a code review and we're ready to go. You can iterate locally or validate fixes by running |
Created a Basic Setup and Installation.md file
|
✅ Deployment succeeded for
Nice work! Wait for a code review and we're ready to go. You can iterate locally or validate fixes by running |
dockerfiles separated
|
✅ Deployment succeeded for
Nice work! Wait for a code review and we're ready to go. You can iterate locally or validate fixes by running |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds a comprehensive Pokemon battle environment integration to OpenEnv using poke-env and Pokemon Showdown. The implementation provides HTTP-based access to Pokemon battles for reinforcement learning and AI training.
- Complete Pokemon environment integration with poke-env library
- Docker containerization with Pokemon Showdown server and OpenEnv API server
- Client library for HTTP-based environment interaction
- Comprehensive test suite and documentation
Reviewed Changes
Copilot reviewed 24 out of 25 changed files in this pull request and generated 24 comments.
Show a summary per file
| File | Description |
|---|---|
| src/envs/pokemon_env/server/pokemon_environment.py | Core environment implementation with async/event loop synchronization |
| src/envs/pokemon_env/client.py | HTTP client for Pokemon environment |
| src/envs/pokemon_env/models.py | Data models for actions, observations, and state |
| src/envs/pokemon_env/server/app.py | FastAPI server application |
| src/envs/pokemon_env/server/Dockerfile | Multi-service Docker image configuration |
| src/envs/pokemon_env/test_enhancements.py | Comprehensive test suite |
| examples/project-pikachu/test_*.py | Integration and HTTP client tests |
| examples/project-pikachu/TESTING.md | Testing guide and troubleshooting |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if obs.available_switches and random.random() < 0.2 and active_hp < 0.3: | ||
| switch_idx = random.choice(range(len(obs.available_switches))) | ||
| action = PokemonAction(action_type="switch", action_index=switch_idx) | ||
| action_desc = f"Switch to {obs.team[switch_idx + 1].species.title()}" |
Copilot
AI
Nov 7, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Array index mismatch: switch_idx is selected from available_switches (line 498), but used to index obs.team[switch_idx + 1] (line 500). The available_switches list contains indices into the team, not raw team indices. This should be obs.team[obs.available_switches[switch_idx]] to correctly map from the chosen switch index to the actual team member.
| action_desc = f"Switch to {obs.team[switch_idx + 1].species.title()}" | |
| action_desc = f"Switch to {obs.team[obs.available_switches[switch_idx]].species.title()}" |
| dynamax: Whether to dynamax this turn (if applicable) | ||
| terastallize: Whether to terastallize this turn (if applicable) | ||
| """ | ||
| action_type: Literal["move", "switch"] = "move" |
Copilot
AI
Nov 7, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The action_type field doesn't include 'forfeit' or 'default' as valid options, but the pokemon_environment.py implementation handles these action types (lines 142, 193). Consider adding these to the Literal type for consistency: Literal[\"move\", \"switch\", \"forfeit\", \"default\"].
| action_type: Literal["move", "switch"] = "move" | |
| action_type: Literal["move", "switch", "forfeit", "default"] = "move" |
| except ImportError as e: | ||
| raise ImportError( | ||
| "poke-env is not installed. " | ||
| "Please install it with: pip install poke-env" |
Copilot
AI
Nov 7, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected command formatting. The installation command should use backticks for code formatting consistency: 'Please install it with: pip install poke-env'
| "Please install it with: pip install poke-env" | |
| "Please install it with: `pip install poke-env`" |
| # Otherwise, use a move | ||
| elif obs.available_moves: | ||
| move_idx = random.choice(range(len(obs.available_moves))) | ||
| move_name = obs.available_moves[move_idx] |
Copilot
AI
Nov 7, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
obs.available_moves is a list of integers (move indices, not move names) based on the PokemonObservation model definition. Using it directly as move_name will result in displaying an integer instead of the actual move name. To get the move name, use: obs.active_pokemon.moves[obs.available_moves[move_idx]]['id'] if available.
| move_name = obs.available_moves[move_idx] | |
| move_index = obs.available_moves[move_idx] | |
| move_name = obs.active_pokemon.moves[move_index]['id'] if obs.active_pokemon and obs.active_pokemon.moves and move_index < len(obs.active_pokemon.moves) else str(move_index) |
| mega_evolve: bool = False | ||
| dynamax: bool = False | ||
| terastallize: bool = False |
Copilot
AI
Nov 7, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing documentation for the z_move field. While mega_evolve, dynamax, and terastallize are documented in the docstring, z_move is referenced in pokemon_environment.py but not included in the PokemonAction dataclass or its docstring. Either add the field or remove references to it in the implementation.
| """ | ||
|
|
||
| import sys | ||
| import os |
Copilot
AI
Nov 7, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import of 'os' is not used.
| import os |
| import sys | ||
| import os | ||
| import argparse | ||
| import time |
Copilot
AI
Nov 7, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import of 'time' is not used.
| import time |
| """ | ||
|
|
||
| import sys | ||
| import os |
Copilot
AI
Nov 7, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import of 'os' is not used.
| import os |
|
|
||
| import sys | ||
| import os | ||
| import time |
Copilot
AI
Nov 7, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import of 'time' is not used.
| import time |
| sys.path.insert(0, str(project_root / "src")) | ||
|
|
||
| # Import models and environment | ||
| from envs.pokemon_env.models import PokemonAction, PokemonObservation, PokemonState |
Copilot
AI
Nov 7, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import of 'PokemonObservation' is not used.
Import of 'PokemonState' is not used.
| from envs.pokemon_env.models import PokemonAction, PokemonObservation, PokemonState | |
| from envs.pokemon_env.models import PokemonAction |
Darktex
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaving my manual review here and claude review coming next...
| action_index: int = 0 | ||
| move_id: Optional[str] = None | ||
| switch_pokemon: Optional[str] = None | ||
| mega_evolve: bool = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't these be inside Action type then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or like, these should be "can_mega_evolve", no?
| ability: Optional[str] | ||
| item: Optional[str] | ||
|
|
||
| attack: int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't get to see the opponent pokemon's stats ever though
| active_pokemon: Optional[PokemonData] = None | ||
| opponent_active_pokemon: Optional[PokemonData] = None | ||
| team: List[PokemonData] = field(default_factory=list) | ||
| opponent_team: List[PokemonData] = field(default_factory=list) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need a placeholder for opponent pokemon that you have not yet seen. eg at the very start of the battle you see that the other guy has 6 pokemon, then they launch the first and you see that he has 5 more but you don't know which ones
| turn: int = 0 | ||
| forced_switch: bool = False | ||
|
|
||
| can_mega_evolve: bool = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are already there in the PokemonData of the active pokemon
| def _compute_reward(self, battle, done: bool) -> float: | ||
| """Compute reward based on reward_mode.""" | ||
|
|
||
| if self.reward_mode == "sparse": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not do magic numbers here, these should be configurable
| # Final outcome bonus | ||
| if done: | ||
| if battle.won: | ||
| reward += 0.5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(You probably want something more like 10 for this)
| self._last_player_fainted = player_fainted | ||
|
|
||
| # Small reward for opponent HP damage | ||
| if battle.opponent_active_pokemon and hasattr(battle.opponent_active_pokemon, 'current_hp_fraction'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should likely add a reward for statuses. Paralyzed, Toxin, Spikes, Leech Seed etc.
There should be another reward for REMOVING statuses. What made Blissey super strong was that she could learn Aromatherapy which removes status debuffs on your entire team.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should also add a reward for buffs which are a big deal in this game (we banned using Curse on Snorlax back in the day because he became ridiculously strong). Marowak is one Swords Dance away from 999 Attack which one-shots most pokemon in the game. Buffs are heavily quantized, eg Dragon Dance is +1 Atk +1 Spd while Swords Dance is +2 Atk and Belly Drum is +4 Atk (but you lose 50% hp!). Each +1 is +50%.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, there should be a reward for cleaning buffs. Skarmory was played for this reason: he could Whirlwind out your pokemon (buffs do not survive a switch unlike statuses)
| # Small reward for opponent HP damage | ||
| if battle.opponent_active_pokemon and hasattr(battle.opponent_active_pokemon, 'current_hp_fraction'): | ||
| current_hp = battle.opponent_active_pokemon.current_hp_fraction | ||
| if current_hp is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also HP are often recovered. Moves like Recover, etc. There should be another reward for recovery of HP
Darktex
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary
This PR adds a comprehensive Pokemon battle environment integration using poke-env and Pokemon Showdown. The implementation demonstrates excellent async synchronization, proper memory management, and strong test coverage. The architecture follows OpenEnv patterns well.
Highlights ✅
- Excellent async event loop bridging between FastAPI and poke-env's POKE_LOOP
- Memory leak prevention with automatic battle cleanup
- Comprehensive error handling with fallback to legal actions
- Dual reward modes (sparse/dense) for different RL approaches
- Strong test coverage (needs relocation)
- Thread-safe with proper locking
Critical Issues 🔴
Two blocking issues must be fixed before merge:
- Missing Copyright Headers: All Python files in
src/envs/pokemon_env/need Meta copyright headers - Tests in Wrong Location: Tests should be in
tests/envs/pokemon_env/, notexamples/
Important Issues 🟡
- Test Suite Integration: Tests need proper pytest structure and CI/CD integration
- Dockerfile Organization: Multiple Dockerfiles need clarification (see proposal below)
- Hard-coded Port: Dockerfile should respect PORT environment variable
- Error Messages: Add context to observation metadata for better debugging
- Documentation Cleanup: Remove development artifacts from examples (see proposal below)
Minor Issues 🟢
Issues #8-12 are accepted as minor improvements for future consideration.
Proposals
Proposal for Issue #4: Dockerfile Organization
Current situation: Three Dockerfiles exist:
Dockerfile- Combines both Showdown + OpenEnv with supervisor ✅Dockerfile.showdown- Standalone ShowdownDockerfile.pokemonenv- Standalone OpenEnv
Recommendation: Keep the main Dockerfile (it's well-designed) and remove Dockerfile.showdown and Dockerfile.pokemonenv because:
- The main Dockerfile already handles everything with multi-stage build
- Having multiple Dockerfiles creates confusion about which to use
- The README should document the single, consolidated approach
- Separate containers are rarely needed for this use case (Showdown must run alongside the env)
Alternative (if separate builds are truly needed): Keep all three but:
- Rename them clearly:
Dockerfile,Dockerfile.showdown-standalone,Dockerfile.env-only - Add clear documentation in README explaining when to use each
- Add a decision tree: "Use main Dockerfile unless you need X"
My recommendation: Remove the extra Dockerfiles. The all-in-one approach is simpler and matches the architecture (two services that must communicate).
Proposal for Issue #7: Documentation Cleanup
Current state: examples/project-pikachu/ contains:
test_local_pokemon.py(move to tests) ✓test_http_pokemon.py(move to tests) ✓BasicSetupCheckList.md- Development artifact ❌IMPLEMENTATION_SUMMARY.md- Development artifact ❌Poke-Env.MD- Library documentation (external) ❌Readme.MD- Duplicate of main README ❌TESTING.md- Internal testing guide ❌
Recommendation: Remove entire examples/project-pikachu/ directory and instead:
- Create
examples/pokemon_env_example.pywith a simple, documented example:
#!/usr/bin/env python3
"""
Simple Pokemon Battle Environment Example.
This example shows how to use the Pokemon environment with OpenEnv.
Prerequisites:
docker run -p 8000:8000 -p 9980:9980 pokemon-env:latest
Usage:
python examples/pokemon_env_example.py
"""
from envs.pokemon_env import PokemonEnv, PokemonAction
import random
def main():
# Connect to server
client = PokemonEnv(base_url="http://localhost:9980")
# Reset environment
result = client.reset()
print(f"Battle started! Active Pokemon: {result.observation.active_pokemon.species}")
# Battle loop
step = 0
while not result.done and step < 100:
# Choose random legal action
if result.observation.available_moves:
action = PokemonAction(
action_type="move",
action_index=random.choice(result.observation.available_moves)
)
elif result.observation.available_switches:
action = PokemonAction(
action_type="switch",
action_index=random.choice(result.observation.available_switches)
)
else:
break
result = client.step(action)
step += 1
if step % 10 == 0:
print(f"Turn {step}: Reward={result.reward:.2f}")
print(f"\nBattle finished in {step} turns")
print(f"Final reward: {result.reward}")
state = client.state()
print(f"Winner: {state.battle_winner}")
if __name__ == "__main__":
main()-
Merge useful content into main README:
- Architecture details from IMPLEMENTATION_SUMMARY.md
- Any unique troubleshooting from TESTING.md
-
Benefits:
- Single source of truth (README.md)
- Clear, runnable example for users
- No confusion about what's documentation vs development artifacts
- Cleaner repository structure
Overall: This is a high-quality implementation. Once copyright headers are added and tests are relocated, this will be an excellent addition to OpenEnv! 🎉
| @@ -0,0 +1,127 @@ | |||
| """ | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔴 CRITICAL: Missing Copyright Header
All Python files in the OpenEnv repository must include the Meta copyright header. This file and all other Python files in src/envs/pokemon_env/ are missing this required header.
Required header:
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.Files needing headers:
models.pyclient.py__init__.pyserver/app.pyserver/pokemon_environment.pyserver/__init__.py
Please add this header to the top of each file before the module docstring.
| @@ -0,0 +1,310 @@ | |||
| #!/usr/bin/env python3 | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔴 CRITICAL: Tests in Wrong Location
Tests should be located in tests/envs/pokemon_env/, not in examples/. The examples directory is for user-facing demonstration code, not test suites.
Required changes:
- Create
tests/envs/pokemon_env/directory - Move
test_local_pokemon.py→tests/envs/pokemon_env/test_environment.py - Move
test_http_pokemon.py→tests/envs/pokemon_env/test_client.py - Remove manual
sys.pathmanipulation (tests should use proper imports) - Follow pytest conventions and integrate with existing test infrastructure
Reference: See tests/envs/atari_env/ or tests/envs/echo_env/ for proper structure.
|
|
||
| # Validate battle state | ||
| if self._current_battle.finished: | ||
| logger.warning("Step called on finished battle, returning final state") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🟡 IMPORTANT: Incomplete Error Messages
When step() is called on a finished battle, the warning is logged but not exposed to the user in the observation.
Improvement:
logger.warning("Step called on finished battle, returning final state")
obs = self._battle_to_observation(self._current_battle, reward=None, done=True)
obs.metadata["warning"] = "Step called on already finished battle"
return obsThis helps users debug issues when they accidentally call step after the battle ends.
| \n\ | ||
| [program:openenv]\n\ | ||
| command=uvicorn envs.pokemon_env.server.app:app --host 0.0.0.0 --port 9980\n\ | ||
| directory=/app\n\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🟡 IMPORTANT: Hard-coded Port
The supervisor config hard-codes port 9980. This should respect the PORT environment variable for deployment flexibility.
Fix:
[program:openenv]
command=bash -c "uvicorn envs.pokemon_env.server.app:app --host 0.0.0.0 --port ${PORT:-9980}"
directory=/app
environment=PYTHONPATH="/app/src",PORT="${PORT:-9980}"This allows deployments to override the port via environment variable while keeping 9980 as the default.
| self._episodes_completed = 0 | ||
| self._cleanup_interval = 10 # Clean up every 10 episodes | ||
|
|
||
| def _pokemon_to_data(self, pokemon) -> Optional[PokemonData]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🟢 MINOR: Type Annotation
The pokemon parameter could have a type hint for better documentation:
from typing import Any
def _pokemon_to_data(self, pokemon: Any) -> Optional[PokemonData]:Since this comes from the external poke-env library, Any is appropriate here.
|
|
||
| # Battle history cleanup interval | ||
| self._episodes_completed = 0 | ||
| self._cleanup_interval = 10 # Clean up every 10 episodes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🟢 MINOR: Magic Number
The cleanup interval is hard-coded. Consider making it configurable:
def __init__(
self,
battle_format: str = "gen8randombattle",
player_username: Optional[str] = None,
opponent: Optional[Player] = None,
reward_mode: str = "sparse",
max_turns: int = 1000,
cleanup_interval: int = 10, # Add this parameter
):
...
self._cleanup_interval = cleanup_intervalThis allows fine-tuning in production environments if needed.
|
|
||
| ```bash | ||
| # Build both images (run from project root directory) | ||
| docker build -t pokemon-showdown:latest -f src/envs/pokemon_env/server/Dockerfile.showdown . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🟡 IMPORTANT: Dockerfile Documentation
The README mentions building Dockerfile.showdown and Dockerfile.pokemonenv separately, but the main Dockerfile appears to handle everything with supervisor.
Please clarify:
- Which Dockerfiles are actually needed?
- When to use each one?
See the proposal in the main review for a recommendation to simplify this.
This idea came super thanks to @cpich3g from our Hackathon.
We are continuing here but thanks again Justin for leading and inspiring!
Adds a complete Pokemon battle environment integration using poke-env and Pokemon Showdown, following OpenEnv's HTTP-based framework patterns.
Design Decisions
Event Loop Management
POKE_LOOPbackground threadasyncio.run_coroutine_threadsafe()asyncio.Eventobjects on correct loopMemory Management
Concurrency
reset()andstep()operationsError Handling
File Structure
src/envs/pokemon_env/
├── init.py # Package exports
├── models.py # Action, Observation, State dataclasses
├── client.py # HTTPEnvClient implementation
├── README.md # Environment documentation
└── server/
├── init.py
├── app.py # FastAPI application
├── pokemon_environment.py # Core environment logic
├── Dockerfile # Container with Showdown + OpenEnv
└── requirements.txt # Dependencies
examples/project-pikachu/
├── test_local_pokemon.py # Direct environment tests (6 tests)
├── test_http_pokemon.py # HTTP client tests (6 tests)
└── [documentation files] # Testing guides and architecture docs
Reward Modes
Sparse (default):
Dense:
Credits