This repository contains code for training an A3C agent to play Kung Fu Master in the KungFuMasterDeterministic-v0 environment of OpenAI Gym.
- Implementation of the A3C (Asynchronous Advantage Actor-Critic) algorithm for multi-agent training.
- Preprocessing pipeline for Kung Fu observations using the PreprocessAtari wrapper.
- Environment batching for parallel interaction with multiple environments.
- Evaluation of the trained agent on single episodes.
- Video recording and visualization of the agent's gameplay.
- Train the agent for 3000 episodes and periodically show the average agent reward during training.
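
The environment batching mentioned above can be illustrated with a minimal sketch. It is not the repository's actual EnvBatch: `ToyEnv` is a hypothetical stand-in for a Gym environment (so the example runs without Gym or Atari ROMs), it assumes the classic `step` signature returning `(observation, reward, done, info)`, and the episode length and reward values are made up.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for a Gym environment (not part of the repository)."""

    def __init__(self, episode_length=5, seed=0):
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0]  # dummy observation

    def step(self, action):
        self.t += 1
        obs = [float(self.t)]
        reward = self.rng.choice([0.0, 100.0])  # made-up reward values
        done = self.t >= self.episode_length
        return obs, reward, done, {}


class EnvBatch:
    """Steps several environments in lockstep and resets the finished ones."""

    def __init__(self, n_envs=10):
        self.envs = [ToyEnv(seed=i) for i in range(n_envs)]

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, actions):
        # One action per environment, applied in order.
        results = [env.step(a) for env, a in zip(self.envs, actions)]
        obs, rewards, dones, infos = map(list, zip(*results))
        # Restart finished episodes so every slot always holds a live state.
        for i, done in enumerate(dones):
            if done:
                obs[i] = self.envs[i].reset()
        return obs, rewards, dones, infos


number_environments = 10
env_batch = EnvBatch(number_environments)
batch_states = env_batch.reset()
batch_next_states, batch_rewards, batch_dones, _ = env_batch.step(
    [0] * number_environments
)
```

Resetting a finished environment inside `step` is one common design choice: the training loop never has to special-case episode boundaries, and the `done` flags still tell the learner where not to bootstrap.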
Environment: https://gymnasium.farama.org/environments/atari/kung_fu_master/
video.mp4
Requirements: Python 3, PyTorch, NumPy, OpenAI Gym, tqdm
The script currently trains 10 agents in 10 parallel environments. You can change these numbers through number_environments and the EnvBatch class. The reward scaling (batch_rewards *= 0.01) is optional and may need adjusting depending on your environment and training dynamics.
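
To see why the reward scaling matters, here is a rough sketch of how scaled rewards feed into one-step actor-critic targets. It is an illustration under assumptions, not the script's code: the function name `one_step_advantages`, the discount factor of 0.99, and all numeric values are hypothetical, and plain Python lists are used to keep it dependency-free (the in-place `batch_rewards *= 0.01` in the script presumably operates on an array).

```python
def one_step_advantages(rewards, values, next_values, dones,
                        gamma=0.99, reward_scale=0.01):
    """Return (targets, advantages) for a batch of transitions.

    target    = scaled_reward + gamma * V(next_state), with no bootstrap when done
    advantage = target - V(state)
    """
    targets, advantages = [], []
    for r, v, next_v, done in zip(rewards, values, next_values, dones):
        scaled_r = r * reward_scale
        bootstrap = 0.0 if done else gamma * next_v
        target = scaled_r + bootstrap
        targets.append(target)
        advantages.append(target - v)
    return targets, advantages


# Illustrative values only: raw rewards in the hundreds, critic values near 1.
batch_rewards = [100.0, 0.0, 200.0]
batch_values = [0.5, 0.2, 1.0]
batch_next_values = [0.6, 0.3, 0.9]
batch_dones = [False, False, True]

targets, advantages = one_step_advantages(
    batch_rewards, batch_values, batch_next_values, batch_dones
)
```

With `reward_scale=1.0` the same inputs would give targets in the hundreds, so the critic loss and the policy gradient would be dominated by raw reward magnitude. Scaling keeps targets on the same order as the value estimates, which is the usual motivation for a factor like 0.01.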