This repository contains code for training an A3C agent to play the KungFuMasterDeterministic-v0 environment in OpenAI Gym.
Implementation of the A3C (Asynchronous Advantage Actor-Critic) algorithm for multi-agent training.
Preprocessing pipeline for Kung Fu observations using the PreprocessAtari wrapper.
Environment batching for parallel interaction with multiple environments.
Evaluation of the trained agent on single episodes.
Video recording and visualization of the agent's gameplay.
Train the agent for 3000 episodes and periodically show the average agent reward during training.
Requirements: Python 3, PyTorch, NumPy, OpenAI Gym, tqdm.
The script currently trains 10 agents in 10 parallel environments. To change this, adjust number_environments and the EnvBatch class. The reward scaling (batch_rewards *= 0.01) is optional and may need tuning for your environment and training dynamics.
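To illustrate where the reward scaling takes effect, here is a small sketch of how scaled rewards could feed a one-step advantage estimate. It is not taken from the repository: the discount factor, the reward numbers and the critic values are made-up assumptions, and the one-step TD target shown is the textbook actor-critic form, which may differ from what the script actually computes.

```python
import numpy as np

discount_factor = 0.99  # assumed value, not taken from the repository
reward_scale = 0.01     # the scaling factor mentioned above

# Raw per-environment rewards and done flags for one step (made-up numbers).
batch_rewards = np.array([100.0, 0.0, 200.0, 0.0], dtype=np.float32)
batch_dones = np.array([False, False, True, False])

# Critic estimates for the current and next states (made-up numbers).
state_values = np.array([1.0, 0.5, 2.0, 0.0], dtype=np.float32)
next_state_values = np.array([1.2, 0.4, 3.0, 0.1], dtype=np.float32)

# In-place scaling requires a float array; an integer array would raise an error here.
batch_rewards *= reward_scale

# One-step TD target; the bootstrap term is dropped where the episode ended.
not_done = 1.0 - batch_dones.astype(np.float32)
target_values = batch_rewards + discount_factor * next_state_values * not_done
advantages = target_values - state_values
```

Scaling keeps the rewards on a similar order of magnitude to the critic's value estimates, which is the usual motivation for a factor like 0.01. Whether that particular value suits another environment is something to check empirically.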