OpenAI Gym environments for quadrotor UAV low-level control, combined with modular RL and equivariant RL approaches.
NOTE: For the training code presented at the ACC 2023 conference, "Equivariant Reinforcement Learning for Quadrotor UAV", please switch to the ACC23 branch. For the ACC 2024 conference training code, "Multi-Agent Reinforcement Learning for the Low-Level Control of a Quadrotor UAV", use the ACC24 branch. After cloning the desired branch, please refer to the respective README.md for detailed instructions.
This repository contains OpenAI Gym environments and PyTorch implementations of PPO, SAC, and TD3/MATD3 for the low-level control of quadrotor unmanned aerial vehicles (UAVs). For a broader introduction to what deep RL algorithms do, see OpenAI Spinning Up. Please don't hesitate to open new issues or pull requests for any suggestions and corrections.
The repository is compatible with Python 3.11.3, Gymnasium 0.28.1, PyTorch 2.0.1, and NumPy 1.25.1. It is recommended to create an Anaconda environment with Python 3 (installation guide). Additionally, Visual Studio Code is recommended for efficient code editing.
- Open a terminal and install major packages.
conda install -c conda-forge gymnasium
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
conda install -c anaconda numpy
conda install -c conda-forge vpython
Refer to the official documentation for Gymnasium, PyTorch, NumPy, and VPython.
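To quickly verify the installation, a minimal Python check such as the following can be run (a sketch; it only prints the installed versions and whether CUDA is visible to PyTorch):
# Sanity check: print installed versions of the core packages.
import gymnasium as gym
import torch
import numpy as np

print("Gymnasium:", gym.__version__)
print("PyTorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("NumPy:", np.__version__)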
- Clone the repository.
git clone https://github.com/fdcl-gwu/gym-rotor.git
Consider the quadrotor UAV shown below. The position and the velocity of the quadrotor are represented by $x \in \mathbb{R}^3$ and $v \in \mathbb{R}^3$, and its attitude and angular velocity by the rotation matrix $R \in SO(3)$ and $\Omega \in \mathbb{R}^3$, respectively.
Two major training frameworks are provided for quadrotor low-level control tasks: (a) In a monolithic setting, a large end-to-end policy directly outputs total thrust and moments; (b) In modular frameworks, two modules collaborate to control the quadrotor: Module #1 and Module #2 specialize in translational control and yaw control, respectively. A detailed explanation of these concepts can be found in this YouTube video.
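As a rough illustration of the difference, consider the conceptual sketch below (the policy objects and observation splits are hypothetical and do not mirror the repository's actual classes):
# Conceptual comparison of the two frameworks (hypothetical interfaces).
import numpy as np

def monolithic_step(policy, obs):
    # (a) One end-to-end policy maps the full observation to the
    #     total thrust and the three moments.
    return policy(obs)  # e.g., action = (f, M1, M2, M3)

def modular_step(policy_trans, policy_yaw, obs_trans, obs_yaw):
    # (b) Module #1 handles translational control and Module #2 handles yaw control;
    #     their outputs are combined into the final command.
    a_trans = policy_trans(obs_trans)
    a_yaw = policy_yaw(obs_yaw)
    return np.concatenate([np.atleast_1d(a_trans), np.atleast_1d(a_yaw)])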
Env IDs | Description |
---|---|
Quad-v0 | The foundational environment for the wrappers, where the state is the full quadrotor state $(x, v, R, \Omega)$ and the action is the control input, i.e., the total thrust and moments. |
CoupledWrapper | Wrapper for the monolithic RL framework: a single agent's observation collects the quadrotor states together with their error terms, and its action is the total thrust and moments. |
DecoupledWrapper | Wrapper for modular RL schemes: each agent observes only the error terms relevant to its subtask, with Module #1 commanding the translational motion and Module #2 commanding the yaw motion. |
Here, the error terms denote deviations from the desired states, e.g., the position error $e_x = x - x_d$ and the velocity error $e_v = v - v_d$.
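A minimal interaction loop might look as follows; this assumes the environments are registered with Gymnasium under the IDs above and that the package is importable as gym_rotor (check the repository's registration code for the exact names):
# Minimal rollout sketch (environment ID and import name are assumptions).
import gymnasium as gym
import gym_rotor  # noqa: F401  # assumed to register the Quad-v0 environment

env = gym.make("Quad-v0")
obs, info = env.reset(seed=0)
for _ in range(100):
    action = env.action_space.sample()  # random total thrust and moments
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()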
We evaluate monolithic and modular equivariant RL frameworks against their non-equivariant counterparts.
Specifically, we benchmark four distinct frameworks: two traditional non-equivariant approaches using standard multi-layer perceptrons (MLPs), namely Mono-MLP (Monolithic Policy with MLP) and Mod-MLP (Modular Policies with MLP), and two proposed equivariant approaches utilizing equivariant multi-layer perceptrons (EMLPs), referred to as Mono-EMLP (Monolithic Policy with EMLP) and Mod-EMLP (Modular Policies with EMLP).
Additionally, we explore two inter-module communication strategies to optimize coordination across modules: centralized (--module_training CTDE) and decentralized (--module_training DTDE) coordination.
In the decentralized setting, namely DTDE, modules independently learn their action-value functions and policies without inter-agent synchronization.
In contrast, centralized methods, called CTDE, introduce centralized critic networks to share information between modules during training.
Lastly, we propose data-efficient, equivariant monolithic and modular RL by identifying the rotational and reflectional symmetries in quadrotor dynamics and encoding these symmetries into equivariant network models (--use_equiv True).
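The distinction between the two coordination strategies can be summarized with a small PyTorch sketch (illustrative only; the repository's actual critic architectures may differ):
# Illustrative critics for DTDE vs. CTDE (not the repository's actual networks).
import torch
import torch.nn as nn

class DecentralizedCritic(nn.Module):
    # DTDE: each module's critic conditions only on its own observation and action.
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.q = nn.Sequential(nn.Linear(obs_dim + act_dim, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, obs, act):
        return self.q(torch.cat([obs, act], dim=-1))

class CentralizedCritic(nn.Module):
    # CTDE: a shared critic conditions on the joint observations and actions of
    # both modules during training, while execution remains decentralized.
    def __init__(self, joint_obs_dim, joint_act_dim):
        super().__init__()
        self.q = nn.Sequential(nn.Linear(joint_obs_dim + joint_act_dim, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, joint_obs, joint_act):
        return self.q(torch.cat([joint_obs, joint_act], dim=-1))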
Hyperparameters can be adjusted in args_parse.py to further optimize performance.
Use the following commands to train agents with different frameworks and equivariance settings:
# Mono-MLP (Monolithic Policy with MLP)
python3 main.py --framework MONO --use_equiv False
# Mod-MLP (Modular Policies with MLP)
python3 main.py --framework MODUL --use_equiv False
# Mono-EMLP (Monolithic Policy with EMLP)
python3 main.py --framework MONO --use_equiv True
# Mod-EMLP (Modular Policies with EMLP)
python3 main.py --framework MODUL --use_equiv True
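To run several of these configurations (or multiple seeds) back to back, a small launcher script can be used; this is not part of the repository, just a convenience sketch reusing the flags above:
# Convenience launcher (hypothetical helper, not included in the repository).
import subprocess

for seed in (0, 1, 2):
    subprocess.run(
        ["python3", "main.py", "--framework", "MODUL", "--use_equiv", "True", "--seed", str(seed)],
        check=True,
    )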
The trained models are saved in the models folder (e.g., TD3_MONO_700.0k_steps_agent_0_1992).
To evaluate them, first modify the total_steps value in main.py, e.g., for MONO schemes trained with TD3:
# Load trained models for evaluation:
if self.args.eval_model:
    if self.framework == "MONO":
        total_steps, agent_id = 700_000, 0  # edit 'total_steps' accordingly
        self.agent_n[agent_id].load(self.rl_algo, self.framework, total_steps, agent_id, self.seed)
Then, execute the following command to test, visualize, and log flight data:
python3 main.py --framework MONO --test_model True --save_log True --render True --seed 1992
While testing the trained models, you can save the flight data by adding the --save_log True flag.
The data will then be saved in the results folder with a timestamped filename, such as MODUL_log_20250303_120200.dat.
To visualize the flight data, open draw_plot.py and update file_name accordingly, i.e., file_name = 'MODUL_log_20250303_120200'.
Lastly, run the plotting script with the appropriate framework flag. For MODUL frameworks:
python3 draw_plot.py --framework MODUL
For MONO frameworks:
python3 draw_plot.py --framework MONO
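If a quick custom plot is preferred over draw_plot.py, something like the following may work, assuming the .dat log is a plain whitespace-delimited numeric table (the exact column layout is defined by the repository's logging code, so treat the column index below as a placeholder):
# Quick-look plot of a logged channel (column layout is an assumption; see draw_plot.py).
import numpy as np
import matplotlib.pyplot as plt

data = np.loadtxt("results/MODUL_log_20250303_120200.dat")
plt.plot(data[:, 0])  # first logged column, used here as a placeholder
plt.xlabel("time step")
plt.ylabel("logged value")
plt.show()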
The experiments demonstrate clear advantages of equivariant over non-equivariant approaches and modular over monolithic architectures in reinforcement learning for quadrotor UAV control. Equivariant models, Mod-EMLP (red) and Mono-EMLP (blue), consistently converge faster and achieve higher average returns compared to their non-equivariant counterparts, Mod-MLP (orange) and Mono-MLP (green). Particularly, the Mod-EMLP model, combining equivariant learning with a modular architecture, shows superior early-stage performance and faster convergence due to parallel learning of translational and yawing motions. In contrast, monolithic architectures, particularly those employing non-equivariant models, exhibit slower learning and risk overfitting due to the complexity of simultaneously optimizing all control aspects. A supplementary video is available here.
If you find this work useful and would like to cite it, please use the following references:
@article{yu2025equivariant,
  title={Equivariant Reinforcement Learning Frameworks for Quadrotor Low-Level Control},
  author={Yu, Beomyeol and Lee, Taeyoung},
  journal={arXiv preprint arXiv:2502.20500},
  year={2025}
}
@article{yu2024modular,
  title={Modular Reinforcement Learning for a Quadrotor UAV with Decoupled Yaw Control},
  author={Yu, Beomyeol and Lee, Taeyoung},
  journal={IEEE Robotics and Automation Letters},
  year={2024},
  publisher={IEEE}
}
- reinmav-gym: Reinforcement Learning framework for MAVs using the OpenAI Gym environment
- PPO-Continuous-Pytorch: A clean and robust Pytorch implementation of PPO on continuous action space.
- pytorch-soft-actor-critic: PyTorch implementation of soft actor critic
- TD3: Author's PyTorch implementation of TD3 for OpenAI gym tasks
- MARL-code-pytorch: Concise PyTorch implementations of MARL algorithms, including MAPPO, MADDPG, MATD3, QMIX, and VDN.