GymFC is an OpenAI Gym environment specifically designed for developing intelligent flight control systems using reinforcement learning. It is meant to serve as a tool for researchers to benchmark their controllers and advance the state of the art in intelligent flight control. Our tech report, available at https://arxiv.org/abs/1804.04154, provides details of the environment and benchmarks PPO, TRPO, and DDPG using OpenAI Baselines. We compare their performance to a PID controller and find that PPO outperforms PID with regard to rise time and overall error. Please use the following BibTeX entry to cite our work:
@misc{1804.04154,
Author = {William Koch and Renato Mancuso and Richard West and Azer Bestavros},
Title = {Reinforcement Learning for UAV Attitude Control},
Year = {2018},
Eprint = {arXiv:1804.04154},
}
Note: Ubuntu is currently the only supported OS. I welcome PRs and feedback for getting GymFC installed on other operating systems.
- Download and install Gazebo 8 (PRs welcome for Gazebo 9). Note that the one-liner install script has been updated to install Gazebo 9; the install script for Gazebo 8 is here. Using the install script is the recommended way to install the simulator. Tested on Ubuntu 16.04 LTS.
- From the root directory of this project, run
pip3 install -e .
To verify that the environment is installed correctly, it is recommended to run
the supplied PID controller on an Iris quadcopter model. This example
uses the configuration file examples/config/iris.config. Before running the
example, verify the configuration, specifically that the Gazebo SetupFile
is pointing to the correct location.
To run the example, change to the examples/controllers directory
and execute:
python3 run_iris_pid.py
If your environment is installed properly, you should observe a plot that closely resembles this step response:
GymFC's primary goal is to train controllers capable of flight in the real world.
To construct optimal flight controllers, the aircraft used in
simulation should closely match the real-world aircraft. Therefore, the GymFC environment is decoupled from the simulated aircraft, so the aircraft model can be swapped out.
As previously mentioned, GymFC comes with an example to verify the environment.
The Iris model can be useful for testing out new controllers. However, when
transferring the controller to run on a different aircraft, a new model will be
required. Once the model is developed, set AircraftModel in your configuration
file to the model directory.
It is recommended to run GymFC in headless mode (i.e., using gzserver); however,
during development and testing it may be desirable to see the aircraft. You can do this with the render
OpenAI Gym API call, which also starts gzclient alongside gzserver. For example, when creating the environment, use:
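import gym
import gymfc  # assumed to register the AttFC_* environment IDs with Gym
env = gym.make("AttFC_GyroErr-MotorVel_M4_Ep-v0")  # or any other env_id
env.render()  # also launches gzclient alongside gzserver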
Different environments are available depending on the capabilities of the flight control system. For example, newer ESCs contain sensors that provide telemetry, including rotor velocity, which can be used as additional state in the environment. Environments are named using the format [prefix]_[inputs]_M[actuator count]_[task type], where the prefix is AttFC and the task type is either Ep (episodic task) or Con (continuous task).
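To make the naming convention concrete, here is a minimal sketch of an environment ID parser. It assumes IDs follow [prefix]_[inputs]_M[actuator count]_[task type] with an optional Gym version suffix such as -v0, and that multiple inputs are joined by hyphens (as in GyroErr-MotorVel). The helper parse_env_id and the EnvName fields are hypothetical, not part of GymFC, and reading trailing digits on an input (e.g. GyroErr10) as a memory size is an assumption.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# [prefix]_[inputs]_M[actuator count]_[task type], plus an optional "-vN" suffix.
_ENV_ID = re.compile(
    r"^(?P<prefix>[A-Za-z]+)_(?P<inputs>[A-Za-z0-9-]+)_M(?P<actuators>\d+)"
    r"_(?P<task>Ep|Con)(?:-v(?P<version>\d+))?$"
)


@dataclass
class EnvName:
    prefix: str
    inputs: List[str]        # e.g. ["GyroErr", "MotorVel"]
    memory: Optional[int]    # assumed meaning of trailing digits, e.g. GyroErr10 -> 10
    actuators: int
    episodic: bool           # True for "Ep", False for "Con"
    version: Optional[int]


def parse_env_id(env_id: str) -> EnvName:
    """Split a GymFC-style environment ID into its components (hypothetical helper)."""
    m = _ENV_ID.match(env_id)
    if m is None:
        raise ValueError(f"not a GymFC-style environment ID: {env_id!r}")
    inputs: List[str] = []
    memory: Optional[int] = None
    for token in m.group("inputs").split("-"):
        tm = re.fullmatch(r"([A-Za-z]+)(\d*)", token)
        if tm is None:
            raise ValueError(f"unexpected input token: {token!r}")
        inputs.append(tm.group(1))
        if tm.group(2):
            memory = int(tm.group(2))
    version = m.group("version")
    return EnvName(
        prefix=m.group("prefix"),
        inputs=inputs,
        memory=memory,
        actuators=int(m.group("actuators")),
        episodic=m.group("task") == "Ep",
        version=int(version) if version is not None else None,
    )
```

For example, parse_env_id("AttFC_GyroErr-MotorVel_M4_Con-v0") yields inputs ['GyroErr', 'MotorVel'], 4 actuators, and episodic=False.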
This environment (AttFC_GyroErr-MotorVel_M4_Ep-v0) is an episodic task to learn attitude control of a quadcopter. At the beginning of each episode the quadcopter is at rest. A random target angular velocity is then sampled, and the agent must reach this target within 1 second.
Observation Space: Box(7,), where 3 observations correspond to the angular velocity error for each axis in radians/second (i.e., Ω* − Ω), in the range [-inf, inf], and 4 observations correspond to the angular velocity of each rotor, in the range [-inf, inf].
Action Space: Box(4,), corresponding to the PWM value to be sent to each ESC, in the range [-1, 1].
Reward: The negated, normalized error, representing how close the angular velocity is to the target, calculated as -clip(sum(|Ω* − Ω|) / (3·Ω_max), 0, 1), where clip bounds the normalized error to [0, 1] (so the reward lies in [-1, 0]) and Ω_max is the initial error when the target angular velocity is set.
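The observation, action, and reward definitions above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not GymFC's implementation: Ω_max is passed in as a positive scalar, and action_to_pwm maps [-1, 1] linearly onto a hypothetical 1000 to 2000 µs PWM range, since the actual ESC range is not specified here.

```python
from typing import List, Sequence


def build_observation(target: Sequence[float], gyro: Sequence[float],
                      rotor_velocities: Sequence[float]) -> List[float]:
    """Box(7,) observation: per-axis error Ω* − Ω (rad/s), then 4 rotor velocities."""
    if len(target) != 3 or len(gyro) != 3 or len(rotor_velocities) != 4:
        raise ValueError("expected 3 target, 3 gyro and 4 rotor values")
    error = [t - w for t, w in zip(target, gyro)]
    return error + [float(v) for v in rotor_velocities]


def action_to_pwm(action: Sequence[float], pwm_min: float = 1000.0,
                  pwm_max: float = 2000.0) -> List[float]:
    """Map each action in [-1, 1] linearly onto [pwm_min, pwm_max].

    The 1000-2000 µs defaults are hypothetical; out-of-range actions are clipped first.
    """
    pwm = []
    for a in action:
        a = min(1.0, max(-1.0, a))
        pwm.append(pwm_min + (a + 1.0) / 2.0 * (pwm_max - pwm_min))
    return pwm


def reward(error: Sequence[float], omega_max: float) -> float:
    """-clip(sum(|Ω* − Ω|) / (3·Ω_max), 0, 1); the result lies in [-1, 0].

    omega_max is assumed to be a positive scalar.
    """
    if omega_max <= 0:
        raise ValueError("omega_max must be positive")
    normalized = sum(abs(e) for e in error) / (3.0 * omega_max)
    return -min(1.0, max(0.0, normalized))
```

For example, with a target of 1 rad/s on one axis, a measured 0.5 rad/s, and Ω_max = 1 rad/s, the reward is -0.5/3 ≈ -0.167; an error of 3·Ω_max or more saturates at -1.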
Note: The referenced paper tested different memory sizes; however, for PPO, additional memory was found not to help. For research, debugging, and testing purposes, environments with different memory sizes are currently included and can be referenced as AttFC_GyroErr1-MotorVel_M4_Ep-v0 through AttFC_GyroErr10-MotorVel_M4_Ep-v0.
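One common way to give an agent memory is to stack the last N observations into a single state vector. The sketch below illustrates that idea with a hypothetical ObservationHistory class; it is an assumption about what "memory" means, not GymFC's code, and which parts of the observation GymFC actually stacks (e.g. only the gyro error) is not specified here, so the class simply stacks whatever vector it is given.

```python
from collections import deque
from typing import Deque, List, Sequence


class ObservationHistory:
    """Stack the last `memory` observations into one flat state vector.

    Hypothetical helper. The buffer is zero-filled on reset and frames
    are returned oldest first.
    """

    def __init__(self, obs_size: int, memory: int) -> None:
        if obs_size < 1 or memory < 1:
            raise ValueError("obs_size and memory must be >= 1")
        self.obs_size = obs_size
        self.memory = memory
        # maxlen makes the deque discard the oldest frame on each append
        self._frames: Deque[List[float]] = deque(maxlen=memory)
        self.reset()

    def reset(self) -> None:
        """Clear the history, e.g. at the start of an episode."""
        self._frames.clear()
        for _ in range(self.memory):
            self._frames.append([0.0] * self.obs_size)

    def push(self, obs: Sequence[float]) -> List[float]:
        """Add the newest observation and return the stacked state."""
        if len(obs) != self.obs_size:
            raise ValueError(f"expected {self.obs_size} values, got {len(obs)}")
        self._frames.append([float(x) for x in obs])
        return [x for frame in self._frames for x in frame]
```

With memory=1 the state is just the latest observation; with memory=3 and a 3-value gyro error, the agent would see a 9-value state.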
This environment (AttFC_GyroErr-MotorVel_M4_Con-v0) is essentially the same as the episodic variant; however, it runs for 60 seconds and continually changes the target angular velocity at random intervals of 0.1 to 1 seconds.
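The continuous task's target schedule can be sketched as a list of (start time, target) segments. This assumes each axis of the target is sampled uniformly within a hypothetical omega_bound, since the sampling distribution and range are not stated here; continuous_targets and target_at are illustrative names, not GymFC functions. The episodic task corresponds to a single segment held for the whole 1 second episode.

```python
import bisect
import random
from typing import List, Tuple

Segment = Tuple[float, Tuple[float, ...]]  # (start time in s, target in rad/s)


def continuous_targets(rng: random.Random, duration_s: float = 60.0,
                       min_hold_s: float = 0.1, max_hold_s: float = 1.0,
                       omega_bound: float = 1.0) -> List[Segment]:
    """Build a schedule of random target angular velocities.

    Each target is held for an interval drawn uniformly from
    [min_hold_s, max_hold_s] until duration_s is reached. Sampling each
    axis uniformly from [-omega_bound, omega_bound] is an assumption.
    """
    schedule: List[Segment] = []
    t = 0.0
    while t < duration_s:
        target = tuple(rng.uniform(-omega_bound, omega_bound) for _ in range(3))
        schedule.append((t, target))
        t += rng.uniform(min_hold_s, max_hold_s)
    return schedule


def target_at(schedule: List[Segment], time_s: float) -> Tuple[float, ...]:
    """Return the target active at time_s (schedule sorted by start time)."""
    starts = [start for start, _ in schedule]
    i = bisect.bisect_right(starts, time_s) - 1
    if i < 0:
        raise ValueError("time_s is before the first segment")
    return schedule[i][1]
```

Seeding the random.Random instance makes a schedule reproducible, which can help when comparing controllers on the same sequence of targets.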
This environment supports ESCs without telemetry and relies only on the gyro readings as observations. Preliminary testing has shown that a memory size greater than 1 increases accuracy.