forked from wil3/gymfc

An OpenAI environment for developing neuro-flight controllers using reinforcement learning

shaopengyuan/gymfc

 
 


GymFC

GymFC is an OpenAI Gym environment specifically designed for developing intelligent flight control systems using reinforcement learning. This environment is meant to serve as a tool for researchers to benchmark their controllers and advance the state of the art of intelligent flight control. Our tech report, available at https://arxiv.org/abs/1804.04154, provides details of the environment and benchmarks PPO, TRPO and DDPG using OpenAI Baselines. We compare the performance results to a PID controller and find that PPO outperforms PID with regard to rise time and overall error. Please use the following BibTeX entry to cite our work:

@misc{1804.04154,
Author = {William Koch and Renato Mancuso and Richard West and Azer Bestavros},
Title = {Reinforcement Learning for UAV Attitude Control},
Year = {2018},
Eprint = {arXiv:1804.04154},
}

Installation

Note, Ubuntu is the only OS currently supported. I welcome any PRs and feedback for getting it installed on other OSs.

  1. Download and install Gazebo 8 (PRs welcome for Gazebo 9). Note that the one-liner install script has since been updated to install Gazebo 9; the install script for Gazebo 8 is available here. Using the install script is the recommended way to install the simulator. Tested on Ubuntu 16.04 LTS.
  2. From the root directory of this project, run pip3 install -e .

Iris PID Example

To verify that you have installed the environment correctly, it is recommended to run the supplied PID controller on an Iris quadcopter model. This example uses the configuration file examples/config/iris.config. Before running the example, verify the configuration, specifically that the Gazebo SetupFile points to the correct location. To run the example, change directories to examples/controllers and execute:

python3 run_iris_pid.py

If your environment is installed properly, you should observe a plot that closely resembles this step response:

Figure: PID Step Response

Development

GymFC's primary goal is to train controllers capable of flight in the real world. To construct optimal flight controllers, the aircraft used in simulation should closely match the real-world aircraft. The GymFC environment is therefore decoupled from the simulated aircraft. As previously mentioned, GymFC comes with an example to verify the environment, and the Iris model can be useful for testing new controllers. However, transferring a controller to a different aircraft requires a new model. Once the model is developed, point AircraftModel in your configuration file to the model directory.

It is recommended to run GymFC in headless mode (i.e., using gzserver); however, during development and testing it may be useful to see the aircraft. You can do this with the OpenAI Gym render API call, which also starts gzclient alongside gzserver. For example, when creating the environment use:

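import gym
env = gym.make(env_id)  # env_id is one of the environment IDs listed under Environments
env.render()  # also starts gzclient alongside gzserver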

Figure: GymFC Visualization Demo

Environments

Different environments are available depending on the capabilities of the flight control system. For example, new ESCs contain sensors that provide telemetry, including the velocity of the rotor, which can be used as additional state in the environment. The environment naming format is [prefix]_[inputs]_M[actuator count]_[task type], where the prefix is AttFC, Ep denotes an episodic task, and Con denotes a continuous task.
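The naming format can be made concrete with a small parser. This is only a sketch: the function parse_env_id, the EnvName tuple and the regular expression are hypothetical helpers that are not part of GymFC, and the trailing -v<number> version suffix is inferred from the environment IDs listed in this README rather than from the stated format.

```python
import re
from typing import NamedTuple, Tuple

# Hypothetical pattern for [prefix]_[inputs]_M[actuator count]_[task type]-v[version].
# The "-v<number>" suffix is an assumption based on the IDs shown in this README.
ENV_ID_PATTERN = re.compile(
    r"^(?P<prefix>[A-Za-z]+)_"
    r"(?P<inputs>[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*)_"
    r"M(?P<actuators>\d+)_"
    r"(?P<task>Ep|Con)-v(?P<version>\d+)$"
)


class EnvName(NamedTuple):
    prefix: str
    inputs: Tuple[str, ...]
    actuator_count: int
    task_type: str
    version: int


def parse_env_id(env_id: str) -> EnvName:
    """Split an environment ID into the fields of the naming format."""
    match = ENV_ID_PATTERN.match(env_id)
    if match is None:
        raise ValueError(f"not a recognised environment ID: {env_id!r}")
    return EnvName(
        prefix=match["prefix"],
        # Inputs are separated by hyphens, e.g. "GyroErr-MotorVel".
        inputs=tuple(match["inputs"].split("-")),
        actuator_count=int(match["actuators"]),
        task_type="episodic" if match["task"] == "Ep" else "continuous",
        version=int(match["version"]),
    )


if __name__ == "__main__":
    print(parse_env_id("AttFC_GyroErr-MotorVel_M4_Ep-v0"))
```

For AttFC_GyroErr-MotorVel_M4_Ep-v0 this yields the inputs GyroErr and MotorVel, four actuators, and an episodic task.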

AttFC_GyroErr-MotorVel_M4_Ep-v0

This environment is an episodic task to learn attitude control of a quadcopter. At the beginning of each episode the quadcopter is at rest. A random angular velocity is sampled and the agent must achieve this target within 1 second.

Observation Space Box(7,), where 3 observations correspond to the angular velocity error for each axis in radians/second (i.e., Ω* − Ω) in the range [-inf, inf], and 4 observations correspond to the angular velocity of each rotor in the range [-inf, inf].

Action Space Box(4,) corresponding to each PWM value to be sent to the ESC in the range [-1, 1].

Reward The error normalized to [-1, 0], representing how close the angular velocity is to the target. It is calculated as -clip(sum(|Ω* − Ω|) / (3 Ω_max)), where clip bounds the normalized error so that the reward stays within [-1, 0], and Ω_max is the initial error from when the target angular velocity is set.
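As a minimal sketch of the reward formula, the function below evaluates it in plain Python. The name attitude_reward and its parameters are hypothetical, and clipping the normalized error to [0, 1] before negating is an assumption about how the clip is applied; it is not taken from the GymFC source.

```python
from typing import Sequence


def attitude_reward(
    target: Sequence[float],
    actual: Sequence[float],
    omega_max: float,
) -> float:
    """Return -clip(sum(|target - actual|) / (3 * omega_max)).

    Assumption: the normalized error is clipped to [0, 1] and then negated,
    which keeps the reward in [-1, 0].
    """
    if omega_max <= 0:
        raise ValueError("omega_max must be positive")
    # Sum of absolute angular velocity errors over the three axes (rad/s).
    total_error = sum(abs(t - a) for t, a in zip(target, actual))
    normalized = total_error / (3.0 * omega_max)
    return -min(max(normalized, 0.0), 1.0)


if __name__ == "__main__":
    # An error on one axis only, equal to omega_max, uses a third of the range.
    print(attitude_reward([1.0, 0.0, 0.0], [0.0, 0.0, 0.0], omega_max=1.0))
```

A reward close to 0 means the measured angular velocity is close to the target, and -1 is the worst case after clipping.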

Note: In the referenced paper, different memory sizes were tested; however, for PPO it was found that additional memory did not help. For now, environments with different memory sizes are included for research, debugging and testing purposes, and can be referenced by AttFC_GyroErr1-MotorVel_M4_Ep-v0 through AttFC_GyroErr10-MotorVel_M4_Ep-v0.

AttFC_GyroErr-MotorVel_M4_Con-v0

This environment is essentially the same as the episodic variant; however, it runs for 60 seconds and continually changes the target angular velocity at random intervals of between 0.1 and 1 second.
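One way to picture the continuous task is as a schedule of (start time, target) pairs covering the 60 seconds. The sketch below is illustrative only: sample_target_schedule is a hypothetical helper, and the uniform sampling of hold times and targets, as well as the symmetric target range set by omega_max, are assumptions rather than GymFC's actual sampling code.

```python
import random
from typing import List, Tuple

Target = Tuple[float, ...]


def sample_target_schedule(
    rng: random.Random,
    episode_length: float = 60.0,
    min_hold: float = 0.1,
    max_hold: float = 1.0,
    omega_max: float = 1.0,
) -> List[Tuple[float, Target]]:
    """Return (start_time, target) pairs for one continuous episode.

    Assumptions: each target is held for a uniformly sampled duration in
    [min_hold, max_hold] seconds, and each axis target is sampled uniformly
    from [-omega_max, omega_max] rad/s.
    """
    schedule: List[Tuple[float, Target]] = []
    t = 0.0
    while t < episode_length:
        target = tuple(rng.uniform(-omega_max, omega_max) for _ in range(3))
        schedule.append((t, target))
        t += rng.uniform(min_hold, max_hold)
    return schedule


if __name__ == "__main__":
    schedule = sample_target_schedule(random.Random(0))
    print(len(schedule), "target changes; first:", schedule[0])
```

Passing a seeded random.Random instance keeps a schedule reproducible, which is convenient when comparing controllers on the same sequence of targets.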

AttFC_GyroErr1_M4_Ep-v0 - AttFC_GyroErr10_M4_Ep-v0

This environment supports ESCs without telemetry and relies only on the gyro readings as environment observations. Preliminary testing has shown that a memory size greater than 1 increases accuracy.
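The sketch below illustrates one possible reading of memory size: the observation is the concatenation of the last N gyro error readings, oldest first, padded with zeros until N readings exist. This interpretation and the GyroErrorMemory class are assumptions for illustration and are not taken from the GymFC source.

```python
from collections import deque
from typing import Deque, List, Sequence


class GyroErrorMemory:
    """Keep the last `memory_size` gyro error readings (hypothetical helper)."""

    def __init__(self, memory_size: int, axes: int = 3) -> None:
        if memory_size < 1:
            raise ValueError("memory_size must be at least 1")
        self.axes = axes
        # Start with zero readings so the observation has a fixed length.
        self._history: Deque[List[float]] = deque(
            ([0.0] * axes for _ in range(memory_size)), maxlen=memory_size
        )

    def push(self, error: Sequence[float]) -> List[float]:
        """Record a new error reading and return the flattened observation."""
        if len(error) != self.axes:
            raise ValueError(f"expected {self.axes} values, got {len(error)}")
        # The deque drops the oldest reading automatically once it is full.
        self._history.append([float(e) for e in error])
        return [value for reading in self._history for value in reading]


if __name__ == "__main__":
    memory = GyroErrorMemory(memory_size=2)
    print(memory.push([0.1, 0.2, 0.3]))
    print(memory.push([0.4, 0.5, 0.6]))
```

Under this reading, a memory size of N on a three-axis gyro gives an observation of length 3N.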
