FibonacciDude/PaRL

Parameter Averaging in Reinforcement Learning (PaRL). Inspired by OpenAI's Requests for Research 2.0.

Research project

Inspired by OpenAI's Requests for Research 2.0, I decided to explore the effect of averaging the parameters of multiple parallel workers in reinforcement learning. To test this, I wrote a PPO implementation (with elements from John Schulman's code and OpenAI's Spinning Up code) in which, instead of averaging gradients at each step, each worker takes multiple steps in its own reinforcement learning environment and the workers' model parameters are then averaged.
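To make the difference concrete, here is a minimal sketch of the scheme, not the repository's code: several local updates per worker, followed by one parameter-averaging synchronization. PyTorch, the network shapes, the dummy loss (standing in for the PPO surrogate objective), and the step counts are all illustrative assumptions.

```python
# Minimal sketch of parameter averaging across parallel workers.
# Assumes PyTorch; models, loss, and step counts are placeholders.
import copy
import torch
import torch.nn as nn

def local_update(model: nn.Module, optimizer: torch.optim.Optimizer, steps: int) -> None:
    """Stand-in for several PPO updates on a worker's own environment.
    A dummy regression loss replaces the PPO surrogate objective here."""
    for _ in range(steps):
        x, y = torch.randn(32, 4), torch.randn(32, 2)
        loss = ((model(x) - y) ** 2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

def average_parameters(workers: list[nn.Module]) -> None:
    """One communication round: overwrite every worker with the element-wise
    mean of all workers' parameters (instead of averaging gradients each step)."""
    mean_state = copy.deepcopy(workers[0].state_dict())
    for key in mean_state:
        mean_state[key] = torch.stack(
            [w.state_dict()[key].float() for w in workers]
        ).mean(dim=0)
    for w in workers:
        w.load_state_dict(mean_state)

# Four parallel workers, each a copy of the same initial policy network.
base = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))
workers = [copy.deepcopy(base) for _ in range(4)]
optimizers = [torch.optim.Adam(w.parameters(), lr=3e-4) for w in workers]

for _ in range(10):                      # 10 communication rounds
    for w, opt in zip(workers, optimizers):
        local_update(w, opt, steps=20)   # many local steps between syncs
    average_parameters(workers)          # a single sync per round
```

In this setup the workers communicate once per round rather than once per gradient step, which is where the reduction in communication comes from.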

In reward per communication, the model converged to optimal behavior faster than the gradient-averaging baseline, while in reward per step the two performed equally. In my analysis, parameter averaging reduced the communication between parallel workers while maintaining the baseline's performance.

Link to Requests for Research 2.0: https://openai.com/index/requests-for-research-2/
