Inspired by OpenAI’s Requests for Research 2.0, I decided to explore the effect of averaging the parameters of multiple parallel workers in reinforcement learning. To test this, I wrote a PPO implementation (borrowing elements from John Schulman’s code and OpenAI’s Spinning Up code) in which, instead of averaging gradients at every step, each worker takes multiple steps on its own and the models’ parameters are then averaged, evaluated across different reinforcement learning environments.
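Below is a minimal sketch of the parameter-averaging scheme, not the original PPO code: the toy linear models, SGD optimizer, regression loss, and the `local_steps`/`rounds` values are illustrative assumptions standing in for PPO updates on rollout data. The key point it shows is that each worker performs several local updates before a single synchronization, so communication happens once per round rather than once per step.

```python
import copy
import torch
import torch.nn as nn

def average_parameters(workers):
    """Average each parameter tensor across all worker models (one 'communication')."""
    with torch.no_grad():
        for params in zip(*(w.parameters() for w in workers)):
            mean = torch.stack([p.data for p in params]).mean(dim=0)
            for p in params:
                p.data.copy_(mean)

def train_parameter_averaging(workers, optimizers, local_steps, rounds, make_loss):
    """Each worker takes `local_steps` independent updates, then parameters are averaged.

    Communication between workers happens once per round instead of once per step,
    which is the scheme described above (hypothetical helper, not the original code).
    """
    for _ in range(rounds):
        for worker, opt in zip(workers, optimizers):
            for _ in range(local_steps):
                opt.zero_grad()
                make_loss(worker).backward()
                opt.step()
        average_parameters(workers)  # single sync per round

if __name__ == "__main__":
    # Purely illustrative usage with a toy regression objective instead of a PPO loss.
    torch.manual_seed(0)
    n_workers = 4
    workers = [nn.Linear(8, 1) for _ in range(n_workers)]
    # Start all workers from the same initial parameters.
    for w in workers[1:]:
        w.load_state_dict(copy.deepcopy(workers[0].state_dict()))
    optimizers = [torch.optim.SGD(w.parameters(), lr=0.05) for w in workers]

    target = torch.randn(8, 1)
    def make_loss(model):
        x = torch.randn(32, 8)  # stand-in for a worker's own rollout data
        return ((model(x) - x @ target) ** 2).mean()

    train_parameter_averaging(workers, optimizers, local_steps=10, rounds=20, make_loss=make_loss)
```

The gradient-averaging baseline, by contrast, would synchronize after every optimizer step, so the number of communications grows with the number of steps rather than the number of rounds.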
Measured in reward per communication round, the parameter-averaging model converged to optimal behavior faster; measured in reward per environment step, it matched the gradient-averaging baseline. From my analysis, it therefore reduces the communication between parallel workers while keeping the same performance as the baseline.
Link to Requests for Research 2.0: https://openai.com/index/requests-for-research-2/