AWAC

1. Introduction

AWAC (Advantage Weighted Actor Critic) combines sample-efficient dynamic programming with maximum-likelihood policy updates. The result is a simple and effective framework that can leverage large amounts of offline data and then quickly fine-tune RL policies online, reaching expert-level performance after collecting only a limited amount of interaction data. The full AWAC algorithm for offline RL with online fine-tuning is summarized in Algorithm 1.

(Image: Algorithm 1, AWAC)

In a practical implementation, the actor and the critic are parameterized by neural networks and trained with SGD, using the actor update and the critic update given in [1].
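The two update rules referred to here were images on the original page. The following is a hedged reconstruction from our reading of [1], not a verbatim copy. The notation is an assumption and should be checked against the paper: $\beta$ is the replay buffer holding offline and online data, $\lambda$ is a temperature, and $k$ is the iteration index. The actor update is an advantage-weighted maximum-likelihood step, and the critic update is a bootstrapped TD regression:

$$
\theta_{k+1} = \arg\max_{\theta} \; \mathbb{E}_{s,a \sim \beta}\left[ \log \pi_{\theta}(a \mid s) \, \exp\!\left( \frac{1}{\lambda} A^{\pi_k}(s,a) \right) \right]
$$

$$
\phi_{k} = \arg\min_{\phi} \; \mathbb{E}_{s,a,r,s' \sim \beta}\left[ \left( Q_{\phi}(s,a) - y \right)^{2} \right],
\qquad
y = r + \gamma \, \mathbb{E}_{a' \sim \pi_{k}(\cdot \mid s')}\left[ Q_{\phi_{k-1}}(s',a') \right]
$$

Here $A^{\pi_k}(s,a) = Q^{\pi_k}(s,a) - V^{\pi_k}(s)$ is the advantage under the current policy. A smaller $\lambda$ concentrates the actor update on high-advantage actions in the buffer, while a larger $\lambda$ moves it toward plain behavior cloning.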

AWAC ensures data efficiency with off-policy critic estimation via bootstrapping, and avoids offline bootstrap error with a constrained actor update. By avoiding explicit modeling of the behavior policy, AWAC avoids overly conservative updates.
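To make the two ingredients above concrete, here is a minimal pure-Python sketch of the quantities involved: a bootstrapped critic target and an advantage-weighted actor loss. It is not the code in `awac-train.py`. The function names, the temperature `lam`, and the toy numbers are assumptions chosen for illustration, and a real implementation would compute these on batches with an autodiff framework while treating the weights as constants.

```python
import math


def td_target(reward, next_q, gamma=0.99, done=False):
    """Bootstrapped critic target y = r + gamma * Q(s', a') (zero bootstrap if terminal)."""
    return reward + (0.0 if done else gamma * next_q)


def critic_loss(q_values, targets):
    """Mean squared TD error between Q(s, a) and the bootstrapped targets."""
    return sum((q - y) ** 2 for q, y in zip(q_values, targets)) / len(q_values)


def advantage_weights(q_values, state_values, lam=1.0):
    """Per-sample weights exp(A / lam), with advantage A = Q(s, a) - V(s)."""
    return [math.exp((q - v) / lam) for q, v in zip(q_values, state_values)]


def actor_loss(log_probs, weights):
    """Negative advantage-weighted log-likelihood of the buffer actions."""
    return -sum(w * lp for w, lp in zip(weights, log_probs)) / len(log_probs)


if __name__ == "__main__":
    # Toy batch of two transitions (values are made up for illustration).
    q = [1.0, 3.0]          # Q(s, a) for the buffer actions
    v = [1.0, 2.0]          # V(s), e.g. Q under actions sampled from the policy
    log_pi = [-1.0, -2.0]   # log pi(a | s) for the buffer actions
    w = advantage_weights(q, v, lam=1.0)
    print("weights:", w)
    print("actor loss:", actor_loss(log_pi, w))
    print("critic loss:", critic_loss(q, [td_target(0.5, 1.0), td_target(1.0, 2.0)]))
```

Because the actor only re-weights the log-likelihood of actions already in the buffer, it never needs an explicit model of the behavior policy, which is the property the paragraph above refers to.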

2. Instructions

python awac-train.py --dataset=HalfCheetah-v2 --seed=0 --gpu=0
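To repeat the command above over several seeds, a small launcher can help. The sketch below is an assumption, not part of the repository: `build_command` and `run_seeds` are hypothetical helper names, and only the three flags shown above (`--dataset`, `--seed`, `--gpu`) are used. By default it prints the commands instead of running them.

```python
import subprocess


def build_command(dataset, seed, gpu=0, script="awac-train.py"):
    """Build the argument list for one training run, mirroring the command above."""
    return ["python", script, f"--dataset={dataset}", f"--seed={seed}", f"--gpu={gpu}"]


def run_seeds(dataset, seeds, gpu=0, dry_run=True):
    """Run (or, with dry_run=True, just print) one training command per seed."""
    commands = [build_command(dataset, seed, gpu) for seed in seeds]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if a run exits with an error.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_seeds("HalfCheetah-v2", seeds=[0, 1, 2], gpu=0, dry_run=True)
```

Set `dry_run=False` to launch the runs sequentially from the directory that contains `awac-train.py`.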

3. Performance

Reference

[1] Nair A, Dalal M, Gupta A, et al. Accelerating Online Reinforcement Learning with Offline Datasets[J]. 2020.