Rewrite PCL agent #245

Open · wants to merge 15 commits into `master`

lyx-x (Contributor) commented Feb 25, 2018

I rewrote the PCL agent to avoid the memory issues caused by storing Variables in a list / the replay buffer. I didn't compare the training curve with the old implementation, but the agent seems to learn on CartPole with the new parameters (`average_value` increases and R gets larger), and there is no memory issue when running with a large network / reasonably long trajectories.
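The memory problem can be illustrated with a toy stand-in for an autograd variable. The `ToyVariable` class below is hypothetical, not Chainer's `Variable`; it only mimics the property that a variable keeps references to the inputs it was computed from. Appending the variable itself to a list therefore keeps every intermediate activation of that forward pass alive, whereas appending only the raw value lets the graph be freed. The check relies on CPython's reference counting freeing objects as soon as their last reference disappears.

```python
import weakref


class ToyVariable:
    """Hypothetical stand-in for an autograd variable.

    Like a framework variable reached through its creator node, it holds
    strong references to the inputs it was computed from.
    """

    def __init__(self, data, inputs=()):
        self.data = data
        self.inputs = inputs


def toy_forward(x):
    # Intermediate activation: only reachable through `out.inputs` after return.
    hidden = ToyVariable(x * 2.0)
    out = ToyVariable(hidden.data + 1.0, (hidden,))
    return out, weakref.ref(hidden)


buffer_with_graph, buffer_raw = [], []

# Storing the variable keeps the whole graph (here: `hidden`) reachable.
out, hidden_ref = toy_forward(3.0)
buffer_with_graph.append(out)
alive_with_graph = hidden_ref() is not None

# Storing only the raw value lets the graph be garbage-collected.
out, hidden_ref = toy_forward(3.0)
buffer_raw.append(out.data)
del out
alive_raw = hidden_ref() is not None

print(alive_with_graph, alive_raw, buffer_raw)
```

In a real agent the analogous fix is to store raw arrays (observations, actions, rewards) rather than Variables, and to run the backward pass right after the loss is computed, as `compute_loss` does in this PR.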

Main methods are the following:

- `update`: takes a loss (as an array), logs the result as usual and calls the optimizer (backprop is done before this function is called)
- `update_on_policy` and `update_from_replay`: sample a list of trajectories (the current one, or from the replay buffer), clear the grads and compute the loss
- `compute_loss`: takes a list of trajectories and performs batch computation (the batch size is the number of episodes, which may be inefficient when there is a single episode, as in the on-policy update). This function calls `backward` immediately and returns only an array for logging
- `_compute_path_consistency`: computes the path consistency; this part of the code is almost unchanged
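For reference, the quantity behind `_compute_path_consistency` is the PCL path-consistency error for a window of length d starting at step t: C_t = -V(s_t) + γ^d V(s_{t+d}) + Σ_{j=0}^{d-1} γ^j (r_{t+j} - τ log π(a_{t+j} | s_{t+j})), with loss ½ C_t². Below is a minimal, framework-free sketch for one trajectory, assuming plain Python lists, windows truncated at the episode end, and `values[T]` as the bootstrap value (0 for a terminal state). The function names are hypothetical, not the PR's code; the agent itself computes this on batched Variables so that gradients reach both the policy and the value function.

```python
def path_consistency_errors(values, log_pis, rewards, gamma, tau, d):
    """Path-consistency error C_t for every start step t of one trajectory.

    values:  V(s_0), ..., V(s_T)  (length T + 1; values[T] is the bootstrap value)
    log_pis: log pi(a_t | s_t)    (length T)
    rewards: r_t                  (length T)
    d:       rollout length; windows are truncated at the end of the episode
    """
    T = len(rewards)
    assert len(log_pis) == T and len(values) == T + 1
    errors = []
    for t in range(T):
        end = min(t + d, T)
        # Discounted sum of entropy-regularized rewards inside the window.
        discounted = 0.0
        for j, k in enumerate(range(t, end)):
            discounted += gamma ** j * (rewards[k] - tau * log_pis[k])
        # -V(s_t) + gamma^(window length) * V(s_end) + discounted rewards
        errors.append(-values[t] + gamma ** (end - t) * values[end] + discounted)
    return errors


def pcl_loss(errors):
    # Squared error per window, averaged over windows (averaging vs summing is
    # an assumption of this sketch).
    return sum(0.5 * c * c for c in errors) / len(errors)


errs = path_consistency_errors([0.0] * 4, [0.0] * 3, [1.0, 1.0, 1.0],
                               gamma=0.5, tau=0.1, d=2)
print(errs, pcl_loss(errs))
```

With zero values and log-probabilities, each error reduces to the discounted reward sum of its window, so the three windows above give 1 + 0.5, 1 + 0.5 and 1 (the last window is truncated at the episode end).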

The new underlying data structure is a list of dicts that stores the current episode, plus a replay buffer that stores only (s, a, r) tuples. The old `mu` (`action_distrib`) is no longer stored, since it can be recomputed from the other items.
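A minimal sketch of this layout, assuming an episode-level buffer with uniform sampling and a `policy` callable that maps a state to a sequence of action probabilities. All names here (`EpisodeReplayBuffer`, `append_episode`, `recompute_log_probs`) are hypothetical, not the PR's classes. Dropping `mu` is possible because the path-consistency error only needs log π under the current parameters, which can be recomputed from the stored states and actions.

```python
import math
import random
from collections import deque


class EpisodeReplayBuffer:
    """Hypothetical sketch: keeps whole episodes as lists of (state, action, reward)."""

    def __init__(self, capacity):
        # The oldest episodes are dropped once `capacity` episodes are stored.
        self.episodes = deque(maxlen=capacity)

    def append_episode(self, episode):
        # `episode` is the on-policy list of dicts. Only plain (s, a, r) values
        # are copied, so no Variables and no action distribution are retained.
        self.episodes.append(
            [(step["state"], step["action"], step["reward"]) for step in episode])

    def sample(self, n):
        # Uniform sampling of up to n distinct episodes (a simplifying assumption).
        return random.sample(list(self.episodes), min(n, len(self.episodes)))


def recompute_log_probs(policy, episode):
    """Recompute log pi(a | s) under the current policy instead of storing mu."""
    return [math.log(policy(s)[a]) for s, a, _ in episode]


buf = EpisodeReplayBuffer(capacity=2)
for i in range(3):
    buf.append_episode([{"state": i, "action": 0, "reward": 1.0,
                         "action_distrib": None}])
uniform_policy = lambda state: [0.5, 0.5]
print(len(buf.episodes), recompute_log_probs(uniform_policy, buf.episodes[0]))
```

Because the buffer holds only plain values, its memory footprint grows with the number of stored steps rather than with the size of the network that produced them.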

I also added a unified model in the example script and changed a couple of parameters.

Issues addressed: #109 #236 #240

I am not sure whether the parameters are used correctly, but if they are, this PR also addresses #238.

muupan (Member) commented Mar 15, 2018

Thank you for the improvements to PCL. I haven't checked the implementation details yet, but I think solving the memory issue is great as long as it doesn't make training slower.

Can you show the training curves and computation speeds before and after this PR?
