Add ability to resume training #27

Open
Kaixhin opened this issue Aug 12, 2018 · 5 comments

Comments

@Kaixhin
Owner

Kaixhin commented Aug 12, 2018

No description provided.

@stringie

This is very much needed, as my machine isn't powerful enough to finish training in a single run. There needs to be a saved state to get back to.

@Kaixhin
Owner Author

Kaixhin commented Sep 19, 2018

I was thinking about closing this, because resuming would actually require saving the replay memory, which is about 7GB. It would clearly still be a useful feature to have, so I'll leave this open in case I or someone else comes up with a nice way of serialising everything.
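
For context, here is one plausible accounting of the "about 7GB" figure. The thread does not say how it was measured, so the capacity and frame size below are assumptions, not numbers taken from the repository.

```python
# Back-of-the-envelope estimate of the replay memory size.
# Assumptions (not stated in this thread): 1,000,000 stored transitions,
# each holding one 84x84 greyscale frame as unsigned 8-bit integers.
capacity = 1_000_000
frame_bytes = 84 * 84  # one byte per pixel

total_bytes = capacity * frame_bytes
total_gb = total_bytes / 1e9

print(f"{total_gb:.2f} GB")  # prints "7.06 GB"
```

Rewards, actions and other per-transition fields would add a little on top, but under these assumptions the frames dominate.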

@guydav
Contributor

guydav commented Sep 10, 2019

I've implemented something to this effect just by pickling the memory and loading a checkpoint. My code is a little coupled to where and how I store these saved files, but I can try to decouple it to share it, if that might be useful?
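
A minimal sketch of the "pickle the memory" idea, for readers following along. This is not the code from the comment above: the function names are hypothetical, a `deque` stands in for the real replay memory class, and the bz2 compression is an addition here to soften the file size concern, not something the comment mentions.

```python
import bz2
import pickle
from collections import deque


def save_memory(memory, path):
    """Pickle the replay memory to a bz2-compressed file (hypothetical helper)."""
    with bz2.open(path, "wb") as f:
        pickle.dump(memory, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_memory(path):
    """Load a replay memory previously written by save_memory."""
    with bz2.open(path, "rb") as f:
        return pickle.load(f)


# Stand-in replay memory: (state, action, reward, done) tuples.
example_memory = deque([(0, 1, 0.5, False), (1, 0, -1.0, True)], maxlen=10)
```

Calling `save_memory(example_memory, "memory.pkl.bz2")` and later `load_memory("memory.pkl.bz2")` round-trips the buffer. Compression trades disk space for CPU time, so whether it helps with a multi-gigabyte buffer would need measuring.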

@Kaixhin
Owner Author

Kaixhin commented Sep 10, 2019

@guydav that does sound very useful! Perhaps a --checkpoint-interval flag which, if nonzero, saves the checkpoint in the results directory? Resuming is the trickier part.
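
A rough sketch of how such a flag could behave, assuming an argparse-based script. Only `--checkpoint-interval` comes from the suggestion above; `--results-dir`, the helper names and the file naming scheme are hypothetical.

```python
import argparse
import os


def build_parser():
    parser = argparse.ArgumentParser()
    # 0 (the default here) disables checkpointing entirely.
    parser.add_argument("--checkpoint-interval", type=int, default=0,
                        help="Steps between checkpoints (0 disables them)")
    # Hypothetical flag: where results and checkpoints are written.
    parser.add_argument("--results-dir", type=str, default="results")
    return parser


def should_checkpoint(step, interval):
    """True when checkpointing is enabled and this step falls on the interval."""
    return interval > 0 and step > 0 and step % interval == 0


def checkpoint_path(results_dir, step):
    """Hypothetical naming scheme for checkpoint files."""
    return os.path.join(results_dir, f"checkpoint_{step}.pkl")
```

In a training loop, `should_checkpoint(step, args.checkpoint_interval)` would gate the save, so leaving the flag at zero keeps the current behaviour.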

@guydav
Contributor

guydav commented Sep 12, 2019

See #58 for the implementation details. For now I've enabled checkpointing by default, at the same interval as the evaluation interval, but it doesn't have to be the default if you'd prefer otherwise.

I don't think resuming is too hard, and I handled it through a few flags. Let me know what you think.
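
The flags used in #58 are not listed in this thread, so the following is only an illustrative sketch of flag-driven resuming. `--resume`, `--checkpoint-path`, and the layout of the saved dictionary are all hypothetical.

```python
import argparse
import os
import pickle


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Hypothetical flags; the real ones live in the pull request.
    parser.add_argument("--resume", action="store_true")
    parser.add_argument("--checkpoint-path", default="results/checkpoint.pkl")
    return parser.parse_args(argv)


def save_state(path, step, agent_state, memory):
    """Write the step counter, agent state and replay memory in one file."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump({"step": step, "agent_state": agent_state, "memory": memory}, f)
    # Rename only after a complete write, so a crash mid-save
    # cannot clobber the previous checkpoint.
    os.replace(tmp_path, path)


def load_or_init(args):
    """Return the saved state when resuming, otherwise a fresh one."""
    if args.resume and os.path.exists(args.checkpoint_path):
        with open(args.checkpoint_path, "rb") as f:
            return pickle.load(f)
    return {"step": 0, "agent_state": None, "memory": []}
```

The training loop would then start from `state["step"]` instead of zero. Writing to a temporary file first matters on machines where runs are often interrupted, which is the use case raised earlier in this thread.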


3 participants