Feature dataset checkpoint strategy #194
…feat/dataset-checkpoint
Hi @plaguss! Much needed feature :) it looks good, I left some comments
Co-authored-by: Gabriel Martín Blázquez <[email protected]>
…of the full dataset
Added a small reference in the docs, inside a new section of the |
LGTM, although I still need to test it myself!
One general comment: I'd like to discuss the naming with the rest of the team. IMO we could just have checkpoints_path and checkpoints_freq, or something similar, rather than asking the user to instantiate a class. We could still use or re-use the class internally, but this would spare the user from instantiating multiple classes within the same function call.
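A minimal sketch of what that flatter signature could look like. Every name here is hypothetical (generate_sketch, _Checkpoint); only checkpoints_path and checkpoints_freq come from the suggestion above, and this is not the library's actual API:

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, Optional, Tuple


@dataclass
class _Checkpoint:
    """Hypothetical internal class; the user never instantiates it."""

    path: Path
    save_frequency: int


def generate_sketch(
    inputs: Iterable,
    checkpoints_path: Optional[str] = None,
    checkpoints_freq: int = -1,
) -> Tuple[list, Optional[_Checkpoint]]:
    # Build the strategy object internally from the flat keyword arguments.
    strategy = None
    if checkpoints_path is not None:
        strategy = _Checkpoint(Path(checkpoints_path), checkpoints_freq)
    # Generation itself is stubbed out; the point is only the call signature.
    return list(inputs), strategy
```

The trade-off is that each new checkpoint option becomes another keyword argument on the function, whereas a dedicated class keeps those options grouped.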
Co-authored-by: Alvaro Bartolome <[email protected]>
Just a couple of personal doubts and minor suggestions, but looks really nice.
Description
This PR modifies enable_checkpoints in Pipeline.generate to allow more flexible behaviour during generation (saving while generating).
The example in this PR saves the CustomDataset to disk every 100 generations.
By default, it will use DatasetCheckpoint(save_frequency=-1), which replicates the previous behaviour of enable_checkpoints=True.
Closes #168 and #193.
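
As a rough illustration of the save-while-generating idea, here is a minimal, self-contained sketch. It does not use the library's real classes: CheckpointSketch, should_save, and generate_with_checkpoints are hypothetical stand-ins, and the reading of save_frequency=-1 as "no intermediate saves, only a final one" is an assumption based on the description above.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class CheckpointSketch:
    """Hypothetical stand-in, NOT the library's DatasetCheckpoint."""

    # Assumption: a negative value disables intermediate saves.
    save_frequency: int = -1
    _since_last_save: int = field(default=0, init=False)

    def should_save(self, new_rows: int) -> bool:
        """Return True once save_frequency rows have accumulated."""
        if self.save_frequency < 0:
            return False
        self._since_last_save += new_rows
        if self._since_last_save >= self.save_frequency:
            self._since_last_save = 0
            return True
        return False


def generate_with_checkpoints(
    inputs: Iterable,
    strategy: CheckpointSketch,
    save: Callable[[List[dict]], None],
) -> List[dict]:
    """Fake generation loop that calls save() according to the strategy."""
    rows: List[dict] = []
    for item in inputs:
        # Placeholder for a real model call.
        rows.append({"input": item, "generation": f"gen-{item}"})
        if strategy.should_save(1):
            save(list(rows))
    # Always save once at the end, mirroring the assumed default behaviour.
    save(list(rows))
    return rows
```

With save_frequency=100 and 250 inputs, this sketch calls save after 100 and 200 rows and once more at the end; with the default of -1 it calls save only once, after all rows are generated.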