
Don't reshuffle eval data each "epoch" #229

Merged 4 commits into main on Jul 11, 2023
Conversation

epwalsh (Member) commented Jul 10, 2023

Keep eval data order consistent across iterations. Add paths to the metadata in the MemMapDataset class.

epwalsh requested a review from dirkgr on July 10, 2023 at 23:00
olmo/util.py (Outdated)
Comment on lines 341 to 345:

-    if isinstance(dataloader.sampler, DistributedSampler):
+    if update_epoch_seed and isinstance(dataloader.sampler, DistributedSampler):
         epoch = dataloader.sampler.epoch + 1
         dataloader.sampler.set_epoch(epoch)
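For context, here is a minimal sketch of how a data-loader cycling helper might use this flag. The name cycle_through_epochs and the update_epoch_seed parameter are taken from the surrounding discussion; the body is an illustrative assumption, not the repository's actual implementation.

```python
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler


def cycle_through_epochs(dataloader: DataLoader, update_epoch_seed: bool = True):
    """Yield batches forever, looping over the dataloader pass after pass.

    Illustrative sketch only: when ``update_epoch_seed`` is False, the
    DistributedSampler's epoch (and hence its shuffle seed) is never advanced,
    so every pass through the data uses the same order.
    """
    while True:
        for batch in dataloader:
            yield batch
        # Only reshuffle between passes when explicitly requested.
        if update_epoch_seed and isinstance(dataloader.sampler, DistributedSampler):
            epoch = dataloader.sampler.epoch + 1
            dataloader.sampler.set_epoch(epoch)
```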
dirkgr (Member):

I don't understand how skipping this makes sure it starts from scratch every epoch. I thought we'd have to reset the data loader somehow every time.

epwalsh (Member, Author):

Each validation loop through a dataset is a complete loop / epoch over the dataset (less the extra examples, since drop_last=True). The DistributedSampler reshuffles each epoch with a specific seed that's shared across all processes. That seed never changes unless you call DistributedSampler.set_epoch(). See https://pytorch.org/docs/stable/data.html#torch.utils.data.distributed.DistributedSampler.
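A minimal single-process sketch of that behavior (num_replicas=1 and rank=0 are passed explicitly since no process group is initialized here): the sampler's shuffle order is a function of (seed, epoch), so repeated iteration yields the same order until set_epoch() is called.

```python
import torch
from torch.utils.data import TensorDataset
from torch.utils.data.distributed import DistributedSampler

dataset = TensorDataset(torch.arange(10))
sampler = DistributedSampler(
    dataset, num_replicas=1, rank=0, shuffle=True, seed=0, drop_last=True
)

# Two passes without touching the epoch: identical order.
first_pass = list(iter(sampler))
second_pass = list(iter(sampler))
assert first_pass == second_pass

# Advancing the epoch changes the shuffle.
sampler.set_epoch(1)
third_pass = list(iter(sampler))
assert third_pass != first_pass  # different order (with overwhelming probability)
```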

epwalsh (Member, Author):

But the fact that this is confusing, and only works by exploiting a nuance of the DistributedSampler, tells me that this probably isn't the best way to do it.

epwalsh (Member, Author):

Updated: 9b52525

dirkgr (Member):

Do we still need cycle_through_epoch then? Or at least, do we still need this change?

epwalsh (Member, Author):

Removed: ac8322c

epwalsh requested a review from dirkgr on July 11, 2023 at 16:50
dirkgr (Member) left a comment:

I'm assuming then that we don't use cycle_through_epochs for the training set?

epwalsh (Member, Author) commented Jul 11, 2023

> I'm assuming then that we don't use cycle_through_epochs for the training set?

Right

epwalsh merged commit 952819b into main on Jul 11, 2023
10 checks passed
epwalsh deleted the eval-order branch on July 11, 2023 at 22:46