uniformly use skip for both (map-style) Dataset and IterableDataset #521

tianyu-l · 2024-08-15T05:36:05Z

Stack from ghstack (oldest at bottom):

-> uniformly use skip for both (map-style) Dataset and IterableDataset #521

Support for skip on "an IterableDataset obtained from split_dataset_by_node" landed in huggingface/datasets#6965 and was released in https://github.com/huggingface/datasets/releases/tag/2.21.0, so the same skip call can now be used for both dataset types.

For previous discussions see
https://discuss.huggingface.co/t/skip-not-implemented-for-iterabledataset-after-split-dataset-by-node/91450

I manually ran a unit test on the c4 dataset (as an IterableDataset) by replacing https://github.com/pytorch/torchtitan/blob/main/test/datasets/test_checkpoint.py#L14
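To illustrate why a uniform skip is convenient when resuming from a checkpoint, here is a minimal sketch. It does not use the real Hugging Face classes: `ToyMapDataset`, `ToyIterableDataset`, and the helper `resume` are hypothetical stand-ins written for this example, and they only mimic the idea that both dataset types expose a `skip(n)` method. It is not the code changed in this PR.

```python
from itertools import islice
from typing import Iterator, List


class ToyMapDataset:
    """Hypothetical stand-in for a map-style dataset (len + indexing)."""

    def __init__(self, items: List[int]) -> None:
        self._items = list(items)

    def __len__(self) -> int:
        return len(self._items)

    def __getitem__(self, idx: int) -> int:
        return self._items[idx]

    def skip(self, n: int) -> "ToyMapDataset":
        # Drop the first n samples by slicing the underlying list.
        return ToyMapDataset(self._items[n:])


class ToyIterableDataset:
    """Hypothetical stand-in for an iterable dataset (iteration only)."""

    def __init__(self, items: List[int], offset: int = 0) -> None:
        self._items = list(items)
        self._offset = offset

    def __iter__(self) -> Iterator[int]:
        # Lazily discard the first `offset` samples while streaming.
        return islice(iter(self._items), self._offset, None)

    def skip(self, n: int) -> "ToyIterableDataset":
        return ToyIterableDataset(self._items, self._offset + n)


def resume(dataset, sample_idx: int) -> List[int]:
    """Hypothetical helper: one code path, no isinstance branching."""
    return list(dataset.skip(sample_idx))


if __name__ == "__main__":
    data = list(range(10))
    print(resume(ToyMapDataset(data), 3))
    print(resume(ToyIterableDataset(data), 3))
```

With a shared `skip`, the resume logic no longer needs to branch on the dataset type; both calls above yield the same remaining samples.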

[ghstack-poisoned]

ghstack-source-id: d82c233f21dddc74794d9c492a781dffc52eb5de Pull Request resolved: #521


ghstack-source-id: c8f611742ffbb4859988b97e706b9e0d1b4ad6f1 Pull Request resolved: #521

gokulavasan

Looks great!


uniformly use skip for both (map-style) Dataset and IterableDataset

f5c2b78

[ghstack-poisoned]

tianyu-l added a commit that referenced this pull request Aug 15, 2024

uniformly use skip for both (map-style) Dataset and IterableDataset

89b72c0

ghstack-source-id: d82c233f21dddc74794d9c492a781dffc52eb5de Pull Request resolved: #521

facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) Aug 15, 2024

Update on "uniformly use skip for both (map-style) Dataset and IterableDataset" [ghstack-poisoned]

ab895dc

tianyu-l added a commit that referenced this pull request Aug 15, 2024

uniformly use skip for both (map-style) Dataset and IterableDataset

c389777

ghstack-source-id: c8f611742ffbb4859988b97e706b9e0d1b4ad6f1 Pull Request resolved: #521

tianyu-l requested a review from gokulavasan August 15, 2024 05:47

gokulavasan approved these changes Aug 16, 2024


tianyu-l merged commit ab895dc into gh/tianyu-l/20/base Aug 16, 2024
6 checks passed

tianyu-l added a commit that referenced this pull request Aug 16, 2024

uniformly use skip for both (map-style) Dataset and IterableDataset

feac534

ghstack-source-id: c8f611742ffbb4859988b97e706b9e0d1b4ad6f1 Pull Request resolved: #521

tianyu-l deleted the gh/tianyu-l/20/head branch August 16, 2024 19:55

tianyu-l added a commit that referenced this pull request Aug 16, 2024

uniformly use skip for both (map-style) Dataset and IterableDataset

81c555f

ghstack-source-id: c8f611742ffbb4859988b97e706b9e0d1b4ad6f1 Pull Request resolved: #521
