Conversation

ZeguanXiao

What does this PR do?

Fixes #139

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Have you gone through the contributions guide?
  • Are your changes documented? Read documentation guidelines here.

import torch
import torch.distributed

# Seed the generator per rank so each rank draws a different set of retain samples.
g = torch.Generator()
rank = torch.distributed.get_rank() if torch.distributed.is_initialized() else 0
seed = int(torch.empty((), dtype=torch.int64).random_().item() + rank)
g.manual_seed(seed)
Collaborator

It would be better to use the seed from the experiment config here, rather than int(torch.empty((), dtype=torch.int64).random_().item()), to avoid introducing randomness that is not controlled by the experiment seed.

Can you try to make the experiment's cfg.seed available to this dataset class and then use seed = exp_seed + rank here?
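
A minimal sketch of what this could look like, assuming the dataset class receives the experiment seed through its constructor. The class name RetainDataset and the surrounding data handling are illustrative, not taken from the repository:

import torch
import torch.distributed
from torch.utils.data import Dataset

class RetainDataset(Dataset):
    def __init__(self, samples, exp_seed=0):
        self.samples = samples
        # Offset the experiment seed by the process rank so every rank draws
        # a different permutation of retain samples while runs stay reproducible.
        rank = torch.distributed.get_rank() if torch.distributed.is_initialized() else 0
        g = torch.Generator()
        g.manual_seed(exp_seed + rank)
        self.order = torch.randperm(len(samples), generator=g).tolist()

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.samples[self.order[idx]]

The experiment setup would then pass its configured seed when building the dataset, for example RetainDataset(samples, exp_seed=cfg.seed).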

Collaborator

@molereddy left a comment

Thank you for the PR! Please see the comment above.

@ZeguanXiao
Author

Thanks for the feedback! I've updated the PR accordingly. Please let me know if there are any further adjustments required.

@molereddy
Collaborator

Please fix the lint errors!

Successfully merging this pull request may close these issues:

  • Same retain samples across ranks