Possible "improvement" of MPerClassSampler? #725

Open
ir2718 opened this issue Oct 26, 2024 · 1 comment

Comments

@ir2718
Contributor

ir2718 commented Oct 26, 2024

Hi,

I was looking at the implementation of MPerClassSampler and noticed the following issue: consecutive batches often share classes. For example, with batch_size=16 and m=4, the first batch might contain classes [1,5,3,7] while the second contains [1,9,8,2]. With small datasets, this means examples from class 1 may be seen more often than examples from other classes.
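To make the overlap concrete, here is a minimal simulation. It assumes a simplified model of the current behavior in which every batch independently shuffles all unique labels and keeps the first batch_size // m of them. The function name independent_batches and the 10-class toy dataset are hypothetical and not part of the library.

```python
import random
from collections import Counter


def independent_batches(labels, batch_size, m, num_batches, rng):
    """Simplified model (assumption): each batch shuffles the full list of
    unique labels independently and keeps the first batch_size // m classes."""
    classes_per_batch = batch_size // m
    batches = []
    for _ in range(num_batches):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        batches.append(shuffled[:classes_per_batch])
    return batches


if __name__ == "__main__":
    rng = random.Random(0)
    labels = list(range(10))  # hypothetical small dataset with 10 classes
    batches = independent_batches(labels, batch_size=16, m=4, num_batches=5, rng=rng)
    counts = Counter(c for batch in batches for c in batch)
    print("classes per batch:", batches)
    # Ideally each class would be picked 20 / 10 = 2 times; here it varies.
    print("times each class was picked:", {c: counts[c] for c in labels})
    shared = sum(len(set(a) & set(b)) for a, b in zip(batches, batches[1:]))
    print("classes shared by consecutive batches:", shared)
```

Classes within one batch are always distinct, but nothing ties one batch to the next, so over a short epoch some classes are picked several times while others may not be picked at all.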

I think this can be overcome fairly easily. Let num_batches = length_before_new_iter // batch_size and classes_per_batch = batch_size // m (which happens to equal m = 4 in the example above). Generate (num_batches * classes_per_batch) // num_unique_labels + 1 arrays of the unique labels, shuffle each one, and concatenate them. Batch i then takes the labels from i * classes_per_batch to (i + 1) * classes_per_batch. After the epoch, each class has been selected either k or k + 1 times, where k = (num_batches * classes_per_batch) // num_unique_labels (i.e. m * k or m * (k + 1) of its examples have been seen), which minimizes the initial issue.
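A minimal sketch of the proposed ordering in plain Python is below. It only builds the per-batch class lists and does not draw the m examples for each class; balanced_label_order and its arguments are hypothetical names chosen to mirror the parameters above, not the library's API.

```python
import random
from collections import Counter


def balanced_label_order(labels, batch_size, m, length_before_new_iter, rng):
    """Sketch of the proposal: concatenate independently shuffled permutations
    of the unique labels, then cut consecutive windows of classes_per_batch."""
    num_batches = length_before_new_iter // batch_size
    classes_per_batch = batch_size // m
    labels_needed = num_batches * classes_per_batch
    num_perms = labels_needed // len(labels) + 1
    order = []
    for _ in range(num_perms):
        perm = list(labels)
        rng.shuffle(perm)
        order.extend(perm)
    # Batch i uses labels [i * classes_per_batch, (i + 1) * classes_per_batch).
    return [
        order[i * classes_per_batch:(i + 1) * classes_per_batch]
        for i in range(num_batches)
    ]


if __name__ == "__main__":
    labels = list(range(10))  # hypothetical small dataset with 10 classes
    batches = balanced_label_order(labels, batch_size=16, m=4,
                                   length_before_new_iter=80, rng=random.Random(0))
    counts = Counter(c for batch in batches for c in batch)
    print("classes per batch:", batches)
    # 5 batches * 4 classes = 20 labels = two full permutations,
    # so every class is picked exactly twice here.
    print("times each class was picked:", {c: counts[c] for c in labels})
```

Because only whole permutations (plus a prefix of one more) are consumed, each class's count can differ from any other's by at most one. One caveat of this sketch: when num_unique_labels is not a multiple of classes_per_batch, a window that straddles two permutations can contain the same class twice, and a class can also appear in two consecutive batches at a permutation boundary, so a real implementation would need to handle that seam.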

I'm pretty certain the difference in performance would be minimal, if any. Does this make sense?

@KevinMusgrave
Owner

I don't understand the algorithm you're proposing, but I agree that making the label selection more uniform across iterations would be an improvement.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants