I was looking at the implementation of `MPerClassSampler`, and I noticed the following issue: consecutive batches often overlap in the classes they use. For example, with `batch_size=16` and `m=4`, the first batch might consist of classes `[1, 5, 3, 7]` while the second is `[1, 9, 8, 2]`. With small datasets, this means examples from class `1` can be seen more often than examples from other classes.

I think this can be easily overcome by generating `((length_before_new_iter // batch_size) * m) // num_unique_labels + 1` arrays of the unique labels, shuffling each of them, and then concatenating them. The sampler can then take the labels for batch `i` from positions `i*m` to `(i+1)*m` of the concatenated sequence, and be certain that after the epoch each class has been selected either `((length_before_new_iter // batch_size) * m) // num_unique_labels` or `((length_before_new_iter // batch_size) * m) // num_unique_labels + 1` times, minimizing the initial issue.

I'm pretty certain the difference in performance would be minimal, if any. Does this make sense?
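To make the proposal concrete, here is a minimal sketch of the scheme described above. `build_label_sequence` is a hypothetical helper, not part of the library's API, and the parameter names (`length_before_new_iter`, `m`, etc.) are taken from the description:

```python
import random
from collections import Counter

def build_label_sequence(all_labels, length_before_new_iter, batch_size, m):
    """Hypothetical sketch of the proposed scheme: concatenate enough
    independently shuffled copies of the unique labels that every batch
    can draw a fresh slice of labels from the sequence."""
    unique_labels = sorted(set(all_labels))
    num_unique_labels = len(unique_labels)
    num_batches = length_before_new_iter // batch_size
    # Number of shuffled copies, as given in the proposal.
    num_copies = (num_batches * m) // num_unique_labels + 1
    sequence = []
    for _ in range(num_copies):
        shuffled = unique_labels[:]
        random.shuffle(shuffled)
        sequence.extend(shuffled)
    return sequence, num_batches

# Toy setup: 10 classes; batch i would use sequence[i*m:(i+1)*m].
sequence, num_batches = build_label_sequence(
    all_labels=list(range(10)), length_before_new_iter=64, batch_size=16, m=4
)
consumed = sequence[: num_batches * 4]  # labels actually drawn in one epoch
counts = Counter(consumed)
```

One caveat worth noting: a slice of the sequence can straddle the boundary between two shuffled copies, so a single batch could still contain the same class twice; a real implementation would likely need to handle that boundary case.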
I don't understand the algorithm you're proposing. But I agree that if you can make the label selection more uniform across iterations, then that would be an improvement.