
Question about Batch Structure #769

Open
PaulForInvent opened this issue Feb 19, 2021 · 20 comments


@PaulForInvent

Hi,

In the SentenceLabelDataset docstring you write:

This DOES NOT check if there are more labels than the batch is large or if the batch size is divisible
by the samples drawn per label.

Why did you mention this? Is it important to have a batch containing all classes?
So I wonder whether it is best to have a batch in which all classes are represented, or only a fraction of them. This question applies to MultiRankingLoss as well as a BatchHardTriplet loss. Let's say I have 100 classes: is it better to represent all classes with 1 or 2 examples in the batch, or only a (random) fraction, which is what happens when my batch size is smaller than the number of classes?

Btw: for the MultiRankingLoss I structure the batches such that classes that were not used in the current batch are placed in the next one. This way I fill the batches sequentially...

@nreimers
Member

No, a batch must not contain all labels. If you set samples_per_label=2, then it can happen that there are more samples from the same label in a batch. It just ensures that there are at least 2.

For BatchHard-Losses, batches must contain at least two samples with the same label.
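
For reference, a minimal sketch of how this is typically wired up, assuming the SentenceLabelDataset(examples, samples_per_label=...) interface of recent sentence-transformers versions (the exact constructor and the model name used here may differ in your installed release):

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses
from sentence_transformers.datasets import SentenceLabelDataset

model = SentenceTransformer("paraphrase-MiniLM-L6-v2")

# One InputExample per sentence, labelled with its class id.
train_examples = [
    InputExample(texts=["first sentence of class 0"], label=0),
    InputExample(texts=["second sentence of class 0"], label=0),
    InputExample(texts=["first sentence of class 1"], label=1),
    InputExample(texts=["second sentence of class 1"], label=1),
    # ... more examples and labels
]

# Guarantees at least samples_per_label examples per label in a batch;
# a batch may contain more of them, and it does not have to cover all labels.
train_dataset = SentenceLabelDataset(train_examples, samples_per_label=2)
train_dataloader = DataLoader(train_dataset, batch_size=32, drop_last=True)

train_loss = losses.BatchHardTripletLoss(model=model)
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=100)
```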

@PaulForInvent
Author

@nreimers

If you set samples_per_label=2, then it can happen that there are more samples from the same label in a batch.

How can this happen?

So, this SentenceLabelDataset is well suited for these kinds of BatchHard losses? Or could you mention any optimization for these losses? I wonder whether I can try something out to improve results. Do you have any experience with which batch structure works better?

And why did you mention this, if a batch does not need to contain all labels?

This DOES NOT check if there are more labels than the batch is large or if the batch size is divisible
by the samples drawn per label.

This sounds like a hint for an optimization. So what should I do if the batch size is not divisible by the number of samples drawn per label?

For BatchHard-Losses, batches must contain at least two samples with the same label.

Oh, do you really mean "must not"? ;-)

@nreimers
Member

For the BatchHard losses, you must ensure that for every label in a batch there are at least two samples with that label. It is not an issue if 3, 4, or 10 samples share the same label.

The mentioned dataset class ensures this: a batch has at least two samples for every label that appears in the batch.

In that case, your batch size should be a multiple of 2. If it is not, there can be one sample whose label appears only once in the batch. That is not harmful, but the system cannot learn anything from this sample, because it needs a second example with the same label in the batch.
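
A tiny illustration of that point (hypothetical labels, plain Python): the one sample whose label occurs only once in the batch can never be paired with a positive, so it cannot act as an anchor in any triplet.

```python
# Batch of 5 with samples_per_label=2: label 2 slipped in only once.
batch_labels = [0, 0, 1, 1, 2]

def usable_anchors(labels):
    # An anchor is usable only if at least one other sample shares its label.
    return [i for i, label in enumerate(labels) if labels.count(label) > 1]

print(usable_anchors(batch_labels))  # [0, 1, 2, 3] -> the sample at index 4 is wasted
```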

@PaulForInvent
Author

Thank you @nreimers

Maybe you could clarify the above confusion about "must not". :-)

And sadly, I no longer remember what I had in mind regarding "optimization". Maybe you have some hints on what to take care of to build a good batch for this loss, besides the points already mentioned?

@PaulForInvent
Author

PaulForInvent commented Feb 28, 2021

@nreimers Right now the idea came to my mind of also including hard negatives for each positive within the same batch. Don't you have the same problem with BatchHard as with the normal triplet loss if the batch only contains random elements? So what about explicitly adding hard negatives to the batch as well?

@nreimers
Member

nreimers commented Mar 1, 2021

BatchHard automatically computes the hardest triplet in a batch.

So it is advisable to use large batches (typically as large as possible). It can also make sense to include more than just 2 samples with the same label per batch => the selected positive will then get harder.
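
A rough sketch of what batch-hard mining does inside a batch (illustrative only, not the library's exact implementation of BatchHardTripletLoss, which differs in details such as the distance computation and margin handling):

```python
import torch

def batch_hard_triplet_loss(embeddings, labels, margin=0.3):
    # Pairwise Euclidean distances between all embeddings in the batch, shape (B, B).
    dist = torch.cdist(embeddings, embeddings, p=2)

    same_label = labels.unsqueeze(0) == labels.unsqueeze(1)  # (B, B) bool
    eye = torch.eye(len(labels), dtype=torch.bool, device=dist.device)
    pos_mask = same_label & ~eye                             # positives: same label, not itself
    neg_mask = ~same_label                                   # negatives: different label

    # Hardest positive: the *furthest* sample with the same label.
    hardest_pos = (dist * pos_mask).max(dim=1).values
    # Hardest negative: the *closest* sample with a different label.
    hardest_neg = dist.masked_fill(~neg_mask, float("inf")).min(dim=1).values

    # Larger batches => more candidates => harder positives and negatives.
    return torch.relu(hardest_pos - hardest_neg + margin).mean()
```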

@PaulForInvent
Author

PaulForInvent commented Mar 1, 2021

Thanks @nreimers

What if I have some classes with fewer samples than samples_per_label?

I meant whether it might also be better to manually bring some hard negatives into the batch. Then it would be more difficult to learn, since the model also sees the hard negatives for each specific positive sample (so, hard negatives for the positives used inside the batch). What do you think?

@PaulForInvent
Author

So it is advisable to use large batches (typically as large as possible)

Ok, what if I have few samples and choose a large batch size of e.g. 512, so that in total I only get a few batches or even just one?

@nreimers
Member

nreimers commented Mar 1, 2021

Labels with fewer samples than what is specified are ignored. If you have only one example in total for a label, it cannot be used.
Adding hard negatives can be helpful
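
For checking up front which labels would be dropped, a small hypothetical helper (not part of the library) over a list of InputExample objects could look like this:

```python
from collections import Counter

def labels_with_too_few_samples(examples, samples_per_label=2):
    # Count how many examples each label has and report the ones below the minimum.
    counts = Counter(example.label for example in examples)
    return [label for label, n in counts.items() if n < samples_per_label]

# e.g. labels_with_too_few_samples(train_examples, samples_per_label=2)
```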

@PaulForInvent
Author

Adding hard negatives can be helpful

Is this something I would have to do by customizing, like for the MultiRankingLoss?

Is that why you said the batch should be as large as possible, because hard negatives then have a better chance of ending up in the batch?

@PaulForInvent
Author

Labels with fewer samples than what is specified are ignored.

But would it be manageable for the dataset class to sample up to the maximum number of available samples for such a class?

Also, what just came to my mind is a perhaps artificial case: does the dataset class handle the case where the batch size is larger than the total number of samples (across all classes)? Does it automatically fill the batch up with other negative classes, or could it accidentally use a positive sample as a negative one?

@PaulForInvent
Author

Is this something I would have to do by customizing, like for the MultiRankingLoss?

This was answered by @nreimers here.

But would it be manageable for the dataset class to sample up to the maximum number of available samples for such a class?

I think it should be possible to customize this?

Does the dataset class handle the case where the batch size is larger than the total number of samples (across all classes)? Does it automatically fill the batch up with other negative classes, or could it accidentally use a positive sample as a negative one?

I did not look into it in detail, but is this case covered?

@nreimers
Member

nreimers commented Mar 4, 2021

Does the dataset class handle the case where the batch size is larger than the total number of samples (across all classes)? Does it automatically fill the batch up with other negative classes, or could it accidentally use a positive sample as a negative one?
I did not look into it in detail, but is this case covered?

That would be a really strange dataset to train on, if the training data is smaller than the batch size.

The case is covered (in principle), but it would be better to set your batch size equal to the size of your training dataset. Repeating the same examples in a batch would not help.

@PaulForInvent
Author

@nreimers Oh, I meant it differently. I was thinking of the case where the batch size is larger than the number of distinct classes times samples_per_label. Say I have 10 classes, which means at most 20 samples per batch (for samples_per_label=2). But what if I have a batch size of 64 or 128? How is the batch filled up? Do I accidentally get more samples of the same class, which would be an artificial increase of samples_per_label?

@nreimers
Member

nreimers commented Mar 5, 2021

samples_per_label is just the minimum. There can be any multiple of this in a batch. This is not an issue.

@PaulForInvent
Author

samples_per_label is just the minimum.

So, in my case above, samples_per_label samples are drawn for each label. But is only one sample drawn for each label first, then the 2nd, 3rd, etc., or are all the samples for a label drawn at once?

Or maybe you could sketch how the batch is filled in the above case when the batch size is larger.

@nreimers
Member

nreimers commented Mar 5, 2021

The dataset returns a stream that always contains two samples with the same label in a row, e.g.
A A H H C C D D A A C C D D A A ...

This stream is then chunked into the mini-batches.
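
A toy sketch of that behaviour (illustrative only; the real SentenceLabelDataset additionally shuffles labels and examples and streams them lazily):

```python
from itertools import islice

def label_stream(items_by_label, samples_per_label=2):
    # Emit samples_per_label items of one label at a time, cycling over the
    # labels until their examples are used up. Labels with too few items are skipped.
    pools = {l: list(v) for l, v in items_by_label.items() if len(v) >= samples_per_label}
    pos = {l: 0 for l in pools}
    emitted = True
    while emitted:
        emitted = False
        for label, pool in pools.items():
            if pos[label] + samples_per_label <= len(pool):
                yield from pool[pos[label]:pos[label] + samples_per_label]
                pos[label] += samples_per_label
                emitted = True

def chunk(stream, batch_size):
    # Cut the long stream into consecutive mini-batches.
    while True:
        batch = list(islice(stream, batch_size))
        if not batch:
            return
        yield batch

data = {"A": ["a1", "a2", "a3", "a4"], "H": ["h1", "h2"],
        "C": ["c1", "c2"], "D": ["d1", "d2"]}
for batch in chunk(label_stream(data, samples_per_label=2), batch_size=6):
    print(batch)
# ['a1', 'a2', 'h1', 'h2', 'c1', 'c2']
# ['d1', 'd2', 'a3', 'a4']
```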

@PaulForInvent
Author

Thanks! Ok, so most of the time you definitely get more than samples_per_label samples of a label in one batch. Surely one could work out a formula depending on the batch size and the number of classes.

My question above resulted from the intuition that samples_per_label is a fixed upper limit. That is why I wondered how the batches are then filled...
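
For the case discussed above (10 classes, samples_per_label=2, batch size 64), a rough back-of-the-envelope estimate, assuming the stream cycles evenly over the labels and every label still has examples left:

```python
# One cycle over the labels adds C * s items to the stream, so a batch of
# size B spans about B / (C * s) cycles and therefore contains roughly
# s * B / (C * s) = B / C samples of each label.
C, s, B = 10, 2, 64
print(B / C)  # ~6.4 samples per label in a batch, on average
```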

@PaulForInvent
Author

@nreimers Are the samples of one label somehow handled differently depending on whether they appear right "next to each other"? In A A H H C C D D A A you would have 4 A's. Would this be the same as A A A A H H C C D D, coming from samples_per_label=4?

I know, these are very interesting questions. ;)

@nreimers
Member

nreimers commented Mar 5, 2021

The order in a batch does not make a difference.
