
Question about Batch Structure #769

Open
PaulForInvent opened this issue Feb 19, 2021 · 20 comments


@PaulForInvent

Hi,

In the SentenceLabelDataset docstring you write:

This DOES NOT check if there are more labels than the batch is large or if the batch size is divisible
by the samples drawn per label.

Why did you mention this? Is it important to have a batch containing all classes?
So I wonder whether it is best to have a batch in which all classes are represented, or only a fraction of them. This question applies to MultiRankingLoss as well as a BatchHardTriplet loss. Let's say I have 100 classes: is it better to represent all classes with 1 or 2 examples in the batch, or only a (random) fraction, which is what happens when my batch size is smaller than the number of classes?

Btw: for the MultiRankingLoss I structure the batches such that classes that were not used in the current batch are placed in the next one. This way I fill the batches sequentially...

@nreimers
Member

No, a batch must not contain all labels. If you set samples_per_label=2, then it can happen that there are more samples from the same label in a batch. It just ensures that there are at least 2.

For BatchHard-Losses, batches must contain at least two samples with the same label.
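
For reference, a minimal sketch of how this is typically wired up, assuming the SentenceLabelDataset(examples, samples_per_label=...) interface of recent sentence-transformers versions (the exact constructor and the model name used here may differ in your installed release):

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses
from sentence_transformers.datasets import SentenceLabelDataset

model = SentenceTransformer("paraphrase-MiniLM-L6-v2")

# One InputExample per sentence, labelled with its class id.
train_examples = [
    InputExample(texts=["first sentence of class 0"], label=0),
    InputExample(texts=["second sentence of class 0"], label=0),
    InputExample(texts=["first sentence of class 1"], label=1),
    InputExample(texts=["second sentence of class 1"], label=1),
    # ... more examples and labels
]

# Guarantees at least samples_per_label examples per label in a batch;
# a batch may contain more of them, and it does not have to cover all labels.
train_dataset = SentenceLabelDataset(train_examples, samples_per_label=2)
train_dataloader = DataLoader(train_dataset, batch_size=32, drop_last=True)

train_loss = losses.BatchHardTripletLoss(model=model)
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=100)
```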

@PaulForInvent
Author

@nreimers

If you set samples_per_label=2, then it can happen that there are more samples from the same label in a batch.

How can this happen?

So, this SentenceLabelDataset is well suited for these kinds of BatchHard losses? Or could you mention any optimization for these losses? I wonder whether I can try something out to improve results. Do you have any experience with which batch structure works better?

And why did you mention this, if a batch does not need to contain all labels?

This DOES NOT check if there are more labels than the batch is large or if the batch size is divisible
by the samples drawn per label.

This sounds like a hint for an optimization. So what should I do if the batch size is not divisible by the number of samples drawn per label?

For BatchHard-Losses, batches must contain at least two samples with the same label.

Oh, do you really mean "must not"? ;-)

@nreimers
Member

For the BatchHard losses, you must ensure that for every label in a batch there are at least two samples with that label. It is not an issue if 3, 4, or 10 samples share the same label.

The mentioned dataset class ensures this: a batch has at least two samples for every label that appears in the batch.

In that case, your batch size should be a multiple of 2. If it is not, there can be one sample whose label appears only once in the batch. That is not harmful, but the system cannot learn anything from this sample, because it needs a second example with the same label in the batch.
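
A tiny illustration of that point (hypothetical labels, plain Python): the one sample whose label occurs only once in the batch can never be paired with a positive, so it cannot act as an anchor in any triplet.

```python
# Batch of 5 with samples_per_label=2: label 2 slipped in only once.
batch_labels = [0, 0, 1, 1, 2]

def usable_anchors(labels):
    # An anchor is usable only if at least one other sample shares its label.
    return [i for i, label in enumerate(labels) if labels.count(label) > 1]

print(usable_anchors(batch_labels))  # [0, 1, 2, 3] -> the sample at index 4 is wasted
```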

@PaulForInvent
Author

Thank you @nreimers

Maybe you could clarify the above confusion about "must not". :-)

And sadly, I no longer remember what I had in mind regarding "optimization". Maybe you have some hints on what to take care of to build a good batch for this loss, besides the points already mentioned?

@PaulForInvent
Author

PaulForInvent commented Feb 28, 2021

@nreimers Right now the idea came to my mind of also including hard negatives for each positive within the same batch. Don't you have the same problem with BatchHard as with the normal triplet loss if the batch only contains random elements? So what about explicitly adding hard negatives to the batch as well?

@nreimers
Member

nreimers commented Mar 1, 2021

BatchHard automatically computes the hardest triplet in a batch.

So it is advisable to use large batches (typically as large as possible). It can also make sense to include more than just 2 samples with the same label per batch => the selected positive will then get harder.
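
A rough sketch of what batch-hard mining does inside a batch (illustrative only, not the library's exact implementation of BatchHardTripletLoss, which differs in details such as the distance computation and margin handling):

```python
import torch

def batch_hard_triplet_loss(embeddings, labels, margin=0.3):
    # Pairwise Euclidean distances between all embeddings in the batch, shape (B, B).
    dist = torch.cdist(embeddings, embeddings, p=2)

    same_label = labels.unsqueeze(0) == labels.unsqueeze(1)  # (B, B) bool
    eye = torch.eye(len(labels), dtype=torch.bool, device=dist.device)
    pos_mask = same_label & ~eye                             # positives: same label, not itself
    neg_mask = ~same_label                                   # negatives: different label

    # Hardest positive: the *furthest* sample with the same label.
    hardest_pos = (dist * pos_mask).max(dim=1).values
    # Hardest negative: the *closest* sample with a different label.
    hardest_neg = dist.masked_fill(~neg_mask, float("inf")).min(dim=1).values

    # Larger batches => more candidates => harder positives and negatives.
    return torch.relu(hardest_pos - hardest_neg + margin).mean()
```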

@PaulForInvent
Author

PaulForInvent commented Mar 1, 2021

Thanks @nreimers

What if I have some classes with fewer samples than samples_per_label?

I meant whether it might also be better to manually bring some hard negatives into the batch. Then it would be more difficult to learn, since the model also sees the hard negatives for each specific positive sample (so, hard negatives for the positives used inside the batch). What do you think?

@PaulForInvent
Author

So it is advisable to use large batches (typically as large as possible)

Ok, what if I have few samples and choose a large batch size of e.g. 512, so that in total I only get a few batches or even just one?

@nreimers
Member

nreimers commented Mar 1, 2021

Labels with fewer samples than what is specified are ignored. If you have only one example in total for a label, it cannot be used.
Adding hard negatives can be helpful
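
For checking up front which labels would be dropped, a small hypothetical helper (not part of the library) over a list of InputExample objects could look like this:

```python
from collections import Counter

def labels_with_too_few_samples(examples, samples_per_label=2):
    # Count how many examples each label has and report the ones below the minimum.
    counts = Counter(example.label for example in examples)
    return [label for label, n in counts.items() if n < samples_per_label]

# e.g. labels_with_too_few_samples(train_examples, samples_per_label=2)
```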

@PaulForInvent
Author

Adding hard negatives can be helpful

Is this something I would have to do by customizing, like for the MultiRankingLoss?

Is that why you said the batch should be as large as possible, because hard negatives then have a better chance of ending up in the batch?

@PaulForInvent
Author

Labels with fewer samples than what is specified are ignored.

But would it be manageable for the dataset class to sample up to the maximum number of available samples for such a class?

Also, what just came to my mind is a perhaps artificial case: does the dataset class handle the case where the batch size is larger than the total number of samples (across all classes)? Does it automatically fill the batch up with other negative classes, or could it accidentally use a positive sample as a negative one?

@PaulForInvent
Author

Is this something I would have to do by customizing, like for the MultiRankingLoss?

This was answered by @nreimers here.

But would it be manageable for the dataset class to sample up to the maximum number of available samples for such a class?

I think it should be possible to customize this?

Does the dataset class handle the case where the batch size is larger than the total number of samples (across all classes)? Does it automatically fill the batch up with other negative classes, or could it accidentally use a positive sample as a negative one?

I did not look into it in detail, but is this case covered?

@nreimers
Member

nreimers commented Mar 4, 2021

Does the dataset class handle the case where the batch size is larger than the total number of samples (across all classes)? Does it automatically fill the batch up with other negative classes, or could it accidentally use a positive sample as a negative one?
I did not look into it in detail, but is this case covered?

That would be a really strange dataset to train on, if the training data is smaller than the batch size.

The case is covered (in principle), but it would be better to set your batch size equal to the size of your training dataset. Repeating the same examples in a batch would not help.

@PaulForInvent
Author

@nreimers Oh, I meant it differently. I was thinking of the case where the batch size is larger than the number of distinct classes times samples_per_label. Say I have 10 classes, which means at most 20 samples per batch (for samples_per_label=2). But what if I have a batch size of 64 or 128? How is the batch filled up? Do I accidentally get more samples of the same class, which would be an artificial increase of samples_per_label?

@nreimers
Member

nreimers commented Mar 5, 2021

samples_per_label is just the minimum. There can be any multiple of this in a batch. This is not an issue.

@PaulForInvent
Author

samples_per_label is just the minimum.

So, in my case above, samples_per_label samples are drawn for each label. But is only one sample drawn for each label first, then the 2nd, 3rd, etc., or are all the samples for a label drawn at once?

Or maybe you could sketch how the batch is filled in the above case when the batch size is larger.

@nreimers
Member

nreimers commented Mar 5, 2021

The dataset returns a stream that always contains two samples with the same label in a row, e.g.
A A H H C C D D A A C C D D A A ...

This stream is then chunked into the mini-batches.
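
A toy sketch of that behaviour (illustrative only; the real SentenceLabelDataset additionally shuffles labels and examples and streams them lazily):

```python
from itertools import islice

def label_stream(items_by_label, samples_per_label=2):
    # Emit samples_per_label items of one label at a time, cycling over the
    # labels until their examples are used up. Labels with too few items are skipped.
    pools = {l: list(v) for l, v in items_by_label.items() if len(v) >= samples_per_label}
    pos = {l: 0 for l in pools}
    emitted = True
    while emitted:
        emitted = False
        for label, pool in pools.items():
            if pos[label] + samples_per_label <= len(pool):
                yield from pool[pos[label]:pos[label] + samples_per_label]
                pos[label] += samples_per_label
                emitted = True

def chunk(stream, batch_size):
    # Cut the long stream into consecutive mini-batches.
    while True:
        batch = list(islice(stream, batch_size))
        if not batch:
            return
        yield batch

data = {"A": ["a1", "a2", "a3", "a4"], "H": ["h1", "h2"],
        "C": ["c1", "c2"], "D": ["d1", "d2"]}
for batch in chunk(label_stream(data, samples_per_label=2), batch_size=6):
    print(batch)
# ['a1', 'a2', 'h1', 'h2', 'c1', 'c2']
# ['d1', 'd2', 'a3', 'a4']
```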

@PaulForInvent
Author

Thanks! Ok, so most of the time you definitely get more than samples_per_label samples of a label in one batch. Surely one could work out a formula depending on the batch size and the number of classes.

My question above resulted from the intuition that samples_per_label is a fixed upper limit. That is why I wondered how the batches are then filled...
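
For the case discussed above (10 classes, samples_per_label=2, batch size 64), a rough back-of-the-envelope estimate, assuming the stream cycles evenly over the labels and every label still has examples left:

```python
# One cycle over the labels adds C * s items to the stream, so a batch of
# size B spans about B / (C * s) cycles and therefore contains roughly
# s * B / (C * s) = B / C samples of each label.
C, s, B = 10, 2, 64
print(B / C)  # ~6.4 samples per label in a batch, on average
```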

@PaulForInvent
Author

@nreimers Are the samples of one label somehow handled differently depending on whether they appear right "next to each other"? In A A H H C C D D A A you would have 4 A's. Would this be the same as A A A A H H C C D D, coming from samples_per_label=4?

I know, these are very interesting questions. ;)

@nreimers
Member

nreimers commented Mar 5, 2021

The order in a batch does not make a difference.
