
Labeled and Unlabeled dataloaders #69

Open
nysp78 opened this issue Oct 8, 2022 · 1 comment

Comments

@nysp78

nysp78 commented Oct 8, 2022

Hello,
I want to ask how you handle data loading when there is more unlabeled data than labeled data. I have read about two approaches. The first, which you use, is to define an epoch as one pass of all the unlabeled data through the network; with this, the labeled data passes through the network multiple times per epoch. The second is to use a sampler that, at each training step, draws an amount of unlabeled data equal to the labeled batch, so that the two dataloaders have the same length. Which of these two techniques would make the model perform better? In general, I'm a bit confused about how to construct the dataloaders for labeled and unlabeled data in a semi-supervised setting. Any hints would be appreciated!

Thanks in advance.

@charlesCXK
Owner

Hi, I recommend using the first one: define an epoch as one pass of all the unlabeled data through the network.

To construct the dataloaders, I use the argument config.max_samples to specify the maximum samples in an epoch.

`config.max_samples, unsupervised=unsupervised)`

The labeled set will be sampled repeatedly until the maximum number of samples is reached.
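The idea can be sketched in plain Python. This is a minimal illustration, not the repository's actual dataloader code: `build_epoch_indices` is a hypothetical helper that builds one epoch of (labeled, unlabeled) index pairs, where the epoch length defaults to the size of the unlabeled set (playing the role of `config.max_samples`) and the labeled indices are shuffled and cycled so they repeat until that length is met.

```python
import itertools
import random

def build_epoch_indices(num_labeled, num_unlabeled, max_samples=None, seed=0):
    """Build one epoch of (labeled_idx, unlabeled_idx) pairs.

    Hypothetical sketch of the first approach: an epoch is one pass over
    the unlabeled set, and the smaller labeled set is cycled (re-sampled)
    until `max_samples` pairs exist.
    """
    if max_samples is None:
        # One epoch = one pass over the unlabeled data.
        max_samples = num_unlabeled
    rng = random.Random(seed)

    labeled = list(range(num_labeled))
    rng.shuffle(labeled)
    labeled_cycle = itertools.cycle(labeled)  # repeats once exhausted

    unlabeled = list(range(num_unlabeled))
    rng.shuffle(unlabeled)

    return [(next(labeled_cycle), unlabeled[i % num_unlabeled])
            for i in range(max_samples)]

pairs = build_epoch_indices(num_labeled=3, num_unlabeled=8)
print(len(pairs))  # 8: each unlabeled index once, labeled indices repeated
```

In a PyTorch setup the same effect is usually obtained by giving the labeled dataset a sampler (or a length override) that yields `max_samples` indices per epoch, so both dataloaders can be zipped step by step.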
