
Question about Cross-Validation for a downstream task #984

Open
PaulForInvent opened this issue Jun 3, 2021 · 22 comments

Comments

@PaulForInvent

PaulForInvent commented Jun 3, 2021

Hey,

do you think I should use cross-validation on my training data while fine-tuning a model for semantic search (and a similarity task)?

Surprisingly, I have always ignored this...

@PaulForInvent PaulForInvent changed the title Question about Coross-Validation for a downstream task Question about Cross-Validation for a downstream task Jun 3, 2021
@nreimers
Member

nreimers commented Jun 4, 2021

If you perform an ablation, e.g. on what the best model, the best loss, or the best parameters are, then using CV can make sense if it is computationally feasible.

@PaulForInvent
Author

@nreimers Thanks.

I just tried to do it, but I saw that for k-fold you of course need the SubsetRandomSampler. In my case I use the SentencesLabelDataset, which is an IterableDataset and cannot be used with a sampler. That is bad.

Is it possible to have the SentencesLabelDataset as a normal Dataset?

@nreimers
Member

nreimers commented Jun 4, 2021

It would be better to first create the folds, and then re-init your SentencesLabelDataset.

@PaulForInvent
Author

> It would be better to first create the folds, and then re-init your SentencesLabelDataset.

So you suggest creating the folds without any PyTorch dataset? But isn't it possible to change the SentencesLabelDataset into a normal dataset, e.g. by replacing yield with return...?

@nreimers
Member

nreimers commented Jun 4, 2021

I think it is easier to first create your different folds, and then create a new SentencesLabelDataset from them.
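
A minimal sketch of what this could look like, assuming sklearn's `KFold` and the label dataset from sentence-transformers (written `SentenceLabelDataset` in the library, spelled `SentencesLabelDataset` in this thread); the model name and the `labeled_pairs` variable are placeholders:

```python
from sklearn.model_selection import KFold
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses
from sentence_transformers.datasets import SentenceLabelDataset

# labeled_pairs is a placeholder for your own list of (text, class_label) tuples
examples = [InputExample(texts=[text], label=label) for text, label in labeled_pairs]

kfold = KFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, val_idx) in enumerate(kfold.split(examples)):
    train_examples = [examples[i] for i in train_idx]
    val_examples = [examples[i] for i in val_idx]

    # re-init the label dataset from the raw training fold, as suggested above
    train_dataset = SentenceLabelDataset(train_examples, samples_per_label=2)
    train_dataloader = DataLoader(train_dataset, batch_size=32)

    model = SentenceTransformer("paraphrase-MiniLM-L6-v2")  # placeholder base model
    train_loss = losses.BatchHardTripletLoss(model=model)
    model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1)
    # ... evaluate on val_examples here and store the score for this fold
```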

@PaulForInvent
Author

> I think it is easier to first create your different folds

For this, I would like to use a dataset and a SubsetRandomSampler to sample the folds in a PyTorch way. Or how would you create the folds?

@PhilipMay
Contributor

PhilipMay commented Jun 4, 2021

Maybe you want to have a look here: https://github.com/German-NLP-Group/xlsr

In this script: https://github.com/German-NLP-Group/xlsr/blob/main/xlsr/train_optuna_stsb.py

There I use cross-validation, as I think it is useful.

@PhilipMay
Contributor

PhilipMay commented Jun 4, 2021

I prefer to use cross-validation when I do automated hyperparameter search; a minimal sketch follows the list. The reasons are:

  • cross-validation reduces overfitting on the validation set when you do automated hyperparameter search
  • by using multiple validation sets, you cover your data space better when working with small datasets
  • because neural networks are randomly initialized, the random effects on the results are reduced when you calculate the mean over the folds
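
A minimal sketch of that pattern, assuming Optuna; `examples` and the `train_and_eval_fold` helper are hypothetical stand-ins for your own data and training/evaluation code, and the search space is purely illustrative:

```python
import numpy as np
import optuna
from sklearn.model_selection import KFold

def objective(trial):
    # illustrative search space; adapt to your setup
    lr = trial.suggest_float("lr", 1e-6, 1e-4, log=True)
    epochs = trial.suggest_int("epochs", 1, 4)

    kfold = KFold(n_splits=5, shuffle=True)
    scores = []
    for train_idx, val_idx in kfold.split(examples):
        # train_and_eval_fold: hypothetical helper that trains a fresh model on the
        # training fold and returns the score on the validation fold
        scores.append(train_and_eval_fold(examples, train_idx, val_idx, lr=lr, epochs=epochs))

    # optimize the mean over the folds, i.e. the cross-validated score
    return float(np.mean(scores))

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
```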

@PaulForInvent
Author

PaulForInvent commented Jun 4, 2021

@PhilipMay Thanks. Maybe using simple arrays is better. I wanted to do it like here:
https://www.machinecurve.com/index.php/2021/02/03/how-to-use-k-fold-cross-validation-with-pytorch/

But I think the SentencesLabelDataset can be rewritten to a simple dataset.

I saw you are also tuning optimizer parameters like weight decay. Did you find any improvement from that? I found that tuning the learning rate is not very useful (at least in my case).

@PhilipMay
Contributor

@PaulForInvent

Here is the Optuna Importance Plot

[image: Optuna hyperparameter importance plot]

@PhilipMay
Contributor

@PaulForInvent

and the slice plot

[image: Optuna slice plot]
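
For reference, plots like these can be generated directly from a finished Optuna study (a small sketch; `optuna.visualization` needs plotly installed, and `study` is assumed to be the study from the search above):

```python
import optuna

# `study` is assumed to be a finished optuna.Study
fig_importance = optuna.visualization.plot_param_importances(study)
fig_slice = optuna.visualization.plot_slice(study)
fig_importance.show()
fig_slice.show()
```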

@PaulForInvent
Author

I asked this myself too.

#791

@PaulForInvent
Author

PaulForInvent commented Jun 4, 2021

Now I have a different issue. Since I mainly use batch-hard losses, I have examples with their class labels. Up to now, I evaluated with a ranking metric on a separate validation set. Now I wonder how to evaluate my model on each fold, since both sets are structured the same way (they are just labeled examples). I could use an evaluation metric that checks whether the class label is predicted correctly (multi-class task), or a triplet evaluator...

My main task is actually ranking, so I would also like to do a ranking evaluation for each fold... but since my fold is fixed, I cannot set up a separate ranking task and just have to use the available samples of each class (possibly with a ParaphraseMiningEvaluator)?

Oh, this just came to my mind: has someone used a combination of a ranking metric like MRR and a binary metric like precision for evaluation (and for parameter tuning)? @PhilipMay @nreimers
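
One possible way to build a ranking-style evaluator from a labeled fold (a sketch, assuming the held-out fold is a list of `(text, label)` tuples and treating every same-label pair as a positive; `build_fold_evaluator` is a hypothetical helper):

```python
from collections import defaultdict
from itertools import combinations
from sentence_transformers.evaluation import ParaphraseMiningEvaluator

def build_fold_evaluator(fold_examples):
    """fold_examples: list of (text, class_label) tuples from the held-out fold."""
    sentences_map = {str(i): text for i, (text, _) in enumerate(fold_examples)}
    by_label = defaultdict(list)
    for i, (_, label) in enumerate(fold_examples):
        by_label[label].append(str(i))
    # every pair of sentences with the same class label counts as a positive pair
    duplicates_list = [pair for ids in by_label.values() for pair in combinations(ids, 2)]
    return ParaphraseMiningEvaluator(sentences_map, duplicates_list=duplicates_list, name="fold")
```

And if you want a single number for tuning that mixes a ranking metric (e.g. MRR) with a binary one (e.g. precision), a simple weighted average is one pragmatic, if arbitrary, choice:

```python
def combined_score(mrr, precision, alpha=0.5):
    # alpha is an arbitrary trade-off between ranking and classification quality
    return alpha * mrr + (1 - alpha) * precision
```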

@PaulForInvent
Author

@nreimers :

I wonder whether your ParaphraseMiningEvaluator or BinaryClassificationEvaluator ignores self-references when calculating the cosine scores of a list of sentences against itself?

@nreimers
Member

nreimers commented Jun 7, 2021

It computes whatever you pass as your data. The ParaphraseMiningEvaluator ignores self references.
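
For the cosine-over-the-same-list case, the underlying `util.paraphrase_mining` utility skips identical indices, so self-pairs do not show up (a small sketch; the model name and sentences are placeholders):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-MiniLM-L6-v2")  # placeholder model
sentences = ["first sentence", "second sentence", "third sentence"]

# returns [score, i, j] triples with i != j, so a sentence is never paired with itself
pairs = util.paraphrase_mining(model, sentences, top_k=10)
for score, i, j in pairs:
    print(f"{score:.3f}  {sentences[i]}  <->  {sentences[j]}")
```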

@PaulForInvent
Author

PaulForInvent commented Jun 8, 2021

@PhilipMay I just saw that you are drawing the parameters anew in each fold. I did the same thing. But shouldn't the parameters be the same for all folds?

I am also trying to find out how to build the final model after I have used CV to find the best parameters. This seems to be a heavily discussed topic...

Should I then retrain the model using all training data? Also, despite setting a seed, you cannot guarantee that each model trained with the same parameters yields the same results... So should you save each model during CV and then continue fine-tuning on all the data?
Is there any standard way that you have found to work well? @nreimers

@PhilipMay
Contributor

PhilipMay commented Jun 8, 2021

> I just saw that you are drawing the parameters anew in each fold.

No, it just seems like that. When you draw a parameter from Optuna multiple times, the second and all following calls return the same value until the trial is over.

> I did the same thing. But shouldn't the parameters be the same for all folds?

They should (must) all be the same.
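
A small illustration of that Optuna behaviour: within one trial, suggesting the same parameter name again returns the value that was already drawn, so every fold sees the same hyperparameters:

```python
import optuna

def objective(trial):
    first = trial.suggest_float("lr", 1e-6, 1e-4, log=True)
    second = trial.suggest_float("lr", 1e-6, 1e-4, log=True)
    assert first == second  # same name within one trial -> same cached value
    return first

optuna.create_study().optimize(objective, n_trials=1)
```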

@PhilipMay
Contributor

PhilipMay commented Jun 8, 2021

> Should I then retrain the model using all training data? Also, despite setting a seed, you cannot guarantee that each model trained with the same parameters yields the same results... So should you save each model during CV and then continue fine-tuning on all the data?
> Is there any standard way that you have found to work well? @nreimers

I hate seeds and do not use them when doing HP optimization with CV. I just do many CV steps and average them. CV is only about hyperparameter finding and not about model creation.

When I want to create the "best" final model, I train it with the best HP set on the full dataset at the end.
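
A sketch of that final step (assuming `study` is the finished Optuna study from the CV search; `train_on_full_data` is a hypothetical helper that runs the same training code as the CV objective, just on the full training set):

```python
best = study.best_params  # e.g. {"lr": ..., "epochs": ...}

# retrain from scratch with the best hyperparameters on all training data
final_model = train_on_full_data(examples, **best)
final_model.save("output/final-model")
```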

@PaulForInvent
Author

> I hate seeds and do not use them when doing HP optimization with CV. I just do many CV steps and average them. CV is only about hyperparameter finding and not about model creation.
>
> When I want to create the "best" final model, I train it with the best HP set on the full dataset at the end.

Yes. That is straightforward if the model behaves the same in every training run with the same hyperparameters. I found that with some loss types, the results vary (sometimes strongly) when training the same model each time. But I feel I am the only one having this problem... That's why I sometimes set seeds and save the model for each parameter set. But then I can only take that trained model and continue training it on all the data... which may be different from training from scratch with the best parameters found.

@PaulForInvent
Author

@PhilipMay What is your experience with randomness? If I do an HP search and try to retrain the model with a given set of parameters, I always get different results. So just finding the HPs does not seem meaningful, as it is not reproducible...

@PhilipMay
Contributor

@PaulForInvent just because there is randomness and you get different results does not mean it is not useful.

For example: I have use cases with small datasets (6000 examples) where I do 10-fold cross-validation. The result is the mean over the folds. That helps to reduce the effect of randomness.

@PhilipMay
Contributor

By the way, I saw that the stsb dataset has duplicate sentences in the train set. So doing cross-validation might not be a good idea, since you might have information leakage from train to validation...
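
If you still want to cross-validate on such data, one option is a group-aware split so that duplicate sentences never end up in both the train and validation fold (a sketch using sklearn's `GroupKFold`; `texts` and `labels` are placeholders for your data, and for sentence pairs you would need a grouping over both sentences):

```python
from sklearn.model_selection import GroupKFold

# texts: list of sentences, labels: their class labels (placeholders)
groups = texts  # identical sentences share a group and therefore stay in one fold
gkf = GroupKFold(n_splits=5)
for train_idx, val_idx in gkf.split(texts, labels, groups=groups):
    ...  # build the train and validation folds as above
```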
