Question about Cross-Validation for a downstream task #984
Comments
If you perform an ablation, e.g. which model is best, which loss is best, which parameters are best, then using CV can make sense if it is computationally feasible. |
@nreimers Thanks. I just tried to do it, but I saw that for k-fold you of course need the RandomSubsetSamplers. In my case I use the SentencesLabelDataset, which is an IterableDataset and cannot be used with a sampler. That is bad. Is it possible to have the SentencesLabelDataset as a normal Dataset? |
It would be better to first create the folds, and then re-init your SentencesLabelDataset. |
So you propose to create the folds without any PyTorch dataset? But isn't it possible to change the SentencesLabelDataset into a normal dataset, e.g. by replacing yield with return...? |
I think it is easier to first create your different folds, and then create a new SentencesLabelDataset from each of them. |
For this I would like to use a dataset and a SubsetSampler to sample the folds in a PyTorch way. Or how would you create the folds? |
Maybe you want to have a look here: https://github.com/German-NLP-Group/xlsr In this script: https://github.com/German-NLP-Group/xlsr/blob/main/xlsr/train_optuna_stsb.py There I use cross-validation, as I think it is useful. |
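A minimal sketch of the fold-first approach discussed above, assuming the training data is available as a plain list of `InputExample` objects with class labels (the variable names, the dummy data, and the choice of `StratifiedKFold` are illustrative, not taken from the linked script):

```python
from sentence_transformers import InputExample
from sklearn.model_selection import StratifiedKFold

# Illustrative data: a plain list of InputExample objects with class labels.
examples = [InputExample(texts=[f"sentence {i}"], label=i % 3) for i in range(30)]
labels = [ex.label for ex in examples]

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, dev_idx) in enumerate(skf.split(examples, labels)):
    train_examples = [examples[i] for i in train_idx]
    dev_examples = [examples[i] for i in dev_idx]
    # Re-initialize the label dataset / DataLoader from these plain lists here,
    # instead of trying to attach a sampler to the IterableDataset.
    print(f"fold {fold}: {len(train_examples)} train / {len(dev_examples)} dev")
```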
I prefer to use cross-validation when I do automated hyperparameter search. The reasons are:
|
@PhilipMay Thanks. Maybe using simple arrays is better. I wanted to do it like here: But I think the SentencesLabelDataset can be rewritten to a simple dataset. I saw you are using optimizer parameters like weight decay for tuning. Did you find any improvement from that? I found that tuning the learning rate is not very useful (at least in my case). |
Here is the Optuna Importance Plot |
and the slice plot |
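For reference, a short sketch of how such plots can be produced with Optuna's built-in visualization module (the study name and storage path below are placeholders, not taken from the thread):

```python
import optuna
from optuna.visualization import plot_param_importances, plot_slice

# Placeholder study name / storage; adapt to the actual study.
study = optuna.load_study(study_name="stsb_cv", storage="sqlite:///optuna.db")

plot_param_importances(study).show()  # hyperparameter importance plot
plot_slice(study).show()              # slice plot
```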
I asked this myself too. |
Now I have a different issue. Since I mainly use BatchHard losses, I have examples with their class labels. Up to now I evaluated with a ranking metric on a differently structured validation set. Now I wonder how to evaluate my model on each fold, since both splits are structured the same way (they are just labeled examples). I could use a metric for whether the class label is predicted correctly (multi-class task), or a triplet evaluator... My main task is actually ranking, so I would also like to do a ranking evaluation on each fold. But since my fold is fixed, I cannot set up a ranking task and just have to use the available samples of each class (possibly with a ParaphraseMiningEvaluator())? This just came to my mind: has someone used a combination of a ranking metric like MRR and a binary metric like precision for evaluation (and for parameter tuning)? @PhilipMay @nreimers |
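One possible way to get a ranking-style metric out of a fixed fold of labeled examples is to rank every example against all others by cosine similarity and score the rank of the first same-class hit. A rough sketch (purely illustrative, not an existing evaluator in the library):

```python
import numpy as np

def mrr_from_labels(scores: np.ndarray, labels: np.ndarray) -> float:
    """scores: (n, n) cosine similarity matrix, labels: (n,) class labels.
    For each example, the reciprocal rank of the first other example with
    the same class label is averaged over the fold."""
    order = np.argsort(-scores, axis=1)  # best match first
    ranks = []
    for i in range(len(labels)):
        others = [j for j in order[i] if j != i]  # ignore the self reference
        hit = next((r for r, j in enumerate(others) if labels[j] == labels[i]), None)
        ranks.append(0.0 if hit is None else 1.0 / (hit + 1))
    return float(np.mean(ranks))
```

A combined score could then simply be a (weighted) average of this MRR and a binary metric such as precision@1.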
I wonder if your ParaphraseMiningEvaluator or BinaryClassificationEvaluator handles ignoring self references when calculating the cosine scores of a list of sentences with itself? |
It computes whatever you pass as your data. The ParaphraseMiningEvaluator ignores self references. |
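Independent of the evaluators, ignoring self references when scoring a list of sentences against itself boils down to masking the diagonal of the similarity matrix; a small sketch:

```python
import torch
import torch.nn.functional as F

def pairwise_cosine_without_self(embeddings: torch.Tensor) -> torch.Tensor:
    """Cosine similarity of every sentence against every other sentence,
    with the diagonal masked so a sentence never matches itself."""
    emb = F.normalize(embeddings, p=2, dim=1)  # unit-length rows
    scores = emb @ emb.T                       # dot product == cosine similarity
    scores.fill_diagonal_(float("-inf"))       # drop self references from any ranking
    return scores
```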
@PhilipMay I just saw that you are drawing the parameters anew in each fold. I did the same thing. But shouldn't the parameters be the same for all folds? I am also trying to find out how to build the final model after using CV to find the best parameters. This seems to be a heavily discussed topic... Should I then retrain the model on all the training data? Also, despite setting a seed, you cannot guarantee that each model trained with the same parameters yields the same results... So should you save each model during CV and then continue fine-tuning on all the data? |
No, it just seems like that. When you draw them from Optuna multiple times, the 2nd and all following draws return the same value until the trial is over.
It should (must) be all the same. |
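This is standard Optuna behavior: within one trial, repeated `suggest_*` calls with the same parameter name return the value drawn the first time. A tiny sketch to illustrate:

```python
import optuna

def objective(trial):
    lr_first = trial.suggest_float("lr", 1e-6, 1e-3, log=True)
    lr_second = trial.suggest_float("lr", 1e-6, 1e-3, log=True)
    assert lr_first == lr_second  # same name within one trial -> same value
    return lr_first

study = optuna.create_study()
study.optimize(objective, n_trials=3)
```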
I hate seeds and do not use them when doing HP optimization with CV. I just do many CV steps and average them. CV is only about hyperparameter finding, not about model creation. When I want to create the "best" final model, I train the model with the best HP set on the full dataset at the end. |
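A rough sketch of that workflow (the `folds`, `all_examples`, `train_and_evaluate`, and `train_on_full_data` names are placeholders for whatever training loop you use, not library functions):

```python
import numpy as np
import optuna

def objective(trial):
    lr = trial.suggest_float("lr", 1e-6, 1e-4, log=True)
    weight_decay = trial.suggest_float("weight_decay", 0.0, 0.1)
    scores = []
    for train_split, dev_split in folds:  # folds prepared beforehand (placeholder)
        score = train_and_evaluate(train_split, dev_split,
                                   lr=lr, weight_decay=weight_decay)  # placeholder helper
        scores.append(score)
    return float(np.mean(scores))  # average over folds, no seed fixing

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)

# Final model: train once on the full dataset with the best hyperparameters.
final_model = train_on_full_data(all_examples, **study.best_params)  # placeholder helper
```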
Yes. That is straightforward if the model always behaves the same in every training run with the same hyperparameters. I found that with some loss types the results vary (sometimes strongly) when training the same model each time. But I feel I am the only one having this problem... That's why I sometimes set seeds and save each model for each parameter set. But then I can only take that trained model and continue training it on all the data... which is maybe different from training from scratch with the found best parameters. |
@PhilipMay What is your experience with randomness? If I do HP search and try to retrain the model with any of the parameters, I always get different results. So just finding the HPs is not meaningful, as it is not reproducible... |
@PaulForInvent just because there is randomness and you get different results does not mean it is not useful. For example: |
By the way, I saw that the stsb dataset has duplicate sentences in the train set. |
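A quick way to check this, assuming the Hugging Face `glue`/`stsb` config corresponds to the STSb data used here:

```python
import pandas as pd
from datasets import load_dataset

train = load_dataset("glue", "stsb", split="train")
df = train.to_pandas()
sentences = pd.concat([df["sentence1"], df["sentence2"]])
print(sentences.duplicated().sum(), "duplicate sentence occurrences in the train split")
```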
Hey,
do you think I should use cross-validation on my training data while fine-tuning a model for semantic search (and a similarity task)?
Surprisingly, I always ignored this...