
results of the best checkpoint are different between training and evaluation #15

linhaojia13 opened this issue May 8, 2023 · 1 comment

@linhaojia13

At the end of the training log, the results of the best checkpoint are:

training completed...

--------------------------------------best--------------------------------------
[best] epoch: 25
[loss] loss: 44.52341
[loss] ref_loss: 16.75534
[loss] ref_mask_loss: 0.0
[loss] lang_cls_loss: 0.22115
[loss] objectness_loss: 0.33091
[loss] kps_loss: 0.0285
[loss] box_loss: 2.68898
[loss] sem_cls_loss: 5.56197
[loss] lang_cls_acc: 0.93388
[sco.] ref_acc: 0.14872
[sco.] obj_acc: 0.76845
[sco.] pos_ratio: 0.68719, neg_ratio: 0.31281
[sco.] iou_rate_0.25: 0.47397, iou_rate_0.5: 0.36692

saving checkpoint...

saving last models...

After training, I ran the evaluation command CUDA_VISIBLE_DEVICES=0 python scripts/eval.py --config ./config/sps.yaml --folder 2023-05-07_00-36_SPS/ --reference --no_nms --force and got:

unique:
unique | not_in_others | ref_acc: 0.14891243725599554
unique | not_in_others | [email protected]: 0.8120468488566648
unique | not_in_others | [email protected]: 0.6447295036252092
unique | in_others | ref_acc: 0.09615384615384616
unique | in_others | [email protected]: 0.7692307692307693
unique | in_others | [email protected]: 0.5961538461538461
unique | overall | ref_acc: 0.14742547425474256
unique | overall | [email protected]: 0.810840108401084
unique | overall | [email protected]: 0.643360433604336

multiple:
multiple | not_in_others | ref_acc: 0.07918758557736194
multiple | not_in_others | [email protected]: 0.3247375627567321
multiple | not_in_others | [email protected]: 0.26449109995435877
multiple | in_others | ref_acc: 0.2307223407497714
multiple | in_others | [email protected]: 0.4687595245352027
multiple | in_others | [email protected]: 0.32855836635172203
multiple | overall | ref_acc: 0.14406890251859586
multiple | overall | [email protected]: 0.38640219235286444
multiple | overall | [email protected]: 0.29192222367219106

overall:
overall | not_in_others | ref_acc: 0.0994331983805668
overall | not_in_others | [email protected]: 0.4662348178137652
overall | not_in_others | [email protected]: 0.3748987854251012
overall | in_others | ref_acc: 0.22862286228622863
overall | in_others | [email protected]: 0.4734473447344735
overall | in_others | [email protected]: 0.3327332733273327
overall | overall | ref_acc: 0.1447202355910812
overall | overall | [email protected]: 0.4687631468237274
overall | overall | [email protected]: 0.3601177955405974

language classification accuracy: 0.9309404022447408

The best overall accuracy during training is iou_rate_0.25: 0.47397, iou_rate_0.5: 0.36692, but the evaluation reports overall | overall | [email protected]: 0.4687631468237274 and overall | overall | [email protected]: 0.3601177955405974.
Why is there such a discrepancy? Did I make a mistake somewhere?
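
One thing I want to rule out is nondeterminism in the validation data pipeline (random point sampling / augmentation), since that alone could plausibly shift the metrics by a fraction of a percent between two passes over the same checkpoint. Here is a minimal sketch of pinning every RNG source before the eval loop; these are generic PyTorch/NumPy calls, not code from this repo:

```python
import random

import numpy as np
import torch


def seed_everything(seed: int = 42) -> None:
    """Make random point sampling and augmentation repeatable."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Trade speed for reproducible cuDNN kernels.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


seed_everything(42)  # call once, before building the dataloader and model
```

Even with seeds pinned, some CUDA ops (e.g. scatter-based pooling) are nondeterministic, so a small residual drift would not be surprising.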

@xuxiaoxxxx

I am running into the same problem. Did you solve it?
