The implementation of the paper "Rethinking Feature Distribution for Loss Functions in Image Classification"
- [Rethinking Feature Distribution for Loss Functions in Image Classification]()
- [caffe&tensorflow gmloss]()
- [pytorch gmloss]()
- [gluon gmloss]()
The first version builds on LeeJuly30's implementation. I replaced Xavier initialization with MSRA initialization and added a schedule that increases the margin parameter, following the Caffe version, which helps improve accuracy. I am still confused about the margin_add parameter in the Caffe version.
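To make the idea of a margin increase schedule concrete, here is a minimal sketch. It is not the Caffe schedule itself: the function name `margin_at`, the step-up shape, and the values of `step_size`, `step_add` and `max_margin` are all assumptions chosen for illustration.

```python
def margin_at(iteration, step_size=1000, step_add=0.1, max_margin=1.0):
    """Hypothetical step-up schedule for the margin.

    The margin starts at 0, grows by `step_add` every `step_size`
    iterations, and is capped at `max_margin`. All names and default
    values are illustrative assumptions, not taken from the Caffe code.
    """
    if iteration < 0:
        raise ValueError("iteration must be non-negative")
    completed_steps = iteration // step_size
    return min(max_margin, completed_steps * step_add)


# Inspect the margin at a few points during training.
for it in (0, 999, 1000, 5000, 50000):
    print(it, margin_at(it))
```

Starting from a zero margin lets the network first learn a plain Gaussian mixture classifier before the margin makes the task harder, which is the usual motivation for ramping it up gradually.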
The latest version, v3, is based on the Caffe and Gluon versions. I use softmax loss in place of the Gluon version's gmloss; the gmloss layer only provides logits_with_margin, as in the Caffe code, and these margin-adjusted values are treated as the logits of the softmax loss. I also rewrote the likelihood_reg_loss part of the Gluon version using Caffe's method, which is easier to understand. I found the validation part of the Gluon version very weird: it still needs the ground truth as input to gmloss at inference time, which is unreasonable. I therefore rewrote the validation part as well, but the accuracy decreased a lot.
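The structure described above can be sketched in plain Python. This is a simplified illustration following the formulas in the paper, not the code of this repository: it assumes diagonal covariances stored as log-variances, and the function names (`lgm_logits`, `softmax_cross_entropy`, `likelihood_reg`, `predict`) and the regularization weight 0.1 are hypothetical. The margin is applied only when a label is passed, so inference needs no ground truth.

```python
import math


def lgm_logits(feature, means, log_vars=None, label=None, margin=0.0):
    """Per-class logits -d_k - 0.5 * log|Sigma_k| for diagonal covariances.

    d_k is half the squared Mahalanobis distance to the class mean.
    If `label` is given, the true-class distance is scaled by (1 + margin).
    With `log_vars=None` every variance is fixed to 1.
    """
    logits = []
    for k, mean in enumerate(means):
        lv = log_vars[k] if log_vars is not None else [0.0] * len(mean)
        dist = 0.5 * sum((x - m) ** 2 / math.exp(v)
                         for x, m, v in zip(feature, mean, lv))
        if label is not None and k == label:
            dist *= 1.0 + margin
        logits.append(-dist - 0.5 * sum(lv))
    return logits


def softmax_cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for a single sample."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[label]


def likelihood_reg(feature, means, label, log_vars=None):
    """d_z + 0.5 * log|Sigma_z| for the true class z (no margin)."""
    return -lgm_logits(feature, means, log_vars)[label]


def predict(feature, means, log_vars=None):
    """Inference: argmax over margin-free logits, no label required."""
    logits = lgm_logits(feature, means, log_vars)
    return max(range(len(logits)), key=logits.__getitem__)


means = [[0.0, 0.0], [2.0, 0.0]]   # toy class centers
feature = [0.5, 0.0]               # toy feature vector of class 0
train_logits = lgm_logits(feature, means, label=0, margin=0.3)
loss = (softmax_cross_entropy(train_logits, 0)
        + 0.1 * likelihood_reg(feature, means, 0))  # 0.1 is an assumed weight
print(loss, predict(feature, means))
```

Keeping the margin inside the logits function and leaving the loss as ordinary softmax cross-entropy mirrors the split described above: the same `lgm_logits` call without `label` serves validation.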
The v4 version is based only on the Caffe/TensorFlow version and drops the learnable var parameter. It reaches the best performance so far, 0.85 val acc when trained from scratch.
I have already tried several parameter strategies: using the output of the last global average pooling layer as the feature input to gmloss, decreasing the weight decay, adding a warm-up strategy, changing the number of input features, and changing the lr decay schedule. Most of them did not help; the low accuracy still seems to come from the implementation of gmloss itself.
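Because the original author did not include the updated var in the TensorFlow version, I added it with reference to the Caffe code. However, after adding it the val acc dropped from 0.89 (without it) to about 0.84. After a series of tuning attempts, the following two methods gave better results: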