
Please compare with regression network #26

Open

makecent opened this issue Jun 24, 2020 · 3 comments

@makecent
According to the design of your CORAL framework, the output of the penultimate layer, which has only one node, increases monotonically with the age of the input image. Therefore, if the last layer (the ordinal regression layer) is removed, the framework becomes a regression network that outputs a number that grows with the age of the input image, which is just what a regression model does. Hence your framework can be regarded as a regression network plus an ordinal output layer. The advantage is that the outputs of this regression subnetwork do not need to be numerically consistent with the age labels.
All in all, it would be great if you compared your framework with a single regression network without the ordinal layer, i.e., remove the last layer of your framework and let the penultimate layer output the age directly. In that case, the outputs are of course consistent with the age labels.
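A minimal sketch of the baseline I have in mind, in PyTorch (`backbone` and `feature_dim` are placeholders, not names from your repo):

```python
import torch.nn as nn

# Hypothetical sketch of the proposed ablation baseline: keep everything up
# to and including the 1-node penultimate layer, drop the ordinal output
# layer, and train the scalar output directly against the age label with a
# plain regression loss (e.g., MSE).
class MetricRegressionBaseline(nn.Module):
    def __init__(self, backbone: nn.Module, feature_dim: int):
        super().__init__()
        self.backbone = backbone              # any CNN producing feature vectors
        self.fc = nn.Linear(feature_dim, 1)   # the former penultimate layer

    def forward(self, x):
        # one scalar per image: the predicted age
        return self.fc(self.backbone(x)).squeeze(-1)

# usage sketch: loss = nn.MSELoss()(model(images), ages.float())
```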

@rasbt
Member

rasbt commented Jun 24, 2020

Thanks for the feedback! I would add that compared to a metric regression network, the loss function is very different, and it is used to update the whole network, not just the last layer -- so, in that respect, the learned model would be very different.
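To make that concrete, here is an illustrative sketch in PyTorch (not the exact code from this repo; the scalar logit `g`, the `biases` vector, and the integer age encoding are simplifying assumptions, and the paper's importance weights are omitted):

```python
import torch
import torch.nn.functional as F

# Metric regression: penalize the squared distance of one scalar prediction.
def mr_loss(pred_age, age):
    return F.mse_loss(pred_age, age.float())

# CORAL-style loss: K-1 binary "is age > r_k?" tasks that share one scalar
# logit g per image and differ only by K-1 biases. Gradients from all K-1
# cross-entropy terms flow back through the entire network.
def coral_style_loss(g, biases, age, num_classes):
    # levels[i, k] = 1 if age_i > k, else 0 (the extended binary labels)
    levels = (age.unsqueeze(1) > torch.arange(num_classes - 1)).float()
    # sign convention follows the discussion in this thread (g - b_k); the
    # paper writes g(x) + b_k, which only flips the sign of the learned biases
    logits = g.unsqueeze(1) - biases          # shape: (batch, K-1)
    return F.binary_cross_entropy_with_logits(logits, levels)
```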

In any case, it might be interesting to do that comparison. I can add it to my todo list for later this summer. Btw., the Ordinal CNN by Niu et al., to which we compare CORAL, does have a comparison to metric regression:

  • Niu, Z., Zhou, M., Wang, L., Gao, X., & Hua, G. (2016). Ordinal regression with multiple output CNN for age estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 4920–4928):

[Screenshot: results table from Niu et al. (2016) comparing OR-CNN against the metric regression CNN (MR-CNN)]

(*MR-CNN means metric regression CNN).

Since OR-CNN outperforms the same architecture with a metric regression layer and loss, and CORAL-CNN outperforms OR-CNN, I would expect CORAL-CNN to outperform MR-CNN by an even wider margin. I can run the experiments sometime when my machines are free and report the actual comparison.

@makecent
Author

makecent commented Jun 25, 2020

Thanks for the response. The regression network I mean here is a model that has the same architecture as your model, but with the last layer dropped, using the penultimate layer of your proposed model as the output layer to predict age directly.

The reason I emphasize comparing with such a regression network is that your network is doing essentially the same thing. This comparison can be regarded as an ablation experiment to prove that your proposed ordinal layer is useful.

Here is the reason why I think the subnetwork of your model before the last layer is just doing a regression:
The last three layers of your proposed network are:
FC1, which has m nodes a_1 to a_m
FC2, which has only 1 node
FC3, the ordinal layer, which has K-1 nodes b_1 to b_{K-1}
The connections between FC1 and FC2 have only weights, no bias.
The connections between FC2 and FC3 have no trainable weights (all fixed to 1), only biases.
Consider an input image with age n, and let Q_n denote the output of FC2 for that image. If the age estimate for this image is correct, we have Q_n − b_{n+1} < 0.5. Similarly, for an input image with age n+1, we get Q_{n+1} − b_{n+1} > 0.5. Therefore Q_{n+1} > Q_n.
This means that if your model works well, the output of FC2 is always monotonically increasing in the age of the input image, so it is effectively a regression network.
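Putting the three layers above into code, a hypothetical sketch in PyTorch (the class and parameter names are mine, not from your repo):

```python
import torch
import torch.nn as nn

# Sketch of the head structure described above.
class CoralStyleHead(nn.Module):
    def __init__(self, m: int, num_classes: int):
        super().__init__()
        # FC1 -> FC2: weights only, no bias term.
        self.fc2 = nn.Linear(m, 1, bias=False)
        # FC2 -> FC3: no trainable weights (all fixed to 1), only the
        # K-1 trainable biases b_1 .. b_{K-1}.
        self.biases = nn.Parameter(torch.zeros(num_classes - 1))

    def forward(self, a):            # a: output of FC1, shape (batch, m)
        q = self.fc2(a)              # the scalar Q, shape (batch, 1)
        # Q - b convention as in this thread; the paper's g(x) + b_k form
        # only flips the sign of the learned biases.
        return torch.sigmoid(q - self.biases)   # (batch, K-1) probabilities
```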

In other words, if you manually fix those biases in the last layer using the formula b_n = K − n, your network becomes a regression network that predicts the age of the input image, plus an ordinal encoder that transfers the age value to binary labels (Eq. 5). The major contribution of your proposed network, as I see it, is to resolve the inconsistency among ages: the trained biases in the last layer of your network may not follow the formula b_n = K − n, but they always remain monotonic.
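For illustration, a hypothetical sketch of that ordinal encoding and decoding (function names are mine; with biases fixed by a formula like b_n = K − n instead of learned, the decision boundaries become evenly spaced cut points on the scalar Q, i.e., regression followed by rounding):

```python
import torch

# The ordinal encoder (Eq. 5) turns an integer age n into K-1 binary
# labels [1]*n + [0]*(K-1-n); rank prediction counts how many output
# probabilities exceed 0.5.
def encode_age(n: int, num_classes: int) -> torch.Tensor:
    return (torch.arange(num_classes - 1) < n).float()

def decode_rank(probas: torch.Tensor) -> int:
    return int((probas > 0.5).sum())

print(encode_age(3, num_classes=6))                           # tensor([1., 1., 1., 0., 0.])
print(decode_rank(torch.tensor([0.9, 0.8, 0.6, 0.3, 0.1])))   # 3
```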

@rasbt
Member

rasbt commented Jun 29, 2020

Thanks for sharing your detailed thoughts! I agree that this might be an interesting experiment to do. Like I said in the previous comment, I wasn't too keen on adding it in the beginning, because Niu et al. showed that it was worse than their method on these datasets (and ours worked better than Niu et al.'s). It's something that I can run later this summer, though.

I agree with you that the architecture is very similar, and it can be seen as a regression network with (learned) biases on top. It just has a different loss function, though. I see it like linear regression vs. logistic regression (for a binary target), for example: the number of parameters may be the same, yet the loss function, and consequently the parametrization, is different.
