Thanks for releasing your work publicly.
I read your paper and implementation carefully, but I could not understand how the network infers the camera parameters from 2D joints alone. Yes, we are not interested in the exact position of the target person in camera coordinates, but we certainly cannot recover the scale factor (i.e., the focal length) from the 2D joint locations alone.
We know the parameters of the cameras used in the Human3.6M dataset, but those parameters cannot be recovered from the 2D joint locations in the projected images, which are the only input to the proposed 3D pose estimation network. So I think this model is simply tuned to (over-fitted on) the training dataset's camera parameters and cannot be used when the test data comes from different cameras.
As a concrete example, if you train on Human3.6M and test on HumanEva-I, or vice versa, you would not get good MPJPE values. Likewise, if you run the model on videos where the person's size is known, you will not get the correct absolute 3D positions.
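To illustrate the ambiguity I mean, here is a minimal pinhole-projection sketch (my own illustration, not code from this repository; the focal length value is just an assumption): scaling the whole scene leaves the 2D joints unchanged, so no network can recover absolute scale or depth from the 2D inputs alone.

```python
import numpy as np

def project(joints_3d, f):
    """Pinhole projection of (J, 3) camera-space joints with focal length f (principal point at 0)."""
    return f * joints_3d[:, :2] / joints_3d[:, 2:3]

rng = np.random.default_rng(0)
pose = rng.normal(size=(17, 3))
pose[:, 2] += 5.0                # put the person roughly 5 m in front of the camera

f = 1145.0                       # an H3.6M-like focal length in pixels (assumed value)
p1 = project(pose, f)            # original person
p2 = project(2.0 * pose, f)      # person twice as large and twice as far away

print(np.allclose(p1, p2))       # True: the 2D inputs are identical, so the two
                                 # scenes are indistinguishable from 2D joints alone
```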
There are many papers on 3D pose estimation:
Some (Learnable Triangulation, etc.) use camera parameters explicitly.
Others (TCMR, VIBE) do not, but they also do not claim to estimate pose locations in physical (metric) dimensions.
Another related issue is that training is done over video from all four cameras. When the camera intrinsics (especially the focal lengths) differ, the network's estimate of absolute coordinates will not be stable. I guess the tracking should be done not only in the semi-supervised case but also in the fully supervised case.
The paper's proposal of 1D-CNN-based filtering and 3D pose estimation still stands. However, MPJPE as the performance measure is quite misleading here; you should use N-MPJPE or PA-MPJPE only.
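For reference, here is a minimal numpy sketch of the three metrics as I understand them (my own illustration, not the repository's implementation): MPJPE compares poses directly, N-MPJPE first applies the least-squares optimal global scale to the prediction, and PA-MPJPE additionally aligns translation and rotation via Procrustes analysis.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error over a (J, 3) pose pair."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

def n_mpjpe(pred, gt):
    """MPJPE after applying the least-squares optimal global scale to the prediction."""
    s = (pred * gt).sum() / (pred * pred).sum()
    return mpjpe(s * pred, gt)

def pa_mpjpe(pred, gt):
    """MPJPE after Procrustes alignment (translation, rotation and scale)."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    X, Y = pred - mu_p, gt - mu_g
    U, S, Vt = np.linalg.svd(X.T @ Y)
    if np.linalg.det(U @ Vt) < 0:    # avoid an improper rotation (reflection)
        Vt[-1] *= -1
        S[-1] *= -1
    R = U @ Vt                       # rotation that maps X onto Y
    s = S.sum() / (X ** 2).sum()     # optimal global scale
    return mpjpe(s * X @ R + mu_g, gt)

rng = np.random.default_rng(0)
gt = rng.normal(size=(17, 3))
pred = 1.3 * gt + 0.01 * rng.normal(size=(17, 3))   # prediction with a global scale error

print(mpjpe(pred, gt), n_mpjpe(pred, gt), pa_mpjpe(pred, gt))
# MPJPE is dominated by the scale error; N-MPJPE and PA-MPJPE are not.
```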
Please help me understand your position on this. If I have misunderstood your work, please kindly point out what I am missing. This code in evaluate() in run.py is related to the issue:
inputs_3d[:, :, 0] = 0
and
pos_3d[:, 1:] -= pos_3d[:, :1] # Remove global offset, but keep trajectory in first position
Why do you not use the absolute positions, and instead use relative values everywhere?
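For what it is worth, here is how I read those two lines (a toy numpy sketch with assumed shapes, not the repository's code): the ground truth is made root-relative, so only relative joint positions are ever compared and the absolute camera-space position never enters the MPJPE.

```python
import numpy as np

# Assumed layout: (frames, joints, 3) with the root joint at index 0.
pos_3d = np.random.default_rng(0).normal(size=(100, 17, 3))

traj = pos_3d[:, :1].copy()        # absolute root trajectory, kept separately in joint 0
pos_3d[:, 1:] -= pos_3d[:, :1]     # all other joints become offsets from the root

# Later, in evaluate(), zeroing the root (inputs_3d[:, :, 0] = 0) appears to drop the
# absolute trajectory entirely, so the error is computed on root-relative poses only.
```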