Thanks for releasing your work publicly.
I read your paper and implementation carefully, but I could not understand how the network infers the camera parameters from 2D joints alone. Yes, we are not interested in the exact position of the target person in camera coordinates, but we certainly cannot recover the scale factor (i.e., the focal length) from the 2D joint locations alone.
We know the parameters of the cameras used in the Human3.6M dataset, but those parameters cannot be recovered from the 2D joint locations in the projected images, which are the only input to the proposed 3D pose estimation network. So I think this model is simply tuned to (over-fitted on) the training dataset's camera parameters and cannot be used when the test data comes from different cameras.
As a concrete example, if you train on Human3.6M and test on HumanEva-I, or vice versa, you would not get good MPJPE values. Likewise, if you run the model on videos where the person's size is known, you will not get the correct absolute 3D positions.
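To illustrate the ambiguity I mean, here is a minimal pinhole-projection sketch (my own illustration, not code from this repository; the focal length value is just an assumption): scaling the whole scene leaves the 2D joints unchanged, so no network can recover absolute scale or depth from the 2D inputs alone.

```python
import numpy as np

def project(joints_3d, f):
    """Pinhole projection of (J, 3) camera-space joints with focal length f (principal point at 0)."""
    return f * joints_3d[:, :2] / joints_3d[:, 2:3]

rng = np.random.default_rng(0)
pose = rng.normal(size=(17, 3))
pose[:, 2] += 5.0                # put the person roughly 5 m in front of the camera

f = 1145.0                       # an H3.6M-like focal length in pixels (assumed value)
p1 = project(pose, f)            # original person
p2 = project(2.0 * pose, f)      # person twice as large and twice as far away

print(np.allclose(p1, p2))       # True: the 2D inputs are identical, so the two
                                 # scenes are indistinguishable from 2D joints alone
```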
There are many papers on 3D pose estimation:
Some (Learnable Triangulation, etc.) use camera parameters explicitly.
Others (TCMR, VIBE) do not, but they also do not claim to estimate pose locations in physical (metric) dimensions.
Another related issue is that training is done over video from all four cameras. When the camera intrinsics (especially the focal lengths) differ, the network's estimate of absolute coordinates will not be stable. I guess the tracking should be done not only in the semi-supervised case but also in the fully supervised case.
The paper's proposal of 1D-CNN-based filtering and 3D pose estimation still stands. However, MPJPE as the performance measure is quite misleading here; you should use N-MPJPE or PA-MPJPE only.
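For reference, here is a minimal numpy sketch of the three metrics as I understand them (my own illustration, not the repository's implementation): MPJPE compares poses directly, N-MPJPE first applies the least-squares optimal global scale to the prediction, and PA-MPJPE additionally aligns translation and rotation via Procrustes analysis.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error over a (J, 3) pose pair."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

def n_mpjpe(pred, gt):
    """MPJPE after applying the least-squares optimal global scale to the prediction."""
    s = (pred * gt).sum() / (pred * pred).sum()
    return mpjpe(s * pred, gt)

def pa_mpjpe(pred, gt):
    """MPJPE after Procrustes alignment (translation, rotation and scale)."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    X, Y = pred - mu_p, gt - mu_g
    U, S, Vt = np.linalg.svd(X.T @ Y)
    if np.linalg.det(U @ Vt) < 0:    # avoid an improper rotation (reflection)
        Vt[-1] *= -1
        S[-1] *= -1
    R = U @ Vt                       # rotation that maps X onto Y
    s = S.sum() / (X ** 2).sum()     # optimal global scale
    return mpjpe(s * X @ R + mu_g, gt)

rng = np.random.default_rng(0)
gt = rng.normal(size=(17, 3))
pred = 1.3 * gt + 0.01 * rng.normal(size=(17, 3))   # prediction with a global scale error

print(mpjpe(pred, gt), n_mpjpe(pred, gt), pa_mpjpe(pred, gt))
# MPJPE is dominated by the scale error; N-MPJPE and PA-MPJPE are not.
```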
Please help me understand your position on this. If I have misunderstood your work, please kindly point out what I am missing. This code in evaluate() in run.py is related to the issue:
inputs_3d[:, :, 0] = 0
and
pos_3d[:, 1:] -= pos_3d[:, :1] # Remove global offset, but keep trajectory in first position
Why do you not use the absolute positions, and instead use relative values everywhere?
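For what it is worth, here is how I read those two lines (a toy numpy sketch with assumed shapes, not the repository's code): the ground truth is made root-relative, so only relative joint positions are ever compared and the absolute camera-space position never enters the MPJPE.

```python
import numpy as np

# Assumed layout: (frames, joints, 3) with the root joint at index 0.
pos_3d = np.random.default_rng(0).normal(size=(100, 17, 3))

traj = pos_3d[:, :1].copy()        # absolute root trajectory, kept separately in joint 0
pos_3d[:, 1:] -= pos_3d[:, :1]     # all other joints become offsets from the root

# Later, in evaluate(), zeroing the root (inputs_3d[:, :, 0] = 0) appears to drop the
# absolute trajectory entirely, so the error is computed on root-relative poses only.
```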