Questions about the meaning of grid in the F.grid_sample function. #62

Open
pnpmpnp opened this issue Mar 20, 2024 · 0 comments

Comments

@pnpmpnp
pnpmpnp commented Mar 20, 2024

First of all, thank you very much for sharing such a high-quality implementation.

However, there is one part of your code that I don't quite understand.

For backward warping, my understanding is this: the depth map obtained in the target view is used to project each target pixel into the source view, the source image is sampled at those projected coordinates, and the loss is then computed between the reconstructed target view and the original target view. I believe this is the reason for using backward warping (I also read the other comments and found them very helpful).

What I'm wondering is what the grid passed to F.grid_sample means in this case. As I understand it, the grid encodes the assumption that the content of each target-view pixel should be visible at a particular position in the source view, and it is the ground-truth pose that guarantees this correspondence.

If the warping goes from target to source using the GT poses, what exactly do the resulting coordinates mean? If they are full source-view coordinates, it seems a little odd that feeding the source image into F.grid_sample with them produces the target view. (By "warping" I mean the photometric-consistency warping often used in self-supervised monocular depth estimation, and the backward warping often used in MVSNet-style methods to bring the source view into the target view.)

My current understanding is that the coordinates generated by going from the target view to the source view act as a lookup map: for each target (reference) pixel, they tell us where in the source view to fetch the corresponding value. A rough sketch of what I mean is below.
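For context, this is how I picture the backward warping with F.grid_sample. It is only a toy sketch of my mental model; the tensor names and the fake coordinates are mine, not from your code.

import torch
import torch.nn.functional as F

# Source image we sample FROM, shape (N, C, H, W).
N, C, H, W = 1, 3, 8, 8
src_img = torch.rand(N, C, H, W)

# Suppose that, for every pixel of the TARGET view, projecting it with the
# target depth and the GT relative pose gives pixel coordinates (x, y) in
# the SOURCE view. Here I just fake such coordinates for illustration.
x_src = torch.rand(N, H, W) * (W - 1)  # source x-coordinate per target pixel
y_src = torch.rand(N, H, W) * (H - 1)  # source y-coordinate per target pixel

# grid_sample expects coordinates normalized to [-1, 1].
grid = torch.stack(
    (2.0 * x_src / (W - 1) - 1.0,
     2.0 * y_src / (H - 1) - 1.0), dim=-1)  # (N, H, W, 2)

# The grid is laid out over TARGET pixel positions, but its values are
# SOURCE-view coordinates: for each target pixel we fetch the source pixel
# it should correspond to, which yields the reconstructed target view.
warped_to_target = F.grid_sample(src_img, grid, align_corners=True)
print(warped_to_target.shape)  # (1, 3, 8, 8)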
However, I have a question about geometric filtering.


Line 153 of the reproject_with_depth function in eval.py is a bit confusing:
sampled_depth_src = cv2.remap(depth_src, x_src, y_src, interpolation=cv2.INTER_LINEAR)


I agree that the relative pose has already been applied to go from the reference view to the source view, and that the source depth should then be used to go from the source view back to the reference view.

However, I'm a little confused about what coordinate system sampled_depth_src lives in and what it means, since it is produced by feeding depth_src back through the same map that was built for the backward warping.
As mentioned above, I understood the coordinates generated by going from the target view to the source view as a lookup map that tells me, for each reference pixel, where in the source view to fetch the corresponding value. So what coordinate system is the map on line 153 expressed in, and what does the result mean?
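To make the question concrete, here is roughly how I picture what cv2.remap does on that line. This is only a simplified toy example under my own assumptions; apart from depth_src, x_src and y_src, the names and values are mine, not from the repo.

import numpy as np
import cv2

# Toy 4x4 source depth map.
depth_src = np.arange(16, dtype=np.float32).reshape(4, 4)

# For each pixel (y, x) of the REFERENCE image, (x_src[y, x], y_src[y, x])
# is the (sub-pixel) location in the SOURCE image where that reference
# pixel is expected to land after projection with the reference depth.
x_src = np.full((4, 4), 1.5, dtype=np.float32)
y_src = np.full((4, 4), 2.0, dtype=np.float32)

# cv2.remap gathers depth_src at those source locations, so the result is
# laid out on the reference pixel grid, while the values it holds are
# source-view depths (depths expressed in the source camera's frame).
sampled_depth_src = cv2.remap(depth_src, x_src, y_src,
                              interpolation=cv2.INTER_LINEAR)
print(sampled_depth_src.shape)  # (4, 4), same layout as the reference image

Is this reading correct, i.e. sampled_depth_src is a source-view depth value attached to each reference pixel, which is then used to reproject back into the reference view?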

I would appreciate if you could share your thoughts on that code.
