The inference about LAPS #1

Open
XpracticeYSKM opened this issue Jul 18, 2024 · 1 comment
Comments

@XpracticeYSKM
XpracticeYSKM commented Jul 18, 2024

Thanks for your awesome work!

I wonder: are patch selection and calibration also conducted during inference?

In other words, is inference consistent with the training process? That is, do we need to apply LAPS to the vision encoder, split the sentence into textual words, and then conduct patch-word alignment rather than image-sentence alignment?

@darkpromise98
Collaborator

Thanks for your interest; I am happy to answer your questions.

Question 1: Yes, patch selection and calibration are conducted in the inference stage.

Question 2: Yes, the inference process is consistent with the training process: we need to compute patch-word alignment rather than image-sentence alignment. This is because LAPS is a fine-grained alignment framework, unlike previous coarse-grained work (e.g., CLIP and its follow-up works).
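To make the difference concrete, here is a minimal sketch of scoring an image against a sentence at the patch-word level. It is not the LAPS implementation: the function names (`patch_word_score`, `rank_images`) are hypothetical, the inputs are plain NumPy arrays standing in for encoder outputs of already selected and calibrated patches, and the aggregation (match each word to its most similar patch, then average over words) is one common choice assumed here rather than the exact LAPS scoring rule.

```python
import numpy as np


def l2_normalize(x: np.ndarray, axis: int = -1, eps: float = 1e-8) -> np.ndarray:
    """Scale vectors along `axis` to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def patch_word_score(patches: np.ndarray, words: np.ndarray) -> float:
    """Hypothetical fine-grained image-sentence score.

    patches: (P, D) embeddings of the (already selected/calibrated) image patches
    words:   (W, D) embeddings of the textual words of one sentence
    Assumption: each word is matched to its most similar patch and the
    per-word maxima are averaged; LAPS may aggregate differently.
    """
    sim = l2_normalize(words) @ l2_normalize(patches).T  # (W, P) cosine similarities
    return float(sim.max(axis=1).mean())  # best patch per word, averaged over words


def rank_images(images: list, words: np.ndarray) -> list:
    """Return indices of candidate images (each a (P_i, D) array), best match first."""
    scores = [patch_word_score(p, words) for p in images]
    return sorted(range(len(images)), key=lambda i: scores[i], reverse=True)
```

Because this score depends on the full patch set and word set of each image-sentence pair, it cannot be collapsed into one precomputed vector per image, which is why this path is the more expensive one at inference.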

Alternatively, you could use LAPS to compute patch-word alignment during training, and then compute image-sentence alignment in the inference stage. This way, performance drops slightly, but computational efficiency improves greatly.
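The efficiency gain comes from the fact that image-sentence alignment needs only one vector per image and one per sentence, so image vectors can be precomputed once and every pair scored with a single matrix product, whereas patch-word scoring must compare every patch with every word for each pair. The sketch below assumes mean pooling to obtain the global vectors (a model may instead use a [CLS] token), and all function names are hypothetical; it illustrates the trade-off and is not the LAPS code.

```python
import numpy as np


def l2_normalize(x: np.ndarray, axis: int = -1, eps: float = 1e-8) -> np.ndarray:
    """Scale vectors along `axis` to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def encode_global(tokens: np.ndarray) -> np.ndarray:
    """Hypothetical pooling of (N, D) patch or word embeddings into one unit vector.
    Mean pooling is an assumption; a model may use a [CLS] token instead."""
    return l2_normalize(tokens.mean(axis=0))


def build_gallery(images: list) -> np.ndarray:
    """Precompute one global vector per image offline: shape (N_img, D)."""
    return np.stack([encode_global(p) for p in images])


def image_sentence_scores(gallery: np.ndarray, words: np.ndarray) -> np.ndarray:
    """Score every gallery image against one sentence with a single matrix-vector product."""
    return gallery @ encode_global(words)  # (N_img,) cosine similarities


def similarity_dot_products(n_img: int, n_txt: int, n_patch: int, n_word: int) -> tuple:
    """Dot products needed to score all image-sentence pairs: (coarse, fine-grained)."""
    coarse = n_img * n_txt
    fine = n_img * n_txt * n_patch * n_word
    return coarse, fine
```

For example, with 1,000 images, 5,000 sentences, 50 patches and 12 words, `similarity_dot_products(1000, 5000, 50, 12)` returns `(5000000, 3000000000)`: the fine-grained path needs P × W = 600 times more dot products, before accounting for any patch selection that reduces P.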
