a question about ViTSTR #5

Open
Danee-wawawa opened this issue Jun 3, 2021 · 2 comments

Comments

@Danee-wawawa

Hi, thank you for your work; it is very meaningful. I have a question about the algorithm design.
The input image is first split into patches. If a character is cut across patches, or a single patch contains multiple characters, does that affect recognition?
Looking forward to your reply.

@roatienza
Owner

The image is divided into non-overlapping patches. A patch may contain zero or more characters, or only parts of characters.
With position embeddings, the transformer is able to figure out how the parts form a whole, so this has no impact.
Not yet tried, and something that could be experimented with: overlapping patches, and smaller patches as done in DINO.
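
For illustration only (this is not code from the repository), here is a minimal NumPy sketch of the two patching schemes mentioned above. The helper name `extract_patches`, the 32x64 toy image, the 16x16 patch size, and the stride of 8 for the overlapping case are all assumptions chosen for the example. A stride equal to the patch size gives non-overlapping patches; a smaller stride gives overlapping ones.

```python
import numpy as np


def extract_patches(image, patch, stride):
    """Split a 2-D image into flattened patch x patch tiles in row-major order.

    Hypothetical helper for illustration: stride == patch yields
    non-overlapping tiles, stride < patch yields overlapping tiles.
    """
    h, w = image.shape
    rows = range(0, h - patch + 1, stride)
    cols = range(0, w - patch + 1, stride)
    tiles = [
        image[r:r + patch, c:c + patch].reshape(-1)
        for r in rows
        for c in cols
    ]
    return np.stack(tiles)


# Toy grayscale "word image" (assumed size, not the model's real input size).
image = np.arange(32 * 64, dtype=np.float32).reshape(32, 64)

# Non-overlapping: a character may be split across neighbouring patches.
non_overlapping = extract_patches(image, patch=16, stride=16)

# Overlapping: each region is covered by several patches.
overlapping = extract_patches(image, patch=16, stride=8)

# Each patch keeps its index, which is what a position embedding is keyed on,
# so the model can relate a fragment to its neighbours.
positions = np.arange(len(non_overlapping))

print(non_overlapping.shape, overlapping.shape)
```

Overlapping patches produce more tokens for the same image, so any gain from seeing each character fragment in several patches would come at a higher attention cost.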

@Danee-wawawa
Author

OK, thank you.
