[Question] Question on image_newline for single image #1565

Open
SCZwangxiao opened this issue Jun 17, 2024 · 0 comments

SCZwangxiao commented Jun 17, 2024

I believe image_newline here implements the row-end tokens described in the paper. It is created as a single learnable vector of length hidden_size:

if 'unpad' in mm_patch_merge_type:
    embed_std = 1 / torch.sqrt(torch.tensor(self.config.hidden_size, dtype=self.dtype))
    self.image_newline = nn.Parameter(
        torch.randn(self.config.hidden_size, dtype=self.dtype) * embed_std
    )
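
To make the initialization concrete, here is a minimal standalone sketch of the same pattern outside the model class. The hidden_size value and the fixed seed are assumptions chosen for illustration, not values taken from the repository. Scaling standard-normal samples by 1 / sqrt(hidden_size) gives a vector whose norm is close to 1.

```python
import torch

hidden_size = 4096  # hypothetical embedding width, for illustration only
torch.manual_seed(0)  # fixed seed so the sketch is repeatable

# Same scaling as in the snippet above: std = 1 / sqrt(hidden_size).
embed_std = 1 / torch.sqrt(torch.tensor(hidden_size, dtype=torch.float32))
image_newline = torch.nn.Parameter(torch.randn(hidden_size) * embed_std)

# A single vector, shared by every position where it is inserted.
print(image_newline.shape)
# The norm of the scaled vector is approximately 1.
print(image_newline.norm().item())
```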

However, for a single-image input, the token is not appended to each row as the paper describes. Instead, only one token is appended to the end of the image's flattened patch tokens:

image_feature = image_feature[0]
if 'unpad' in mm_patch_merge_type:
    image_feature = torch.cat((
        image_feature,
        self.model.image_newline[None].to(image_feature.device)
    ), dim=0)
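
The difference I am asking about can be sketched as follows. This is only an illustration under stated assumptions: the grid size, hidden_size, and the two helper functions (append_single_newline, append_row_newlines) are hypothetical names I made up and are not part of the LLaVA code. The first helper mirrors the snippet above, and the second shows what I would expect from per-row tokens.

```python
import torch

hidden_size = 8       # hypothetical small embedding width
height, width = 3, 4  # hypothetical patch grid (rows x columns)

torch.manual_seed(0)
image_newline = torch.randn(hidden_size) / hidden_size ** 0.5
# Patch features laid out as a grid: (height, width, hidden_size).
grid = torch.randn(height, width, hidden_size)


def append_single_newline(grid, newline):
    """Flatten the grid, then append one newline token at the very end."""
    flat = grid.reshape(-1, grid.shape[-1])
    return torch.cat((flat, newline[None]), dim=0)


def append_row_newlines(grid, newline):
    """Append one newline token to the end of every row, then flatten."""
    h, w, d = grid.shape
    newline_col = newline[None, None, :].expand(h, 1, d)
    with_newlines = torch.cat((grid, newline_col), dim=1)  # (h, w + 1, d)
    return with_newlines.reshape(-1, d)


single = append_single_newline(grid, image_newline)
per_row = append_row_newlines(grid, image_newline)
print(single.shape)   # torch.Size([13, 8]): 3 * 4 patches + 1 newline
print(per_row.shape)  # torch.Size([15, 8]): 3 * (4 + 1) tokens
```

With a height x width grid, the single-token variant yields height * width + 1 tokens, while the per-row variant yields height * (width + 1) tokens, so the sequence no longer marks where each row ends in the first case.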

SCZwangxiao changed the title from "[Question] What does image_newline mean in LLaVA 1.6?" to "[Question] Question on image_newline for single image" on Jun 17, 2024