[Question] Question on `image_newline` for single image #1565

SCZwangxiao · 2024-06-17T13:38:01Z

I think the `image_newline` parameter here is the implementation of the row-end tokens described in the paper.

Lines 82 to 86 in c121f04

    
        if 'unpad' in mm_patch_merge_type:
            embed_std = 1 / torch.sqrt(torch.tensor(self.config.hidden_size, dtype=self.dtype))
            self.image_newline = nn.Parameter(
                torch.randn(self.config.hidden_size, dtype=self.dtype) * embed_std
            )

However, for a single-image input, the token is not appended to each row as described in the paper. Instead, only one token is appended to the end of the flattened patch tokens of the image.

LLaVA/llava/model/llava_arch.py

Lines 191 to 196 in c121f04

    
        image_feature = image_feature[0]
        if 'unpad' in mm_patch_merge_type:
            image_feature = torch.cat((
                image_feature,
                self.model.image_newline[None].to(image_feature.device)
            ), dim=0)
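To make the difference concrete, here is a minimal sketch contrasting the two behaviors on dummy tensors. The helper names (`append_single_newline`, `append_row_newlines`), the grid size, and the `(H*W, C)` token layout are assumptions for illustration, not code from the repository. The first helper mirrors the single-image branch quoted above; the second is only a hypothetical version of what per-row appending might look like.

```python
import torch


def append_single_newline(patch_tokens: torch.Tensor, newline: torch.Tensor) -> torch.Tensor:
    """Mirror the quoted single-image branch: one newline token after all patches.

    patch_tokens: (H*W, C) flattened patch tokens; newline: (C,).
    """
    return torch.cat((patch_tokens, newline[None].to(patch_tokens.device)), dim=0)


def append_row_newlines(
    patch_tokens: torch.Tensor, newline: torch.Tensor, height: int, width: int
) -> torch.Tensor:
    """Hypothetical row-ended variant: one newline token after each row of patches."""
    channels = patch_tokens.shape[-1]
    # Restore the 2D patch grid: (H, W, C).
    grid = patch_tokens.reshape(height, width, channels)
    # Broadcast the newline embedding into one extra column: (H, 1, C).
    newline_col = newline.to(grid.device)[None, None, :].expand(height, 1, channels)
    # Append the column so that every row ends with the newline token: (H, W + 1, C).
    grid = torch.cat((grid, newline_col), dim=1)
    # Flatten back to a token sequence: (H * (W + 1), C).
    return grid.reshape(height * (width + 1), channels)


# Toy sizes, assumed for illustration only.
hidden_size, height, width = 8, 3, 4
newline = torch.randn(hidden_size) / hidden_size ** 0.5
patch_tokens = torch.randn(height * width, hidden_size)

single = append_single_newline(patch_tokens, newline)
row_ended = append_row_newlines(patch_tokens, newline, height, width)

print(single.shape[0])     # H*W + 1 = 13 tokens
print(row_ended.shape[0])  # H*(W + 1) = 15 tokens
```

With these toy sizes, the single-image path yields `H*W + 1` tokens, while a row-ended layout would yield `H*(W + 1)`, so the two only coincide when the grid has a single row.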


SCZwangxiao changed the title ~~[Question] What does image_newline mean in LLaVA 1.6?~~ [Question] Question on image_newline for single image Jun 17, 2024
