Replies: 6 comments 3 replies
-
This model file can be used to embed images: https://huggingface.co/THUDM/glm-4v-9b/blob/main/visual.py
-
Thanks! Are there any specific examples of using it for image embedding? Since glm-4v-9b has good image understanding ability, I would like to see how well its embeddings work. @sixsixcoder
-
There should be a tensor transformation for this. You can refer to this code:

import torch


def merge_glm_vision_embeddings(
    input_ids: torch.Tensor,
    inputs_embeds: torch.Tensor,
    vision_embeddings: torch.Tensor,
    boi_token_id: int,
    eoi_token_id: int,
) -> torch.Tensor:
    # Positions of the begin-of-image and end-of-image tokens.
    boi_positions = (input_ids == boi_token_id).nonzero(as_tuple=True)[0]
    eoi_positions = (input_ids == eoi_token_id).nonzero(as_tuple=True)[0]

    # Mark every position from each BOI token through its EOI token, inclusive.
    mask = torch.zeros_like(input_ids, dtype=torch.bool)
    for boi_pos, eoi_pos in zip(boi_positions, eoi_positions):
        assert boi_pos < eoi_pos
        mask[boi_pos:eoi_pos + 1] = True

    # Overwrite the marked positions in place with the flattened vision embeddings.
    inputs_embeds[mask] = vision_embeddings.view(-1,
                                                 vision_embeddings.shape[-1])
    return inputs_embeds
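To make the shape contract of that merge step concrete, here is a toy walk-through of the same masking idea. It is only a sketch: the token ids (`BOI`, `EOI`, `IMG_PAD`), the hidden size, and the tensors are made up for illustration and are not the real glm-4v-9b values. The point it shows is that the number of positions between the BOI and EOI tokens (inclusive) has to equal the number of rows in the flattened vision embeddings.

```python
import torch

# Hypothetical token ids and sizes, chosen only for this toy example.
BOI, EOI, IMG_PAD = 100, 101, 0
hidden = 4

# A 6-token sequence containing one image span: BOI, two placeholders, EOI.
input_ids = torch.tensor([5, BOI, IMG_PAD, IMG_PAD, EOI, 7])
inputs_embeds = torch.zeros(input_ids.shape[0], hidden)

# One image whose embeddings cover the 4 positions of the span.
vision_embeddings = torch.ones(1, 4, hidden)

# Build the boolean mask over the BOI..EOI span (inclusive).
boi = (input_ids == BOI).nonzero(as_tuple=True)[0]
eoi = (input_ids == EOI).nonzero(as_tuple=True)[0]
mask = torch.zeros_like(input_ids, dtype=torch.bool)
for b, e in zip(boi, eoi):
    mask[b:e + 1] = True

# The masked position count must match the flattened vision rows.
num_vision_rows = vision_embeddings.shape[0] * vision_embeddings.shape[1]
assert int(mask.sum()) == num_vision_rows

# Work on a copy so the original text embeddings stay untouched.
merged = inputs_embeds.clone()
merged[mask] = vision_embeddings.view(-1, hidden)

# Rows 1..4 now hold vision embeddings; rows 0 and 5 are still text.
print(merged.sum(dim=-1).tolist())
```

If the counts do not match, the masked assignment raises a shape error, so checking the count first gives a clearer failure message.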
-
Thanks a lot! @sixsixcoder How much would image understanding degrade if the number of image patches were reduced to 1? I want to use this image understanding ability for duplicate image detection, but I don't know whether that is feasible.
-
I have no prior experience with this. Using just a single patch could lose image features, which would likely hurt the model's comprehension of the image. If you have enough GPU memory, I'd recommend running your experiments with the image patch size defined by the model.
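For the duplicate detection use case, one way to experiment while keeping all patches is to pool the per-patch embeddings into a single vector per image and compare vectors by cosine similarity. The following is a minimal sketch under stated assumptions, not something glm-4v-9b provides: the helper names `pool_image_embedding` and `is_near_duplicate` are hypothetical, mean pooling and the 0.95 threshold are arbitrary choices that would need tuning on real data, and random tensors stand in for the output of the vision model.

```python
import torch
import torch.nn.functional as F


def pool_image_embedding(patch_embeddings: torch.Tensor) -> torch.Tensor:
    """Mean-pool (num_patches, hidden) into one L2-normalized vector."""
    return F.normalize(patch_embeddings.mean(dim=0), dim=0)


def is_near_duplicate(a: torch.Tensor, b: torch.Tensor,
                      threshold: float = 0.95) -> bool:
    """Compare two images by cosine similarity of their pooled embeddings.

    The threshold is an assumption for illustration, not a tuned value.
    """
    similarity = torch.dot(pool_image_embedding(a), pool_image_embedding(b))
    return bool(similarity >= threshold)


# Random stand-ins for per-patch embeddings of three images.
torch.manual_seed(0)
img_a = torch.randn(16, 32)                       # 16 patches, hidden size 32
img_a_copy = img_a + 0.01 * torch.randn(16, 32)   # slightly perturbed copy
img_b = torch.randn(16, 32)                       # unrelated image

print(is_near_duplicate(img_a, img_a_copy))
print(is_near_duplicate(img_a, img_b))
```

Because pooling happens after the encoder, this keeps the full patch grid during encoding and only compresses the result, which avoids the feature loss of encoding with a single patch.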
-
Feature request
Use glm-4v-9b to embed an image, because this model has strong image understanding.
Motivation
Embedding images.
Your contribution
Can I use glm-4v-9b to embed an image? I would like to know whether this use is supported.