Feature fusion of different modalities #6

shiboliu · 2025-01-07T06:05:03Z

Hi,

Suppose I want to use your fusion method to fuse image features of shape [B * 3, 512] with text features of shape [B, 512], both extracted with open_clip. What changes do I need to make to your code for this cross-modal fusion to work?
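To make the shapes concrete, here is a minimal sketch of the two input layouts I could prepare before calling the fusion module. It is only an illustration under assumptions: NumPy arrays with random values stand in for the open_clip outputs, the three image rows belonging to each sample are assumed to be contiguous, and the names `VIEWS`, `tokens`, and `pair` are hypothetical and not taken from this repository.

```python
import numpy as np

# Hypothetical sizes: B samples, 3 image embeddings per sample, 512-dim features.
B, VIEWS, D = 4, 3, 512

rng = np.random.default_rng(0)
image_feats = rng.standard_normal((B * VIEWS, D))  # stand-in for image features, [B * 3, 512]
text_feats = rng.standard_normal((B, D))           # stand-in for text features, [B, 512]

# Assumption: rows 3*i, 3*i+1, 3*i+2 of image_feats all belong to sample i.
image_feats = image_feats.reshape(B, VIEWS, D)     # [B, 3, 512]

# Option A: treat the 3 image embeddings and the text embedding as 4 tokens per sample.
tokens = np.concatenate([image_feats, text_feats[:, None, :]], axis=1)  # [B, 4, 512]

# Option B: mean-pool the 3 image embeddings so each modality contributes one vector.
image_pooled = image_feats.mean(axis=1)                                 # [B, 512]
pair = np.stack([image_pooled, text_feats], axis=1)                     # [B, 2, 512]

print(tokens.shape, pair.shape)
```

Which of these two layouts (if either) matches what the fusion module expects is exactly what I am unsure about.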

Thank you for your great work!
