
About the ViT in VisionEncoder: SigLIP SoViT-400m/14 #692

Open

henrycCoder opened this issue Dec 11, 2024 · 0 comments

henrycCoder commented Dec 11, 2024


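Is there a more detailed description of the ViT mentioned in the technical report, SigLIP SoViT-400m/14? For example:

Which model were the pretrained weights initialized from?
Why is patch_size 448? I could not find anything about SigLIP SoViT-400m/14-448 on Hugging Face Models. Was the ViT here fine-tuned again on in-house data with patch_size=448?
Also, in config.json, what is the difference between the image_size values in the following two places?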
"image_size": 448
and
"vision_config": {
...
"image_size": 980,
...
}
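
To make the comparison concrete, here is a minimal sketch that reads both fields and works out how many patches per side each value would give. It assumes the "/14" in the model name denotes a patch size of 14 pixels, which is the usual ViT naming convention but is not confirmed in this thread. The JSON string only mirrors the two fields quoted above, and the helper names `image_size_fields` and `patches_per_side` are hypothetical.

```python
import json

# Fragment mirroring only the two fields quoted above; the real config.json
# contains other keys that are elided here.
config_text = '{"image_size": 448, "vision_config": {"image_size": 980}}'


def image_size_fields(config):
    """Return (top-level image_size, vision_config.image_size)."""
    top = config.get("image_size")
    nested = config.get("vision_config", {}).get("image_size")
    return top, nested


def patches_per_side(image_size, patch_size=14):
    """Patches along one side, assuming a patch size of 14 (from the '/14' suffix)."""
    return image_size // patch_size


config = json.loads(config_text)
top, nested = image_size_fields(config)

print("top-level image_size:", top, "->", patches_per_side(top), "patches per side")
print("vision_config image_size:", nested, "->", patches_per_side(nested), "patches per side")
```

Under that assumption, 448 corresponds to a 32 x 32 patch grid and 980 to a 70 x 70 grid. This shows only that the two values describe different resolutions. It does not show how the model actually uses each one.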
