
Can large-scale pretraining achieve truly open-vocabulary detection? #484

Open
wangzishuo029 opened this issue Sep 7, 2024 · 1 comment


@wangzishuo029

Recent works like YOLO-World and GroundingDINO are mainly pretrained on large-scale datasets such as Objects365 and GoldG. Unlike open-vocabulary detection (OVD) methods such as CORA and F-VLM, they do not use a CLIP image encoder as the backbone. And while the vocabulary of YOLO-World's pretraining data is larger, it is still finite. So can YOLO-World detect objects beyond its pretraining data? Is it a truly open-vocabulary detector?
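For context on what "open vocabulary" means mechanically: detectors in this family classify region features by similarity against text embeddings of a user-supplied vocabulary rather than against a fixed label head, so nothing in the architecture caps the label set. Whether that generalizes to unseen words is exactly the question here. A minimal sketch of the mechanism, with random stand-in embeddings (a real system would use a CLIP-style text encoder and the detector's region features; all names below are illustrative):

```python
import numpy as np

def classify_regions(region_feats, text_embeds, temperature=0.05):
    """Score each region against each vocabulary entry by cosine
    similarity, then softmax over the vocabulary axis."""
    r = region_feats / np.linalg.norm(region_feats, axis=-1, keepdims=True)
    t = text_embeds / np.linalg.norm(text_embeds, axis=-1, keepdims=True)
    logits = (r @ t.T) / temperature
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
# The vocabulary is free text chosen at inference time, not a fixed label set.
vocab = ["person", "giraffe", "unicycle"]
text_embeds = rng.normal(size=(len(vocab), 512))
# Simulate one region whose feature lies near the "giraffe" text embedding.
region_feats = text_embeds[1] + 0.1 * rng.normal(size=512)
probs = classify_regions(region_feats[None, :], text_embeds)
print(vocab[int(probs.argmax())])  # giraffe
```

The sketch makes the limitation concrete: swapping in a new word only helps if the text encoder places that word's embedding near the region features the detector produces for it, and that alignment was learned from the (finite) pretraining vocabulary.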

@YonghaoHe

No, the performance is limited.
