
Conceptual 12M is a dataset containing (image-URL, caption) pairs collected for vision-and-language pre-training.


pax-d-y/Project_Data

 
 


Project

We introduce Conceptual 12M (CC12M), a dataset of ~12 million image-text pairs intended for vision-and-language pre-training. It is larger and covers a much more diverse set of visual concepts than Conceptual Captions (CC3M), a dataset widely used for both pre-training and end-to-end training of image captioning models. See our paper for further details.

Download

Click here to download (2.5GB)

Format (.tsv)

[image_url_1]\t[caption_1]
[image_url_2]\t[caption_2]
[image_url_3]\t[caption_3]
…
[image_url_N]\t[caption_N]
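
As a minimal sketch of reading this format, the snippet below splits each line on the first tab into a (URL, caption) pair. The function name `iter_pairs` and the file name `cc12m.tsv` in the usage note are hypothetical. It assumes there is no header row (none is shown above), and skipping blank or tab-less lines is an illustrative choice, not documented behavior of the dataset.

```python
from typing import Iterable, Iterator, Tuple


def iter_pairs(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (image_url, caption) pairs from tab-separated lines.

    Assumption: one pair per line, no header row. Lines without a tab
    are skipped rather than raising an error.
    """
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        # Split on the first tab only, so a caption that happens to
        # contain a tab stays in one piece.
        url, sep, caption = line.partition("\t")
        if not sep:
            continue  # no tab found: treat the line as malformed
        yield url, caption


if __name__ == "__main__":
    # Made-up example rows, not taken from the dataset.
    sample = [
        "http://example.com/a.jpg\ta cat on a sofa\n",
        "http://example.com/b.jpg\ttwo dogs playing\n",
    ]
    for url, caption in iter_pairs(sample):
        print(url, "->", caption)
```

To read the downloaded file, something like `with open("cc12m.tsv", encoding="utf-8") as f: pairs = iter_pairs(f)` should work, assuming the file is UTF-8 encoded. Streaming line by line avoids loading the full 2.5GB into memory.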

Hashcodes

Click here to download (2.12GB). Credit: Nicholas Carlini

Format (.tsv)

[image_url_1]\t[SHA256_1]\t[MD5_1]
[image_url_2]\t[SHA256_2]\t[MD5_2]
[image_url_3]\t[SHA256_3]\t[MD5_3]
…
[image_url_N]\t[SHA256_N]\t[MD5_N]
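
One likely use of these hashcodes is checking that a downloaded image still matches the one originally indexed. The sketch below parses one line of the hash file and compares both digests against the downloaded bytes. The names `parse_hash_line` and `matches_hashes` are hypothetical, and the code assumes the digests are stored as hexadecimal strings, which the format description above does not state.

```python
import hashlib
from typing import Tuple


def parse_hash_line(line: str) -> Tuple[str, str, str]:
    """Split one line into (image_url, sha256, md5).

    Raises ValueError if the line does not have exactly three
    tab-separated fields.
    """
    url, sha256_hex, md5_hex = line.rstrip("\r\n").split("\t")
    return url, sha256_hex, md5_hex


def matches_hashes(data: bytes, sha256_hex: str, md5_hex: str) -> bool:
    """Return True if `data` matches both expected digests.

    Assumption: digests are hex-encoded. The comparison ignores case.
    """
    actual_sha256 = hashlib.sha256(data).hexdigest()
    actual_md5 = hashlib.md5(data).hexdigest()
    return (actual_sha256 == sha256_hex.strip().lower()
            and actual_md5 == md5_hex.strip().lower())


if __name__ == "__main__":
    # Made-up bytes standing in for a downloaded image.
    image_bytes = b"not a real image"
    line = "\t".join([
        "http://example.com/a.jpg",
        hashlib.sha256(image_bytes).hexdigest(),
        hashlib.md5(image_bytes).hexdigest(),
    ])
    url, sha256_hex, md5_hex = parse_hash_line(line)
    print(url, matches_hashes(image_bytes, sha256_hex, md5_hex))
```

A mismatch does not necessarily mean a corrupted download: the image behind a URL may have been replaced or re-encoded since the hashes were computed.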

Cite

If you use this dataset in your research, please cite:

Soravit Changpinyo, Piyush Sharma, Nan Ding, Radu Soricut. Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts. CVPR 2021.

@inproceedings{changpinyo2021cc12m,
  title = {{Conceptual 12M}: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts},
  author = {Changpinyo, Soravit and Sharma, Piyush and Ding, Nan and Soricut, Radu},
  booktitle = {CVPR},
  year = {2021},
}
