Project

We introduce Conceptual 12M (CC12M), a dataset of ~12 million image-text pairs intended for vision-and-language pre-training. It is larger and covers a much more diverse set of visual concepts than Conceptual Captions (CC3M), a dataset widely used for both pre-training and end-to-end training of image captioning models. See our paper for further details.

Download

Click here to download (2.5GB)

Format (.tsv)

[image_url_1]\t[caption_1]
[image_url_2]\t[caption_2]
[image_url_3]\t[caption_3]
…
[image_url_N]\t[caption_N]
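
As a rough illustration of the format above, the following sketch reads tab-separated lines into (image URL, caption) pairs. The function name `iter_pairs` and the sample URLs and captions are hypothetical, made up for this example. It also assumes one pair per line with no header row, and splits only on the first tab.

```python
import io
from typing import Iterator, TextIO, Tuple


def iter_pairs(tsv_file: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (image_url, caption) pairs from a tab-separated text stream.

    Assumes one pair per line and no header row (an assumption, not
    something the dataset page states explicitly).
    """
    for line in tsv_file:
        line = line.rstrip("\r\n")
        # Split on the first tab only, so a caption containing a tab stays whole.
        url, sep, caption = line.partition("\t")
        if not sep:
            # Skip blank or malformed lines that have no tab separator.
            continue
        yield url, caption


# Hypothetical sample data standing in for the real file.
sample = io.StringIO(
    "http://example.com/a.jpg\tA dog on a beach\n"
    "\n"
    "http://example.com/b.jpg\tTwo cups of coffee\n"
)
pairs = list(iter_pairs(sample))
```

For the real file, the same function could be given a handle from `open(path, encoding="utf-8")`. Streaming line by line avoids loading the whole 2.5GB file into memory.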

Hashcodes

Click here to download (2.12GB). Credit: Nicholas Carlini

Format (.tsv)

[image_url_1]\t[SHA256_1]\t[MD5_1]
[image_url_2]\t[SHA256_2]\t[MD5_2]
[image_url_3]\t[SHA256_3]\t[MD5_3]
…
[image_url_N]\t[SHA256_N]\t[MD5_N]
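
One possible use of the hashcodes is checking that a downloaded image still matches the original. The sketch below assumes the SHA256 and MD5 columns are hex-encoded digests of the raw image bytes, which this page does not state explicitly. The helper name `matches_hashes` is hypothetical.

```python
import hashlib


def matches_hashes(image_bytes: bytes, sha256_hex: str, md5_hex: str) -> bool:
    """Return True if both digests of image_bytes equal the expected hex strings.

    Assumes the hashcode file stores hex digests of the raw image bytes
    (an assumption for illustration).
    """
    actual_sha256 = hashlib.sha256(image_bytes).hexdigest()
    actual_md5 = hashlib.md5(image_bytes).hexdigest()
    # Normalize case and surrounding whitespace before comparing.
    return (
        actual_sha256 == sha256_hex.strip().lower()
        and actual_md5 == md5_hex.strip().lower()
    )
```

A mismatch would suggest that the content behind the URL has changed since the hashcodes were computed, or that the download was corrupted.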

Cite

If you use this dataset in your research, please cite:

Soravit Changpinyo, Piyush Sharma, Nan Ding, Radu Soricut. Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts. CVPR 2021.

@inproceedings{changpinyo2021cc12m,
  title = {{Conceptual 12M}: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts},
  author = {Changpinyo, Soravit and Sharma, Piyush and Ding, Nan and Soricut, Radu},
  booktitle = {CVPR},
  year = {2021},
}
