Multi-Modal-CelebA-HQ is a large-scale face image dataset containing 30,000 high-resolution face images selected from the CelebA dataset by following CelebA-HQ. Each image comes with a high-quality segmentation mask, a sketch, descriptive text, and a version of the image with a transparent background.
Multi-Modal-CelebA-HQ can be used to train and evaluate algorithms for text-to-image generation, text-guided image manipulation, sketch-to-image generation, image captioning, and visual question answering (VQA). The dataset was proposed and used in TediGAN.
- The textual descriptions are generated using a probabilistic context-free grammar (PCFG) based on the given attributes. Following the format of the popular CUB and COCO datasets, we create ten unique single-sentence descriptions per image to obtain more training data. A previous study proposed CelebTD-HQ, but it is not publicly available.
- For labels, we use the CelebAMask-HQ dataset, which contains manually annotated semantic masks of facial attributes corresponding to CelebA-HQ.
- For sketches, we follow the same data generation pipeline as DeepFaceDrawing. We first apply the Photocopy filter in Photoshop to extract edges, which preserves facial details but introduces excessive noise, and then apply sketch simplification to obtain edge maps resembling hand-drawn sketches.
- For background removal, we use the open-source tool Rembg and the commercial software removebg. Different backgrounds can then be added using image composition or harmonization methods such as DoveNet.
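
The grammar-based text generation in the first bullet can be illustrated with a toy sampler. This is only a minimal sketch of PCFG sampling, not the grammar used to build the dataset: the attribute-to-phrase mapping, the rule probabilities, and the function names `build_grammar`, `expand`, and `describe` are all assumptions made for illustration.

```python
import random

# Hypothetical mapping from CelebA-style attribute names to noun phrases.
ATTRIBUTE_PHRASES = {
    "Black_Hair": "black hair",
    "Wavy_Hair": "wavy hair",
    "Arched_Eyebrows": "arched eyebrows",
    "Big_Nose": "a big nose",
}


def build_grammar(attributes):
    """Build a small PCFG: nonterminal -> list of (probability, expansion)."""
    male = "Male" in attributes
    phrases = [ATTRIBUTE_PHRASES[a] for a in attributes if a in ATTRIBUTE_PHRASES]
    if not phrases:
        raise ValueError("no describable attributes given")
    grammar = {
        "S": [(1.0, ["SUBJ", "VERB", "ATTRS_0"])],
        "SUBJ": [(0.5, ["He" if male else "She"]),
                 (0.3, ["This man" if male else "This woman"]),
                 (0.2, ["The person"])],
        "VERB": [(0.7, ["has"]), (0.3, ["is shown with"])],
    }
    # ATTRS_i emits phrase i and then either stops or continues with ATTRS_{i+1},
    # so a sentence never repeats the same attribute.
    for i, phrase in enumerate(phrases):
        if i == len(phrases) - 1:
            grammar[f"ATTRS_{i}"] = [(1.0, [phrase])]
        else:
            grammar[f"ATTRS_{i}"] = [(0.4, [phrase]),
                                     (0.6, [phrase, "and", f"ATTRS_{i + 1}"])]
    return grammar


def expand(grammar, symbol, rng):
    """Recursively expand a symbol; anything not in the grammar is a terminal."""
    if symbol not in grammar:
        return [symbol]
    rules = grammar[symbol]
    _, expansion = rng.choices(rules, weights=[p for p, _ in rules], k=1)[0]
    words = []
    for child in expansion:
        words.extend(expand(grammar, child, rng))
    return words


def describe(attributes, n=10, seed=0, max_tries=1000):
    """Sample up to n unique single-sentence descriptions for one image."""
    rng = random.Random(seed)
    grammar = build_grammar(attributes)
    sentences, seen = [], set()
    for _ in range(max_tries):
        sentence = " ".join(expand(grammar, "S", rng)) + "."
        if sentence not in seen:
            seen.add(sentence)
            sentences.append(sentence)
        if len(sentences) == n:
            break
    return sentences


for line in describe(["Male", "Black_Hair", "Wavy_Hair", "Arched_Eyebrows"]):
    print(line)
```

Duplicates are rejected after sampling, so the function may return fewer than `n` sentences when the grammar is too small to produce that many distinct ones.
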
All data is hosted on Google Drive:
Path | Size | Files | Format | Description |
---|---|---|---|---|
multi-modal-celeba | ~200 GB | 420,002 | | Main folder |
├ image | ~2 GB | 30,000 | JPG | images from celeba-hq of size 512×512 |
├ label | ~1 GB | 30,000 | PNG | masks from celeba-mask-hq of size 512×512 |
├ sketch | 398 MB | 30,000 | PNG | sketches (10 samples and sketch.zip) |
├ text | 11 MB | 300,000 | TXT | 10 descriptions of each image in celeba-mask-hq |
├ train | 347 KB | 1 | PKL | filenames of training images |
├ test | 81 KB | 1 | PKL | filenames of test images |
└ rmebg | ~20 GB | 30,000 | PNG | image with transparent background (password: 3amt) |
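
Placing an image from the `rmebg` folder on a new background amounts to alpha compositing. The sketch below shows only the plain "over" operator in pure Python, assuming 8-bit straight (non-premultiplied) alpha as stored in PNG files. It is not a harmonization method such as DoveNet, and the function names are hypothetical; in practice an imaging library would do this on whole arrays.

```python
def composite_over(fg_rgba, bg_rgb):
    """Blend one straight-alpha RGBA pixel over an opaque RGB pixel."""
    *fg_rgb, a = fg_rgba
    alpha = a / 255.0
    return tuple(round(alpha * f + (1.0 - alpha) * b)
                 for f, b in zip(fg_rgb, bg_rgb))


def composite_image(fg_rows, bg_rows):
    """Composite rows of RGBA tuples over a same-sized background of RGB tuples."""
    same_height = len(fg_rows) == len(bg_rows)
    if not same_height or any(len(f) != len(b) for f, b in zip(fg_rows, bg_rows)):
        raise ValueError("foreground and background must have the same size")
    return [[composite_over(f, b) for f, b in zip(fg_row, bg_row)]
            for fg_row, bg_row in zip(fg_rows, bg_rows)]


# One opaque red pixel and one fully transparent pixel over a blue background.
foreground = [[(255, 0, 0, 255), (0, 0, 0, 0)]]
background = [[(0, 0, 255), (0, 0, 255)]]
print(composite_image(foreground, background))  # [[(255, 0, 0), (0, 0, 255)]]
```
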
- Google Drive: download link
- Baidu Drive: download link (password: y5w4)
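
After downloading, the modalities of each sample can be paired by filename. The following is a rough sketch under several assumptions that should be checked against the extracted archive: that files sit directly inside the `image`, `label`, `sketch`, and `rmebg` folders, that all modalities of one sample share the same file stem, and that each split pickle holds a plain list of filenames. The helper names `load_split` and `index_samples` are hypothetical.

```python
import pickle
from pathlib import Path

# Folder name -> file extension, taken from the table above. That the files lie
# directly in these folders and share a stem per sample is an assumption.
MODALITIES = {"image": ".jpg", "label": ".png", "sketch": ".png", "rmebg": ".png"}


def load_split(pickle_path):
    """Return sample stems from a split pickle (assumed to be a list of names).

    Only unpickle files from a source you trust: pickle can run arbitrary code.
    """
    with open(pickle_path, "rb") as f:
        names = pickle.load(f)
    return [Path(str(name)).stem for name in names]


def index_samples(root, names):
    """Map each sample stem to its per-modality paths.

    Returns (samples, missing): samples is a list of dicts with a "name" key
    plus one path per modality; missing lists the stems that lack any file.
    """
    root = Path(root)
    samples, missing = [], []
    for name in names:
        paths = {m: root / m / f"{name}{ext}" for m, ext in MODALITIES.items()}
        if all(p.exists() for p in paths.values()):
            samples.append({"name": name, **paths})
        else:
            missing.append(name)
    return samples, missing
```

To use it, pass `load_split` the path of the single pickle file inside the `train` or `test` folder, then pass the resulting names together with the dataset root to `index_samples`. The text descriptions are left out on purpose, because their file layout is not specified here.
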
We provide pretrained models for AttnGAN, ControlGAN, DM-GAN, DFGAN, and ManiGAN. Feel free to open a pull request if you have any updates.
Method | FID | LPIPS | Download |
---|---|---|---|
AttnGAN | 125.98 | 0.512 | Pretrained |
ControlGAN | 116.32 | 0.522 | Pretrained |
DFGAN | 137.60 | 0.581 | Pretrained |
DM-GAN | 131.05 | 0.544 | Pretrained |
TediGAN | 106.37 | 0.456 | Pretrained |
- CelebA dataset:
  Ziwei Liu, Ping Luo, Xiaogang Wang and Xiaoou Tang, "Deep Learning Face Attributes in the Wild", in IEEE International Conference on Computer Vision (ICCV), 2015
- CelebA-HQ was collected from CelebA and further post-processed by the following paper:
  Karras et al., "Progressive Growing of GANs for Improved Quality, Stability, and Variation", in International Conference on Learning Representations (ICLR), 2018
- CelebAMask-HQ provides manually annotated masks of size 512 x 512 with 19 classes covering all facial components and accessories, such as skin, nose, eyes, eyebrows, ears, mouth, lip, hair, hat, eyeglass, earring, necklace, neck, and cloth. It was collected by the following paper:
  Lee et al., "MaskGAN: Towards Diverse and Interactive Facial Image Manipulation", in Computer Vision and Pattern Recognition (CVPR), 2020
- upload image with transparent background
- remove the background of each image (first version released on Nov. 14, 2020)
- create the 3D model for each image
- upload the inverted codes
- The Multi-Modal-CelebA-HQ dataset is available for non-commercial research purposes only.
- You agree not to reproduce, duplicate, copy, sell, trade, resell or exploit for any commercial purposes, any portion of the images and any portion of derived data.
- You agree not to further copy, publish or distribute any portion of the CelebAMask-HQ dataset, except that copies of the dataset may be made for internal use at a single site within the same organization.
The use of this software is RESTRICTED to non-commercial research and educational purposes.
If you find this dataset helpful for your research, please consider citing:
@inproceedings{xia2021tedigan,
title={TediGAN: Text-Guided Diverse Face Image Generation and Manipulation},
author={Xia, Weihao and Yang, Yujiu and Xue, Jing-Hao and Wu, Baoyuan},
booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2021}
}
@article{xia2021open,
title={Towards Open-World Text-Guided Face Image Generation and Manipulation},
author={Xia, Weihao and Yang, Yujiu and Xue, Jing-Hao and Wu, Baoyuan},
journal={arXiv preprint arXiv:2104.08910},
year={2021}
}
@inproceedings{karras2017progressive,
title={Progressive growing of gans for improved quality, stability, and variation},
author={Karras, Tero and Aila, Timo and Laine, Samuli and Lehtinen, Jaakko},
booktitle={International Conference on Learning Representations (ICLR)},
year={2018}
}
@inproceedings{liu2015faceattributes,
title = {Deep Learning Face Attributes in the Wild},
author = {Liu, Ziwei and Luo, Ping and Wang, Xiaogang and Tang, Xiaoou},
booktitle = {Proceedings of International Conference on Computer Vision (ICCV)},
year = {2015}
}
If you use the labels, please cite:
@inproceedings{CelebAMask-HQ,
title={MaskGAN: Towards Diverse and Interactive Facial Image Manipulation},
author={Lee, Cheng-Han and Liu, Ziwei and Wu, Lingyun and Luo, Ping},
booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2020}
}