The proposed GAN conditions the generated images on a text description instead of a class label. We implemented a Deep Residual GAN (DR-GAN) that generates fine-grained images from latent noise. Coarse images aligned with the text attributes are embedded as the generator inputs and as the classifier labels. A shortcut path, similar to a ResNet skip connection, is added to the generator so that coarse images are carried directly to higher layers. In addition, adversarial training is applied in a cyclic fashion to prevent image degradation. Experiments with the Deep Residual GAN on the CUB-200 birds and Flickr8k datasets show higher accuracy than state-of-the-art GANs.
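As a rough illustration of the generator design described above, the sketch below shows a text-conditioned generator in which ResNet-style shortcuts carry the coarse feature map to higher layers, with the caption embedding concatenated to the latent noise at the input. All module names, dimensions, and layer counts are illustrative assumptions, not the exact architecture used in the notebook.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """ResNet-style block: the shortcut carries the coarse feature map
    directly to the next layer, so only a residual has to be learned."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))  # shortcut + learned residual


class TextConditionedGenerator(nn.Module):
    """Toy generator: the caption embedding is concatenated with the latent
    noise, projected to a coarse 8x8 feature map, then refined while the
    residual shortcuts transport the coarse content to higher layers."""
    def __init__(self, noise_dim=100, text_dim=300, channels=64):
        super().__init__()
        self.channels = channels
        self.fc = nn.Linear(noise_dim + text_dim, channels * 8 * 8)
        self.blocks = nn.Sequential(
            ResidualBlock(channels),
            nn.Upsample(scale_factor=2),   # 8x8  -> 16x16
            ResidualBlock(channels),
            nn.Upsample(scale_factor=2),   # 16x16 -> 32x32
            ResidualBlock(channels),
        )
        self.to_rgb = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, noise, text_embedding):
        x = torch.cat([noise, text_embedding], dim=1)
        x = self.fc(x).view(x.size(0), self.channels, 8, 8)
        return torch.tanh(self.to_rgb(self.blocks(x)))
```

A discriminator sketch is omitted here; in the described model it would receive the same caption embedding alongside the image so that it acts as the conditional classifier.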
- Please refer to the READMEs in the `Dataset`, `text_pkl`, `image_pkl`, `Weights`, and `word2vec_pretrained_model` folders to obtain the necessary data.
- The images pickle file in the `Dataset` folder was created with `process_images.ipynb`, which resizes and normalizes the images and generates numpy arrays (see the preprocessing sketch after this list).
- The captions pickle file in the `Captions` folder was created with `process_captions.ipynb`, which generates sentence embeddings for the captions; alternatively, use the one already provided in the `Captions` folder (see the embedding sketch after this list).
- Trained model weight files can be found in the `Weights` folder.
- Import the Jupyter notebook `CSE_676_Text2Image_final.ipynb` into Google Colab and load the data.
- Run the code snippets in Google Colab.
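As noted in the list above, `process_images.ipynb` resizes and normalizes the images into numpy arrays. A minimal sketch of that step is shown below; the target size, normalization range, and directory layout are assumptions and may differ from the notebook.

```python
import os
import numpy as np
from PIL import Image

def build_image_array(image_dir, size=(64, 64)):
    """Resize every image to a fixed size and scale pixels to [-1, 1]
    (the usual range for a tanh generator output), returning one numpy array."""
    arrays = []
    for name in sorted(os.listdir(image_dir)):
        img = Image.open(os.path.join(image_dir, name)).convert("RGB")
        img = img.resize(size, Image.BILINEAR)
        arrays.append(np.asarray(img, dtype=np.float32) / 127.5 - 1.0)
    return np.stack(arrays)

# Illustrative path only; point this at the image folder inside Dataset.
# images = build_image_array("Dataset/images")
```

The resulting array can then be serialized with `pickle.dump` to produce the images pickle file mentioned above.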
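Similarly, `process_captions.ipynb` turns each caption into a fixed-length sentence embedding. Since the repository includes a `word2vec_pretrained_model` folder, the sketch below assumes a simple average of pretrained word2vec vectors; the model file name and embedding scheme are assumptions and the notebook may differ.

```python
import numpy as np
from gensim.models import KeyedVectors

# File name and binary flag are assumptions about the pretrained model.
word_vectors = KeyedVectors.load_word2vec_format(
    "word2vec_pretrained_model/GoogleNews-vectors-negative300.bin", binary=True
)

def sentence_embedding(caption, dim=300):
    """Average the word2vec vectors of the in-vocabulary words of a caption."""
    tokens = caption.lower().split()
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim, dtype=np.float32)

embedding = sentence_embedding("a yellow bird with black tail")  # shape: (300,)
```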
Below are some outputs of the DR-GAN model for the input texts “a yellow bird with black tail” and “a green bird with black head”.
Other results, grouped by training epoch range, are linked in the table below.
Epochs | Bird Generated Images | Flickr8K Generated Images |
---|---|---|
0 - 200 | Click here to view | Click here to view |
201 - 400 | Click here to view | Click here to view |
401 - 600 | Click here to view | Click here to view |
601 - 800 | Click here to view | Click here to view |
801 - 1000 | Click here to view | Click here to view |