Deep Neural Network Image Captioner using visual Attention

AminAlam/Image-Captioner

Image-Captioner

What is This?

This is a deep neural network that combines CNNs, RNNs, and MLPs to caption images. It was built for the SUT DL course.

Dependencies

You can install all the dependencies with pip install -r requirements.txt.

Architecture

I used ResNet-152 as the CNN encoder and LSTMs as the RNN decoder. A schematic of the model is shown in the image below.

[Image: schematic of the model]
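
The repository description mentions visual attention, but this README does not say which variant the model uses. As a rough illustration only, the sketch below assumes soft dot-product attention: at each decoding step, the decoder state scores every image region, and the softmax-weighted average of the region features becomes the context vector. The names soft_attention, region_features, and hidden_state are hypothetical and do not come from the repository; the toy vectors stand in for ResNet-152 feature maps and LSTM hidden states.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(region_features, hidden_state):
    """Return (context, weights) for one decoding step.

    region_features: list of feature vectors, one per image region
                     (hypothetical stand-in for the CNN encoder output).
    hidden_state:    decoder state vector of the same length
                     (hypothetical stand-in for the LSTM hidden state).
    """
    # Score each region by its dot product with the decoder state.
    # Dot-product scoring is an assumption, not the repository's method.
    scores = [sum(f * h for f, h in zip(region, hidden_state))
              for region in region_features]
    weights = softmax(scores)
    # Context vector: attention-weighted average of the region features.
    dim = len(hidden_state)
    context = [sum(w * region[i] for w, region in zip(weights, region_features))
               for i in range(dim)]
    return context, weights


# Toy example: three regions with 2-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state = [2.0, 0.0]
context, weights = soft_attention(regions, state)
```

In a full model the context vector would typically be fed to the LSTM together with the previous word embedding; whether this repository does so is not stated here.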

Training

The model was trained for 500 epochs on a Tesla K80 GPU using the Flickr8k dataset. The embedding layer weights were initialized from the Stanford GloVe vectors glove.42B.300d (random values were used for words that are not in the GloVe vocabulary).
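
The embedding initialization described above can be sketched as follows. This is a minimal pure-Python illustration, not the repository's code: the function name build_embedding_matrix, the uniform range of the random fallback, and the toy vocabulary are all assumptions. GloVe text files store one token per line followed by its vector components, separated by spaces.

```python
import random


def build_embedding_matrix(vocab, glove_lines, dim=300, seed=0):
    """Return (matrix, missing): one row per vocabulary word, plus the OOV words."""
    # Parse GloVe's text format: a token followed by `dim` floats per line.
    vectors = {}
    for line in glove_lines:
        parts = line.rstrip().split(" ")
        if len(parts) == dim + 1:
            vectors[parts[0]] = [float(x) for x in parts[1:]]

    rng = random.Random(seed)
    matrix, missing = [], []
    for word in vocab:
        if word in vectors:
            matrix.append(vectors[word])
        else:
            # Word not in GloVe: fall back to a small random vector.
            # The range [-0.1, 0.1] is an assumption; the README only says "random".
            matrix.append([rng.uniform(-0.1, 0.1) for _ in range(dim)])
            missing.append(word)
    return matrix, missing


# Toy 3-dimensional example. With the real vectors you would pass an open
# file handle for the GloVe text file (path assumed) and keep dim=300.
toy_glove = ["dog 0.1 0.2 0.3", "cat 0.4 0.5 0.6"]
matrix, missing = build_embedding_matrix(["dog", "<start>", "cat"], toy_glove, dim=3)
```

The resulting rows would then be copied into the embedding layer of whichever framework the model uses. Returning the list of missing words makes it easy to check how much of the caption vocabulary GloVe actually covers.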

Loss and Accuracy

[Images: loss and accuracy plots]
