Image Caption Generator

A neural network that generates captions for an image using a CNN and an RNN, with both beam search and greedy search decoding.

Content ->

1. Requirements

Recommended system requirements to train the model:

A good CPU and a GPU with at least 8 GB of memory
At least 8 GB of RAM
An active internet connection

2. Installation

Required libraries -

Numpy - 1.16.4
Python - 3.6.7
Keras - 2.2.4
Tensorflow - 1.13.1
nltk - 3.2.5
PIL - 4.3.0
Matplotlib - 3.0.3
tqdm - 4.28.1

Data files required - download from link

Flicker8k_Dataset: contains the images
Flickr8k.token.txt: contains 5 captions for each image ID
Flickr_8k.trainImages.txt: contains the image IDs of the training images
Flickr_8k.testImages.txt: contains the image IDs of the test images
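
As a rough illustration of how the caption file might be read, the sketch below groups captions by image ID. It assumes each line has the form `<image file name>#<caption index><TAB><caption text>`, which is how Flickr8k.token.txt is commonly distributed. The function name `load_captions` and the sample lines are hypothetical and not taken from this repository.

```python
from collections import defaultdict

def load_captions(lines):
    """Group captions by image ID (file name without its extension)."""
    captions = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Assumed layout: "<image>.jpg#<n>\t<caption>"
        key, _, caption = line.partition("\t")
        image_name = key.split("#")[0]
        image_id = image_name.rsplit(".", 1)[0]
        captions[image_id].append(caption.strip())
    return dict(captions)

# Hypothetical sample lines standing in for the real file
sample = [
    "img_001.jpg#0\tA dog runs across the grass .",
    "img_001.jpg#1\tA brown dog is running .",
    "img_002.jpg#0\tA man climbs a rock .",
]
captions = load_captions(sample)
```

With the real file, the same function could be called on an open file handle, for example `load_captions(open(token_path))`.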

3. Generated Captions on Test Images

Model used - InceptionV3 + LSTM

Image

Caption

Greedy: a football player in a red jersey is tackling another player in white who is tackling the ball.
BEAM Search, k=3: a football player in a red jersey is tackling another player in red who is running with the ball whilst fans watch.
BEAM Search, k=5: three football players are tackling a football player in a red and white uniform.
BEAM Search, k=7: an american footballer in a red and white uniform gets ready to tackle an opposing player.
BEAM Search, k=10: an american footballer in a red and white uniform gets ready to tackle an opposing player while fans watch.

Greedy: a man in a red shirt climbing a rock.
BEAM Search, k=3: a man in a red shirt climbing a rock.
BEAM Search, k=5: a man climbing a rock.
BEAM Search, k=7: a man climbing a rock.
BEAM Search, k=10: a rock climber scales a steep rock cliff.

4. Procedure to Train Model

Set the token_path, img_path, train_path, test_path, and glove_path variables to
the paths of Flickr8k.token.txt, Flicker8k_Dataset, Flickr_8k.trainImages.txt,
Flickr_8k.testImages.txt, and the GloVe file, respectively.

Example:

token_path = '/content/drive/MyDrive/DS303/Flickr8k.token.txt'
img_path   = '/content/drive/MyDrive/DS303/Flicker8k_Dataset/'
train_path = '/content/drive/MyDrive/DS303/Flickr_8k.trainImages.txt'
test_path  = '/content/drive/MyDrive/DS303/Flickr_8k.testImages.txt'
glove_path = '/content/drive/MyDrive/DS303/glove.6B.200d.txt'

Then run the .py file in any IDE to train the model, or, if working in a notebook, run all cells to train the model and produce sample test results.
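
The file behind glove_path is a plain-text embedding table. Below is a minimal sketch of reading it, assuming the standard GloVe text layout (each line holds a word followed by its space-separated vector components). `load_glove` is a hypothetical helper, not necessarily how this project loads the file, and it returns plain Python lists to stay dependency-free.

```python
def load_glove(lines):
    """Map each word to its embedding vector (a list of floats)."""
    embeddings = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word, values = parts[0], parts[1:]
        embeddings[word] = [float(v) for v in values]
    return embeddings

# Hypothetical 3-dimensional sample; glove.6B.200d.txt has 200 values per word
sample = ["dog 0.1 -0.2 0.3", "rock 0.0 0.5 -1.0"]
glove = load_glove(sample)
```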

5. Procedure to Test on images

To test any image from the test data set:

Pick any image ID of your choice from Flickr_8k.testImages.txt
Get the image's encoding from encoding_test

image = encoding_test[pic].reshape((1,2048))
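
The reshape adds a batch dimension of size 1, so a single 2048-value feature vector becomes a (1, 2048) array, presumably because the model expects batched input. Here is a minimal sketch, assuming `encoding_test` maps image IDs to 1-D NumPy arrays of length 2048. The image ID and the zero-filled vector are placeholders.

```python
import numpy as np

# Placeholder stand-in for the real encodings (assumed: image ID -> 1-D vector)
encoding_test = {"example_image.jpg": np.zeros(2048, dtype=np.float32)}

pic = "example_image.jpg"
image = encoding_test[pic].reshape((1, 2048))
print(image.shape)
```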

Now, to get the result:

Using Greedy Search

greedySearch(image)

Using Beam Search

beamSearch_predictions(image, beam_index = 3)
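
To make the difference between the two decoding strategies concrete, here is a toy sketch. It is not this repository's `greedySearch` or `beamSearch_predictions`: `next_word_probs` is a hypothetical stand-in for the trained model that reads from a fixed probability table, and the tiny vocabulary is invented for illustration. Greedy search keeps only the single most probable word at each step, whereas beam search keeps the `beam_index` highest-scoring partial captions.

```python
import math

# Hypothetical toy "model": next-word probabilities given the previous word.
TABLE = {
    "<start>": {"a": 0.6, "the": 0.4},
    "a": {"man": 0.5, "dog": 0.3, "<end>": 0.2},
    "the": {"dog": 0.9, "<end>": 0.1},
    "man": {"<end>": 0.4, "climbs": 0.6},
    "dog": {"<end>": 1.0},
    "climbs": {"<end>": 1.0},
}

def next_word_probs(sequence):
    """Stand-in for a model call; only looks at the last word."""
    return TABLE[sequence[-1]]

def greedy_search(max_len=10):
    seq = ["<start>"]
    while seq[-1] != "<end>" and len(seq) < max_len:
        probs = next_word_probs(seq)
        seq.append(max(probs, key=probs.get))  # take the single best word
    return seq

def beam_search(beam_index=3, max_len=10):
    beams = [(["<start>"], 0.0)]  # (sequence, summed log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == "<end>":
                candidates.append((seq, score))  # finished; carry forward
                continue
            for word, p in next_word_probs(seq).items():
                candidates.append((seq + [word], score + math.log(p)))
        # Keep only the beam_index best partial captions
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_index]
        if all(seq[-1] == "<end>" for seq, _ in beams):
            break
    return beams[0][0]
```

With this table, greedy search returns "a man climbs" (total probability 0.18), while beam search with `beam_index=3` finds "the dog" (total probability 0.36), because it does not commit to the locally best first word. With `beam_index=1` the two strategies coincide.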

6. To View Result access the given link below

Link

Here you can find txt files with both the greedy and beam search results.
Each txt file contains the prediction for each image ID.
