A neural network that generates captions for an image using a CNN encoder and an RNN decoder, with both Beam Search and Greedy Search decoding.
Recommended system requirements to train the model -
- A good CPU and a GPU with at least 8GB memory
- At least 8GB of RAM
- Active internet connection
Required libraries -
- Numpy - 1.16.4
- Python - 3.6.7
- Keras - 2.2.4
- Tensorflow - 1.13.1
- nltk - 3.2.5
- Pillow (PIL) - 4.3.0
- Matplotlib - 3.0.3
- tqdm - 4.28.1
Data files required - Download from link
- Flicker8k_Dataset: contains the images
- Flickr8k.token.txt: contains 5 captions for each image ID
- Flickr_8k.trainImages.txt: contains the image IDs of the training images
- Flickr_8k.testImages.txt: contains the image IDs of the test images
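The grouping of captions by image ID in Flickr8k.token.txt can be sketched as below. This is a minimal sketch, not the project's own loader: it assumes each line has the form `image.jpg#N<TAB>caption`, and the function name `load_captions` and the sample lines are hypothetical.

```python
from collections import defaultdict

def load_captions(lines):
    """Group captions by image ID (assumed line format: 'image.jpg#N<TAB>caption')."""
    captions = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split the 'image.jpg#N' key from the caption text at the first tab.
        key, _, caption = line.partition("\t")
        # Drop the '#N' caption index to recover the image ID.
        image_id = key.split("#")[0]
        captions[image_id].append(caption.strip())
    return dict(captions)

# Hypothetical sample lines, not taken from the real dataset.
sample = [
    "example_1.jpg#0\tA dog runs across a field .",
    "example_1.jpg#1\tA brown dog is running outside .",
    "example_2.jpg#0\tTwo children play on a beach .",
]
captions = load_captions(sample)
print(len(captions["example_1.jpg"]))  # 2
```

With the real file, the same function could be called as `load_captions(open(token_path))`, provided the format assumption holds.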
Model used - InceptionV3 + LSTM
Set the token_path, img_path, train_path, test_path and glove_path variables
to the paths of Flickr8k.token.txt, Flicker8k_Dataset, Flickr_8k.trainImages.txt,
Flickr_8k.testImages.txt and the GloVe file, respectively.
Example:
token_path = '/content/drive/MyDrive/DS303/Flickr8k.token.txt'
img_path = '/content/drive/MyDrive/DS303/Flicker8k_Dataset/'
train_path = '/content/drive/MyDrive/DS303/Flickr_8k.trainImages.txt'
test_path = '/content/drive/MyDrive/DS303/Flickr_8k.testImages.txt'
glove_path = '/content/drive/MyDrive/DS303/glove.6B.200d.txt'
Then run the .py file in any IDE to train the model. If working in a notebook, run all cells to train the model and produce sample test results.
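Before starting a long training run, it may help to confirm that every configured path exists. The check below is a small sketch and not part of the project; the helper name `missing_paths` is hypothetical, and the path values simply repeat the example above.

```python
import os

def missing_paths(paths):
    """Return the names of configured paths that do not exist on disk."""
    return [name for name, path in paths.items() if not os.path.exists(path)]

# Same example values as above; replace them with your own locations.
paths = {
    "token_path": "/content/drive/MyDrive/DS303/Flickr8k.token.txt",
    "img_path": "/content/drive/MyDrive/DS303/Flicker8k_Dataset/",
    "train_path": "/content/drive/MyDrive/DS303/Flickr_8k.trainImages.txt",
    "test_path": "/content/drive/MyDrive/DS303/Flickr_8k.testImages.txt",
    "glove_path": "/content/drive/MyDrive/DS303/glove.6B.200d.txt",
}

missing = missing_paths(paths)
if missing:
    print("Missing paths:", ", ".join(missing))
else:
    print("All paths found.")
```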
To test any image from the test data set -
- pick any image ID of your choice from Flickr_8k.testImages.txt
- get the encoded image from encoding_test, with pic set to the chosen image ID
image = encoding_test[pic].reshape((1,2048))
- Now get the result
- Using Greedy Search
greedySearch(image)
- Using Beam Search
beamSearch_predictions(image, beam_index = 3)
- Here you can find the txt files of both greedy and beam search results
- the txt files contain the prediction for each image ID
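To illustrate how the two decoding strategies can produce different captions, here is a self-contained toy sketch. It is not the project's greedySearch or beamSearch_predictions: the `NEXT` table is a hypothetical stand-in for the trained model, in which next-word probabilities depend only on the previous word, and the function names `toy_greedy_search` and `toy_beam_search` are made up for this example.

```python
import math

# Hypothetical toy "model": next-word probabilities given only the last word.
NEXT = {
    "<start>": {"a": 0.6, "the": 0.4},
    "a": {"dog": 0.55, "cat": 0.45},
    "the": {"dog": 0.9, "cat": 0.1},
    "dog": {"<end>": 1.0},
    "cat": {"<end>": 1.0},
}

def strip_markers(seq):
    """Join the words of a sequence, dropping the start and end markers."""
    return " ".join(t for t in seq if t not in ("<start>", "<end>"))

def toy_greedy_search(max_len=10):
    """At every step, keep only the single most probable next word."""
    seq = ["<start>"]
    while seq[-1] != "<end>" and len(seq) < max_len:
        probs = NEXT[seq[-1]]
        seq.append(max(probs, key=probs.get))
    return strip_markers(seq)

def toy_beam_search(beam_index=3, max_len=10):
    """At every step, keep the beam_index best partial sequences by log-probability."""
    beams = [(["<start>"], 0.0)]
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == "<end>":
                # Finished sequences are carried over unchanged.
                candidates.append((seq, score))
                continue
            for token, p in NEXT[seq[-1]].items():
                candidates.append((seq + [token], score + math.log(p)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_index]
        if all(seq[-1] == "<end>" for seq, _ in beams):
            break
    return strip_markers(beams[0][0])

print(toy_greedy_search())            # a dog
print(toy_beam_search(beam_index=3))  # the dog
```

Greedy search commits to "a" because it is the most probable first word (0.6), giving "a dog" with total probability 0.33. Beam search also keeps "the" alive and finds "the dog" with total probability 0.36. With beam_index = 1 the toy beam search reduces to the greedy result.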