image-captioning

An image captioning neural network notebook built with Keras. The project develops a model that generates a descriptive caption for an input image, combining deep learning techniques from computer vision and natural language processing. You can view the notebook for this project in image-captioning.ipynb and an overview of the architecture below.

Dataset Installation Instructions

To obtain the data, run the install.ipynb notebook, which downloads the datasets from Kaggle, unzips them, and places them in the data directory.
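A minimal sketch of what this download step could look like, assuming the kaggle Python package is installed with valid API credentials; the dataset slug below is a hypothetical placeholder, and the actual dataset names are in install.ipynb:

```python
# Sketch of the download step (assumptions: the kaggle package is installed
# and ~/.kaggle/kaggle.json holds valid credentials).
import os
from kaggle.api.kaggle_api_extended import KaggleApi

DATA_DIR = "data"
os.makedirs(DATA_DIR, exist_ok=True)

api = KaggleApi()
api.authenticate()

# "some-user/image-caption-dataset" is a hypothetical placeholder slug;
# see install.ipynb for the datasets actually used.
api.dataset_download_files(
    "some-user/image-caption-dataset",
    path=DATA_DIR,
    unzip=True,  # unzip directly into the data directory
)
```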

Architecture Overview

The architecture of the model consists of the following components:

  1. MobileNetV2 CNN: We use a pre-trained MobileNetV2 CNN to extract relevant features from the input image, producing a flattened feature array, which is then encoded.

  2. Caption Input and Sequence Encoding Layers: The input caption is processed by an embedding layer that converts words into vectors. Rather than training this layer from scratch, we initialize it with the pre-trained GloVe (Global Vectors for Word Representation) embeddings developed at Stanford (see the embedding sketch after this list).

  3. LSTM Layers: Next, we use both bidirectional and standard LSTM layers to capture the contextual information and dependencies among words within the caption. This allows the model to better learn the structure of the captions, resulting in far more coherent output sequences.

  4. Combining Image and Caption Inputs: We then combine the output of the image feature encoders and the LSTM layers using an Add layer.

  5. Decoder Layers: This encoded and combined information is then passed through a Dense layer with a ReLU activation function for further processing.

  6. Output Layer: Finally, we use a Dense layer with a softmax activation function to predict the next word in the caption.
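To illustrate step 2, here is a minimal sketch of initializing the embedding weights from pre-trained GloVe vectors. The file glove.6B.200d.txt, the 200-dimension size, and the fitted tokenizer are assumptions for illustration and may differ from the notebook:

```python
# Sketch: build an embedding matrix from pre-trained GloVe vectors.
# Assumptions: glove.6B.200d.txt has been downloaded, and `tokenizer` is a
# fitted Keras Tokenizer; the exact file and dimensions may differ in the notebook.
import numpy as np

EMBED_DIM = 200

# Parse the GloVe text file into a {word: vector} lookup.
glove = {}
with open("glove.6B.200d.txt", encoding="utf-8") as f:
    for line in f:
        word, *values = line.split()
        glove[word] = np.asarray(values, dtype="float32")

# Copy each known word's vector into the embedding matrix;
# words missing from GloVe keep an all-zero row.
vocab_size = len(tokenizer.word_index) + 1
embedding_matrix = np.zeros((vocab_size, EMBED_DIM))
for word, idx in tokenizer.word_index.items():
    vector = glove.get(word)
    if vector is not None:
        embedding_matrix[idx] = vector
```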

Our model takes two inputs, the image's features and the partial caption, and generates a single output: the next word in the caption. By using the pre-trained MobileNetV2 CNN and LSTM layers, the model combines computer vision and NLP techniques to generate accurate captions for images.
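Putting the pieces together, below is a minimal Keras sketch of the two-input architecture described above. The layer widths, feature length, and maximum caption length are placeholder assumptions rather than the notebook's exact values:

```python
# Sketch of the two-input captioning model described above.
# Assumptions: feature_dim is the length of the flattened MobileNetV2 feature
# vector, and embedding_matrix comes from the GloVe step; sizes are placeholders.
from tensorflow.keras.layers import (
    Input, Dense, Embedding, LSTM, Bidirectional, Add,
)
from tensorflow.keras.models import Model

feature_dim = 1280          # flattened MobileNetV2 feature length (assumption)
max_caption_length = 35     # placeholder maximum caption length
vocab_size = embedding_matrix.shape[0]
embed_dim = embedding_matrix.shape[1]

# Image branch: encode the pre-extracted CNN features.
image_input = Input(shape=(feature_dim,))
image_encoded = Dense(256, activation="relu")(image_input)

# Caption branch: GloVe-initialized embedding followed by LSTM layers.
caption_input = Input(shape=(max_caption_length,))
caption_embedded = Embedding(
    vocab_size, embed_dim, weights=[embedding_matrix], trainable=False
)(caption_input)
caption_encoded = Bidirectional(LSTM(128, return_sequences=True))(caption_embedded)
caption_encoded = LSTM(256)(caption_encoded)

# Merge both branches, decode, and predict the next word.
merged = Add()([image_encoded, caption_encoded])
decoder = Dense(256, activation="relu")(merged)
output = Dense(vocab_size, activation="softmax")(decoder)

model = Model(inputs=[image_input, caption_input], outputs=output)
model.compile(loss="categorical_crossentropy", optimizer="adam")
```

At inference time, a model like this is typically called repeatedly: the image features are fed together with the caption generated so far, and each predicted word is appended until an end token or the maximum caption length is reached.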
