This project implements an image captioning system using deep learning techniques. It generates textual descriptions for input images using a combination of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs).
Image captioning is a challenging task in computer vision and natural language processing, which involves generating descriptive captions for images automatically. This project utilizes pre-trained models such as VGG16 for feature extraction from images and LSTM networks for generating captions based on those features.
- Extracts image features using the VGG16 model
- Preprocesses captions data and tokenizes text using the Keras Tokenizer
- Implements a data generator to handle large datasets efficiently
- Trains a captioning model using a combination of image features and textual data
- Evaluates the model using BLEU scores for caption quality assessment
- Utilizes the OpenAI API to refine generated captions for improved grammatical correctness
- Python 3.x
- TensorFlow 2.x
- Keras
- NumPy
- NLTK
- OpenCV
- tqdm
- matplotlib