
Neural Image Captioning Model

An implementation of Show and Tell: Lessons Learned from the 2015 MSCOCO Image Captioning Challenge, using the pretrained ResNet152 deep CNN as the image encoder.
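A minimal sketch of that encoder-decoder setup, assuming PyTorch and torchvision; the `Encoder`/`Decoder` class names and hyperparameters below are illustrative, not the repository's actual API:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet152, ResNet152_Weights


class Encoder(nn.Module):
    """Encodes an image into a single feature vector using pretrained ResNet-152."""

    def __init__(self, embed_dim: int):
        super().__init__()
        backbone = resnet152(weights=ResNet152_Weights.DEFAULT)
        # Drop the classification head; keep everything up to the pooled features.
        self.backbone = nn.Sequential(*list(backbone.children())[:-1])
        for p in self.backbone.parameters():
            p.requires_grad = False  # keep the CNN frozen, as in the paper
        self.project = nn.Linear(backbone.fc.in_features, embed_dim)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(images).flatten(1)   # (B, 2048)
        return self.project(feats)                 # (B, embed_dim)


class Decoder(nn.Module):
    """LSTM language model conditioned on the image embedding."""

    def __init__(self, vocab_size: int, embed_dim: int, hidden_dim: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, image_emb: torch.Tensor, captions: torch.Tensor) -> torch.Tensor:
        # Feed the image embedding as the first "word", then the caption tokens.
        inputs = torch.cat([image_emb.unsqueeze(1), self.embed(captions)], dim=1)
        hidden, _ = self.lstm(inputs)
        return self.out(hidden)  # (B, T+1, vocab_size) logits
```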

Without hyperparameter tuning, and with greedy sequence generation used even during training, the model performs well on images from the training set and adequately on images it has not seen before. Better preprocessing and hyperparameter tuning could improve considerably on the supplied pretrained weights. Sampling instead of argmax during training-time sequence generation should also yield better results, as should beam search at inference time.
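For reference, a hedged sketch of greedy argmax decoding versus sampling, built on the illustrative `Encoder`/`Decoder` modules from the sketch above; the function name and special-token ids are assumptions, not the repository's code:

```python
import torch


@torch.no_grad()
def generate(encoder, decoder, image, start_id, end_id, max_len=20, sample=False):
    """Generate a caption token-by-token for a single preprocessed image tensor."""
    image_emb = encoder(image.unsqueeze(0))            # (1, embed_dim)
    tokens = [start_id]
    for _ in range(max_len):
        captions = torch.tensor([tokens])              # (1, t) token ids so far
        logits = decoder(image_emb, captions)[:, -1]   # logits for the next token
        if sample:
            # Draw from the softmax distribution instead of taking the argmax.
            next_id = torch.multinomial(logits.softmax(-1), 1).item()
        else:
            next_id = logits.argmax(-1).item()         # greedy decoding
        if next_id == end_id:
            break
        tokens.append(next_id)
    return tokens[1:]

# Beam search would instead keep the k highest-scoring partial captions at each
# step, expand each of them, and return the best complete sequence.
```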

The dataset used was downloaded from here. Using another dataset, such as MS COCO as used in the paper, should also improve generalization to unseen data.

