This repository was created to facilitate collaboration and to document the results of the project in the lecture "Neural Networks and Sequence to Sequence Learning". The aim is to implement image captioning primarily with JoeyNMT, a PyTorch-based machine translation framework. As a baseline model, we implement the approach of Xu et al. (2015).
You can let our best-performing model generate captions for any image here.
Like Xu et al. (2015), we use an encoder network to extract features from images. The feature vector is then used to initialize an LSTM decoder, which unrolls the generated caption. At each decoding step, an attention mechanism is applied to the feature vector. The attention mechanism is illustrated below with a real example, which shows how our implementation attends to different areas of the image while unrolling the caption.
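The decoder-side steps described above (initializing the LSTM state from the image features and attending over them at every step) can be sketched in NumPy. This is a minimal illustration of the soft attention of Xu et al. (2015) with an additive scoring function, assuming the encoder outputs a grid of `L` region feature vectors. The function names, weight names and shapes, and the single-tanh-layer state initialization are hypothetical and chosen for illustration; this is not the PyTorch/JoeyNMT code in this repository.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def init_state(features, W_init_h, W_init_c):
    # Assumed initialization: project the mean of the L region features
    # to the initial hidden and cell states (one tanh layer each).
    mean = features.mean(axis=0)                      # (D,)
    return np.tanh(W_init_h @ mean), np.tanh(W_init_c @ mean)


def soft_attention(features, h, W_f, W_h, v):
    # features: (L, D) region features, h: (H,) previous decoder state.
    # Additive score per region: e_i = v^T tanh(W_f a_i + W_h h).
    scores = np.tanh(features @ W_f.T + W_h @ h) @ v  # (L,)
    alpha = softmax(scores)                           # (L,) weights summing to 1
    context = alpha @ features                        # (D,) weighted sum of regions
    return context, alpha


# Usage with random weights (shapes only; nothing is trained here).
rng = np.random.default_rng(0)
L, D, H, A = 196, 512, 256, 128  # e.g. a 14x14 grid of 512-d features
features = rng.standard_normal((L, D))
h0, c0 = init_state(features,
                    0.01 * rng.standard_normal((H, D)),
                    0.01 * rng.standard_normal((H, D)))
context, alpha = soft_attention(features, h0,
                                0.01 * rng.standard_normal((A, D)),
                                0.01 * rng.standard_normal((A, H)),
                                rng.standard_normal(A))
print(context.shape, alpha.shape)  # → (512,) (196,)
```

The context vector is what the decoder consumes at that step, together with the previous word. Reshaping `alpha` back to the encoder grid (here 14x14) is one common way to produce attention heat maps like the one in the illustration.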
- Install the dependencies listed in `requirements.txt`.
- To work with our implementation, download the Flickr8k dataset from https://github.com/goodwillyoga/Flickr8k_dataset.
- Place the files `Flickr8k.token.txt`, `Flickr8k.trainImages.txt`, `Flickr8k.devImages.txt`, `Flickr8k.testImages.txt` and `ExpertAnnotations.txt`, as well as the folder containing all images, in a `data` folder in the project root.
- Adapt the locations and names of the above-mentioned files in `train.py`, if necessary.
- Create a .yaml file in the `param` folder in the project root and give it a meaningful name. In this file, define all parameters of the experiment you want to run. An example .yaml file with explanations can be found in the `param` folder of this repository.
- Start training: `python train.py model_name`. Set `model_name` to the name of the .yaml file containing the desired training parameters.
- During training, the loss and the BLEU scores evaluated on the training data are stored inside a `runs` folder, named according to the model name given above. These data points can easily be visualized with TensorBoard. The trained model is stored as a `.pth` file in the `saved_models` folder.
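For orientation, the caption and split files placed in the `data` folder in the steps above can be read with a few lines of plain Python. This is a minimal sketch, not the loading code in `train.py`; it assumes that each line of `Flickr8k.token.txt` has the form `<image>.jpg#<k>` followed by a tab and the caption, and that each split file lists one image file name per line. The helpers `parse_captions`, `parse_split` and `load_split` are hypothetical names.

```python
from collections import defaultdict
from pathlib import Path


def parse_captions(lines):
    # Assumed line format: "<image>.jpg#<k>\t<caption>", k = caption index.
    captions = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        image_id, caption = line.split("\t", 1)
        image_name = image_id.rsplit("#", 1)[0]  # drop the "#k" suffix
        captions[image_name].append(caption.strip())
    return dict(captions)


def parse_split(lines):
    # Split files list one image file name per line.
    return [line.strip() for line in lines if line.strip()]


def load_split(data_dir, split_file):
    # Hypothetical helper: captions for every image listed in one split file.
    data_dir = Path(data_dir)
    token_lines = (data_dir / "Flickr8k.token.txt").read_text(encoding="utf-8").splitlines()
    split_lines = (data_dir / split_file).read_text(encoding="utf-8").splitlines()
    captions = parse_captions(token_lines)
    return {name: captions[name] for name in parse_split(split_lines) if name in captions}


# Usage, assuming the files sit in ./data as described above:
# train = load_split("data", "Flickr8k.trainImages.txt")
```

Grouping all reference captions per image keeps the multiple references available for BLEU evaluation, while training typically pairs each image with one caption at a time.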
- Make sure the `.pth` file of the model you want to evaluate exists in the `saved_models` folder.
- In `eval.py`, change `model_name` to the name of a .yaml file that contains the same entries as the training file plus the entry `load_model`, set to the path of the `.pth` file. Example: `load_model: 'saved_models/best_model.pth'`
- Start evaluating with `python eval.py`. Evaluation is done on the test split, and the results are printed to the console.
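Since BLEU is the score tracked during training, a minimal corpus-level BLEU-4 in plain Python may help when interpreting the reported numbers. This is a sketch of the standard metric (clipped n-gram precisions up to 4-grams, geometric mean, brevity penalty against the closest reference length), with no smoothing or tokenization; it is an assumption for illustration and not necessarily the metric implementation used by `eval.py` or JoeyNMT.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    # Multiset of all n-grams in a token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    # hypotheses: list of token lists; references: list of lists of token
    # lists (Flickr8k provides several reference captions per image).
    clipped = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, refs in zip(hypotheses, references):
        hyp_len += len(hyp)
        # Reference length closest to the hypothesis (ties -> shorter).
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            max_ref = Counter()
            for r in refs:
                max_ref |= ngrams(r, n)  # per-n-gram max count over references
            hyp_counts = ngrams(hyp, n)
            clipped[n - 1] += sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(clipped) == 0:
        return 0.0  # no smoothing: any zero n-gram precision gives BLEU 0
    log_p = sum(math.log(c / t) for c, t in zip(clipped, totals)) / max_n
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_p)


hyp = "a dog runs on the grass".split()
refs = ["a dog runs on the grass".split(), "a brown dog is running".split()]
print(corpus_bleu([hyp], [refs]))  # → 1.0
```

Computing the score over the whole test split at once (rather than averaging per-sentence scores) is the usual corpus-level convention and avoids zero scores for individual short captions.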
Our best-performing model's weights can be downloaded here.