Motivated by recent advances in NLP and object detection, we implemented an attention-based model that describes image content. We use a CNN as the encoder and an RNN with self-attention as the decoder. We also visualize how the model fixes its gaze on salient objects while generating each word of the caption. We validated the contribution of attention against state-of-the-art performance on the MS COCO benchmark dataset. Finally, we implemented several improvements to image captioning and reduced the test loss of the trained model to 11.24%.
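As a rough sketch of how such a pipeline fits together, the snippet below wires a CNN encoder to an RNN decoder that attends over image regions at each decoding step. This is a minimal illustration in PyTorch, not the notebook's implementation: the ResNet-50 backbone, GRU cell, layer sizes, and soft (Bahdanau-style) attention variant are assumptions chosen for brevity; see `dLFinalProjCodeBase.ipynb` for the actual model.

```python
# Illustrative sketch only -- backbone, dimensions, and attention variant are
# assumptions, not taken from dLFinalProjCodeBase.ipynb.
import torch
import torch.nn as nn
import torchvision.models as models


class Encoder(nn.Module):
    """CNN encoder: a ResNet trimmed to its final spatial feature map."""

    def __init__(self):
        super().__init__()
        resnet = models.resnet50()  # load pretrained weights in practice
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])

    def forward(self, images):                   # (B, 3, H, W)
        feats = self.backbone(images)            # (B, 2048, h, w)
        return feats.flatten(2).transpose(1, 2)  # (B, h*w, 2048) region vectors


class AttentionDecoder(nn.Module):
    """RNN decoder with soft attention over the encoder's region vectors."""

    def __init__(self, vocab_size, feat_dim=2048, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.attn_feat = nn.Linear(feat_dim, hidden_dim)
        self.attn_hid = nn.Linear(hidden_dim, hidden_dim)
        self.attn_score = nn.Linear(hidden_dim, 1)
        self.rnn = nn.GRUCell(embed_dim + feat_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, feats, tokens, hidden):
        # Score each image region against the current hidden state.
        scores = self.attn_score(torch.tanh(
            self.attn_feat(feats) + self.attn_hid(hidden).unsqueeze(1)))  # (B, h*w, 1)
        alpha = torch.softmax(scores, dim=1)      # where the model "looks"
        context = (alpha * feats).sum(dim=1)      # (B, feat_dim) attended features
        hidden = self.rnn(torch.cat([self.embed(tokens), context], dim=1), hidden)
        return self.out(hidden), hidden, alpha.squeeze(-1)
```

At inference time the returned `alpha` can be reshaped to the (h, w) feature grid and upsampled over the input image, which is one standard way to render the gaze visualizations described above.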
See the source code in `dLFinalProjCodeBase.ipynb`.
## License

Distributed under the MIT License.