Indonesian Image Captioning using Attention-based Semantic Compositional Networks

rayandrew/indonesian-image-captioning

Indonesian Image Captioning

Built using PyTorch

Indonesian Image Caption Generation

Paper

Coming soon

Baseline paper: Semantic Compositional Networks for Visual Captioning

Overview

  1. Install prerequisites

    Before installing and running the project, make sure the following prerequisites are met.

  2. Examples

    See how captions are generated

  3. Project Tree

    File and directory structure of this project

  4. Install and Running the project

    How to run and develop the project

  5. How it works

    How the model is implemented and how it works

  6. Possible Future Development

    Possible enhancements and future development

  7. Author and Credits

    The people behind the project and others who contributed to it


Prerequisites

What you need in order to install this project, and how to install it

Library

This project uses BLEU, ROUGE, METEOR, and CIDEr-D as evaluation metrics for English captions.

METEOR and CIDEr-D are not used for Indonesian captions because no implementation of either metric is available for the Indonesian language.
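To make the BLEU metric concrete, here is a minimal sketch of unsmoothed sentence-level BLEU in plain Python. It is an illustration only and is not the scoring code used by this project; the function names `ngrams`, `modified_precision`, and `bleu` are hypothetical. Because it works on tokens alone, the same computation applies to English and Indonesian captions.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram counts at most as
    often as it appears in the reference where it is most frequent."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform n-gram weights."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    bp = 1.0 if c > r else math.exp(1.0 - r / c)
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)


# Illustrative captions, not taken from the dataset.
reference = "seorang pria bermain gitar di taman".split()
print(bleu(reference, [reference]))                      # identical caption
print(bleu("seorang pria duduk".split(), [reference]))   # no matching 3-grams
```

Without smoothing, any caption that shares no 4-gram with its references scores zero, which is why corpus-level BLEU or a smoothed variant is usually reported in practice.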

Pretrained Model

You can download pretrained models and scn_data from THIS LINK

Copy the pretrained and scn_data folders into the root of this project.


Examples

[Images: example-1, example-2]

Project tree

.
├── pretrained # download and save pretrained models here
├── scn_data   # folder contains files generated by create_input_files.py
├── datasets   # dataset loaders for the train, eval, and test splits
│   ├── caption.py # caption dataset loader
│   └── tag.py # tag dataset loader
├── models # model implementations
│   ├── decoders # caption decoders
│   │   ├── attention_scn.py # SCN + attention decoder
│   │   ├── pure_attention.py # attention-only decoder
│   │   └── pure_scn.py # SCN-only decoder
│   ├── encoders
│   │   ├── caption.py
│   │   └── tagger.py
│   ├── attention.py
│   └── scn_cell.py
├── trains # training implementations
│   ├── attention_scn.py
│   ├── pure_attention.py
│   ├── pure_scn.py
│   └── tagger.py
├── utils
│   ├── checkpoint.py
│   ├── dataset.py
│   ├── device.py
│   ├── embedding.py
│   ├── loader.py
│   ├── metric.py
│   ├── optimizer.py
│   ├── tensor.py
│   ├── token.py
│   ├── url.py
│   └── vizualize.py
├── corpus_score.py # corpus scoring using perplexity and vocab count
├── create_input_files.py  # preprocess input files and split data
├── eval_caption.py # caption model evaluation script
├── eval_tagger.py # image tagger model evaluation script
├── inference.py # caption generator script
├── README.md  # this file
└── train.py # training script

Install and Running the project

git clone https://github.com/rayandrews/semantic-compositional-nets-attention.git
cd semantic-compositional-nets-attention

How It Works

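The model combines two architectures: Semantic Compositional Networks (SCN), based on the implementation by zhegan27, and soft attention, based on the implementation by kelvinxu.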
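The two building blocks can be sketched in isolation with plain Python lists. Soft attention turns per-region scores into weights and averages the image region features; an SCN-style layer uses tag probabilities `s` to modulate a factored weight matrix, W(s) = W_a · diag(W_b s) · W_c. This is a simplified illustration, not the code in models/attention.py or models/scn_cell.py: all function names are hypothetical, the attention scores are passed in rather than learned, and how this project wires the two parts together is not shown here.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(regions, scores):
    """Soft attention: softmax the per-region scores, then return the
    weights and the weighted average of the region feature vectors."""
    weights = softmax(scores)
    dim = len(regions[0])
    context = [sum(w * region[d] for w, region in zip(weights, regions))
               for d in range(dim)]
    return weights, context


def scn_transform(x, tag_probs, w_a, w_b, w_c):
    """SCN-style factored transform W(s) x = W_a ((W_b s) * (W_c x)),
    where s holds the tag probabilities and * is element-wise."""
    tag_factors = matvec(w_b, tag_probs)   # project tags into factor space
    input_factors = matvec(w_c, x)         # project input into factor space
    gated = [t * f for t, f in zip(tag_factors, input_factors)]
    return matvec(w_a, gated)              # project back to the output space


# Two image regions with equal scores receive equal attention.
weights, context = soft_attention([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
print(weights, context)

# With identity matrices, a tag probability of 0 switches off its factor.
identity = [[1.0, 0.0], [0.0, 1.0]]
print(scn_transform([1.0, 2.0], [1.0, 0.0], identity, identity, identity))
```

In a trained model the scores would come from a small network that looks at each region and the decoder's previous hidden state, and the three factor matrices would be learned parameters.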

Architecture

[Figure: architecture]

Params

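Default Params

| Parameters         | Value |
|--------------------|-------|
| Semantic Concept   | 1000  |
| Caption Per Image  | 5     |
| Min Word Freq      | 5     |
| Max Caption Length | 50    |

Image Tagger

| Parameters    | Value |
|---------------|-------|
| Epoch         | 10    |
| Batch Size    | 32    |
| Learning Rate | 1e-4  |
| Dropout       | 0.15  |
| Optimizer     | Adam  |

Caption Model

| Parameters    | SCN  | SCN + Attention |
|---------------|------|-----------------|
| Epoch         | 12   | 12              |
| Batch Size    | 32   | 32              |
| Learning Rate | 4e-4 | 4e-4            |
| Dropout       | 0.5  | 0.5             |
| Optimizer     | Adam | Adam            |
| Embedding     | 512  | 512             |
| Attention     | -    | 512             |
| Factor        | 512  | 512             |
| Decoder       | 512  | 512             |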
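As an illustration of how the Min Word Freq and Max Caption Length defaults listed under Default Params typically shape the vocabulary, the sketch below builds a word map from tokenized captions. It is not the logic in create_input_files.py: the function names, the special tokens, the choice to skip over-long captions rather than truncate them, and the use of `>=` for the frequency threshold are all assumptions made for the example.

```python
from collections import Counter

MIN_WORD_FREQ = 5        # "Min Word Freq" default
MAX_CAPTION_LENGTH = 50  # "Max Caption Length" default


def build_word_map(captions, min_word_freq=MIN_WORD_FREQ,
                   max_len=MAX_CAPTION_LENGTH):
    """Map each sufficiently frequent word to an integer id.

    Captions longer than max_len tokens are skipped; words seen fewer than
    min_word_freq times are left out and later encoded as <unk>.
    """
    kept = [c for c in captions if len(c) <= max_len]
    freq = Counter(word for caption in kept for word in caption)
    words = sorted(w for w, n in freq.items() if n >= min_word_freq)
    word_map = {w: i + 1 for i, w in enumerate(words)}  # id 0 is kept for <pad>
    word_map["<unk>"] = len(word_map) + 1
    word_map["<start>"] = len(word_map) + 1
    word_map["<end>"] = len(word_map) + 1
    word_map["<pad>"] = 0
    return word_map


def encode(caption, word_map, max_len=MAX_CAPTION_LENGTH):
    """Encode a token list as <start> ... <end>, padded to max_len + 2 ids."""
    ids = [word_map["<start>"]]
    ids += [word_map.get(w, word_map["<unk>"]) for w in caption]
    ids.append(word_map["<end>"])
    ids += [word_map["<pad>"]] * (max_len - len(caption))
    return ids


# Toy corpus: "a" and "b" reach the threshold, "c" does not.
captions = [["a", "b"]] * 5 + [["a", "c"]]
word_map = build_word_map(captions, min_word_freq=5, max_len=3)
print(word_map)
print(encode(["a", "c"], word_map, max_len=3))
```

Padding every caption to the same length lets captions be stacked into fixed-size batches, at the cost of tracking the true caption length separately.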

Possible Future Development

  • Replace soft attention with Transformer attention (Attention Is All You Need)
  • Change baseline
  • Preprocess and reevaluate the Indonesian dataset

Author

Credits

Supervisors

Others

  • zhegan27, for the base implementation of the SCN paper

  • kelvinxu, for the base implementation of attention networks (Show, Attend and Tell)

  • sgrvinod, whose Show, Attend and Tell implementation serves as the base of this project
