Built using PyTorch
Indonesian Image Caption Generation
Baseline paper: Semantic Compositional Networks for Visual Captioning
- Before installing the project, make sure the following prerequisites have been met.
- See how a caption is generated.
- File and directory structure of this project.
- Installing and running the project: how to run and develop it.
- How the model is implemented and how it works.
- Possible enhancements and future development.
- The people behind the project and others who contributed to it.
What you need to install this project, and how to install them.
- Python 3.4 or later (programming language)
- PyTorch (deep neural network framework)
- Torchvision (ResNet152 architecture)
- nlg-eval (evaluation metrics)
This project uses BLEU, ROUGE, METEOR, and CIDEr-D as evaluation metrics for English captions.
METEOR and CIDEr-D are not used for Indonesian captions because no implementation of them exists for the Indonesian language.
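As a rough illustration of what a metric such as BLEU measures, the sketch below computes an unsmoothed sentence-level BLEU from clipped n-gram precisions and a brevity penalty. It is not the nlg-eval implementation, and the function names and the Indonesian example caption are made up for this illustration. It needs only tokenized text, so the same code runs on English or Indonesian captions.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(references, hypothesis, max_n=4):
    """Unsmoothed BLEU for one tokenized hypothesis against tokenized references."""
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hypothesis, n)
        total = sum(hyp_counts.values())
        # Clip each n-gram count by its maximum count in any single reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in hyp_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty uses the reference length closest to the hypothesis length.
    hyp_len = len(hypothesis)
    ref_len = min((len(r) for r in references), key=lambda l: (abs(l - hyp_len), l))
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(sum(log_precisions) / max_n)


# Hypothetical example caption ("a cat sits on the sofa").
reference = "seekor kucing duduk di atas sofa".split()
perfect = sentence_bleu([reference], reference)
partial = sentence_bleu([["a", "b", "c", "d"]], ["a", "b", "x", "d"], max_n=1)
```

For reported scores, use nlg-eval as listed above; a hand-rolled metric like this one is only useful for understanding the computation.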
You can download the pretrained models and scn_data from THIS LINK.
Just copy the `pretrained` and `scn_data` folders into this project.
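A quick way to confirm that the folders were copied to the right place is a small check like the one below. This is only a sketch: the helper name `missing_dirs` is hypothetical and not part of the project, and it assumes both folders sit directly in the project root.

```python
from pathlib import Path

# Folder names taken from the download instructions.
REQUIRED_DIRS = ("pretrained", "scn_data")


def missing_dirs(project_root="."):
    """Return the required folders that are not present under project_root."""
    root = Path(project_root)
    return [name for name in REQUIRED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("All required folders are in place.")
```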
.
├── pretrained # download and save pretrained models here
├── scn_data # files generated by create_input_files.py
├── datasets # dataset loaders for the train, eval, and test splits
│ ├── caption.py # caption dataset loader
│ └── tag.py # tag dataset loader
├── models # all model implementations
│ ├── decoders # decoder implementations
│ │ ├── attention_scn.py # SCN + attention decoder
│ │ ├── pure_attention.py # attention-only decoder
│ │ └── pure_scn.py # SCN-only decoder
│ ├── encoders # encoder implementations
│ │ ├── caption.py
│ │ └── tagger.py
│ ├── attention.py
│ └── scn_cell.py
├── trains # all training implementations
│ ├── attention_scn.py
│ ├── pure_attention.py
│ ├── pure_scn.py
│ └── tagger.py
├── utils
│ ├── checkpoint.py
│ ├── dataset.py
│ ├── device.py
│ ├── embedding.py
│ ├── loader.py
│ ├── metric.py
│ ├── optimizer.py
│ ├── tensor.py
│ ├── token.py
│ ├── url.py
│ └── vizualize.py
├── corpus_score.py # corpus scoring using perplexity and vocab count
├── create_input_files.py # preprocess input files and split data
├── eval_caption.py # caption model evaluation script
├── eval_tagger.py # image tagger model evaluation script
├── inference.py # caption generator script
├── README.md # this file
└── train.py # training script
git clone https://github.com/rayandrews/semantic-compositional-nets-attention.git
cd semantic-compositional-nets-attention
The model combines two architectures: SCN by zhegan27 and Soft Attention by kelvinxu.
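The two ingredients can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the code in `models/`: the class names, the 14x14 grid of 2048-dimensional ResNet152 features, and the way the attended context is fed into a tag-conditioned layer are all assumptions. The SCN part uses the factored weight W(s) = W_a diag(W_b s) W_c, where s is the vector of semantic tag probabilities; the attention part scores each image region against the decoder state and returns a softmax-weighted sum.

```python
import torch
import torch.nn as nn


class FactoredLinear(nn.Module):
    """Tag-conditioned linear map: W(s) x = W_a((W_b s) * (W_c x))."""

    def __init__(self, in_dim, tag_dim, factor_dim, out_dim):
        super().__init__()
        self.w_c = nn.Linear(in_dim, factor_dim, bias=False)
        self.w_b = nn.Linear(tag_dim, factor_dim, bias=False)
        self.w_a = nn.Linear(factor_dim, out_dim, bias=False)

    def forward(self, x, tags):
        # x: (batch, in_dim); tags: (batch, tag_dim)
        return self.w_a(self.w_b(tags) * self.w_c(x))


class SoftAttention(nn.Module):
    """Additive soft attention over image regions."""

    def __init__(self, feat_dim, hidden_dim, attn_dim):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, attn_dim)
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, feats, hidden):
        # feats: (batch, regions, feat_dim); hidden: (batch, hidden_dim)
        joint = torch.tanh(self.feat_proj(feats) + self.hidden_proj(hidden).unsqueeze(1))
        scores = self.score(joint).squeeze(2)               # (batch, regions)
        alpha = torch.softmax(scores, dim=1)                # weights sum to 1 per image
        context = (feats * alpha.unsqueeze(2)).sum(dim=1)   # (batch, feat_dim)
        return context, alpha


# Assumed shapes: 196 regions (14x14) of 2048-d features, 1000 semantic tags.
batch, regions = 2, 196
feats = torch.randn(batch, regions, 2048)
tags = torch.rand(batch, 1000)
hidden = torch.zeros(batch, 512)

attend = SoftAttention(feat_dim=2048, hidden_dim=512, attn_dim=512)
layer = FactoredLinear(in_dim=2048, tag_dim=1000, factor_dim=512, out_dim=512)

context, alpha = attend(feats, hidden)
out = layer(context, tags)
```

In the SCN paper the factored weights are applied to the gates of an LSTM; the single layer here only shows the factorization itself.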
Parameters | Value |
---|---|
Semantic concepts | 1000 |
Captions per image | 5 |
Minimum word frequency | 5 |
Maximum caption length | 50 |
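The word-frequency and caption-length settings above typically enter during preprocessing. The sketch below shows one plausible reading, not the logic of `create_input_files.py`: it assumes captions longer than the maximum are dropped, that a word is kept when it occurs at least the minimum number of times, and that the special tokens `<pad>`, `<unk>`, `<start>`, and `<end>` exist. All function names are hypothetical.

```python
from collections import Counter

MIN_WORD_FREQ = 5
MAX_CAPTION_LENGTH = 50


def build_word_map(captions, min_word_freq=MIN_WORD_FREQ, max_len=MAX_CAPTION_LENGTH):
    """captions: list of token lists. Returns (word_map, kept_captions)."""
    # Assumption: over-long captions are dropped rather than truncated.
    kept = [c for c in captions if len(c) <= max_len]
    freq = Counter(token for caption in kept for token in caption)
    # Assumption: a word is kept when its count is >= min_word_freq.
    words = sorted(w for w, n in freq.items() if n >= min_word_freq)
    word_map = {"<pad>": 0}
    for index, word in enumerate(words, start=1):
        word_map[word] = index
    for special in ("<unk>", "<start>", "<end>"):
        word_map[special] = len(word_map)
    return word_map, kept


def encode(caption, word_map, max_len=MAX_CAPTION_LENGTH):
    """Map tokens to ids, add <start>/<end>, and pad to max_len + 2 ids."""
    ids = [word_map["<start>"]]
    ids += [word_map.get(token, word_map["<unk>"]) for token in caption]
    ids.append(word_map["<end>"])
    ids += [word_map["<pad>"]] * (max_len - len(caption))
    return ids
```

Rare words are mapped to `<unk>` at encoding time, so every encoded caption has the same length regardless of how many tokens it contains.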
Parameters | Value |
---|---|
Epoch | 10 |
Batch Size | 32 |
Learning Rate | 1e-4 |
Dropout | 0.15 |
Optimizer | Adam |
Parameters | SCN | SCN + Attention |
---|---|---|
Epoch | 12 | 12 |
Batch Size | 32 | 32 |
Learning Rate | 4e-4 | 4e-4 |
Dropout | 0.5 | 0.5 |
Optimizer | Adam | Adam |
Embedding dimension | 512 | 512 |
Attention dimension | - | 512 |
Factor dimension | 512 | 512 |
Decoder dimension | 512 | 512 |
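One way to keep the two columns of this table in code is a small immutable config object, sketched below. The class and field names are hypothetical and do not come from `train.py`; only the values are taken from the table. Note that `dataclasses` requires Python 3.7 or later.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DecoderConfig:
    """Hyperparameters for the caption decoder (values from the table)."""
    epochs: int = 12
    batch_size: int = 32
    learning_rate: float = 4e-4
    dropout: float = 0.5
    optimizer: str = "Adam"
    embedding_dim: int = 512
    factor_dim: int = 512
    decoder_dim: int = 512
    attention_dim: Optional[int] = None  # None means no attention module


# The two variants differ only in the attention dimension.
SCN = DecoderConfig()
SCN_ATTENTION = DecoderConfig(attention_dim=512)
```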
- Replace Soft Attention with Transformer attention (Attention Is All You Need)
- Change the baseline
- Preprocess and re-evaluate the Indonesian dataset