
# Learnable Adaptive Margin Loss To Overcome Language Bias in Visual Question Answering

This repository contains the implementation of our model, AdaArc-LM. It is built upon https://github.com/guoyang9/AdaVQA.

Almost all flags, including dataset paths and hyperparameters, can be set in `utils/config.py`.
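
For orientation, here is a hypothetical sketch of the kind of settings such a config module holds; the names below are illustrative only, not the actual contents of `utils/config.py`:

    # Hypothetical sketch of a config module like utils/config.py.
    # The real file defines its own names; set paths and hyperparameters there.
    main_path = '/path/to/vqa-cp-v2/'           # dataset root
    feature_path = '/path/to/detection-feats/'  # pre-extracted image features
    epochs = 30                                 # example hyperparameters
    batch_size = 256
    lr = 1e-3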

GPU used:

* One NVIDIA GeForce RTX 2080 Ti

Memory required:

* Approximately 4 GB

## Prerequisites

* python==3.7.11
* nltk==3.7
* bcolz==1.2.1
* tqdm==4.62.3
* numpy==1.21.4  
* pytorch==1.10.2
* tensorboardX==2.4
* torchvision==0.11.3
* h5py==3.5.0
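
One possible way to install these packages (a sketch assuming conda and pip; note that the PyPI distribution of PyTorch is `torch`, not `pytorch`):

    conda create -n adaarc python=3.7.11
    conda activate adaarc
    pip install nltk==3.7 bcolz==1.2.1 tqdm==4.62.3 numpy==1.21.4 \
        torch==1.10.2 tensorboardX==2.4 torchvision==0.11.3 h5py==3.5.0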

## Dataset

After downloading the datasets, place them in the folders specified in `utils/config.py`.

## Preprocessing

The preprocessing steps are as follows:

  1. Process questions and dump the dictionary:

    python tools/create_dictionary.py
    
  2. Process answers and question types, and generate the frequency-based margins (the idea is sketched after this list):

    python tools/compute_softscore.py
    
  3. Convert image features to h5:

    python tools/detection_features_converter.py 
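
Roughly, the margin generation targets the language prior: within each question type, answers that occur more often receive a larger margin. Below is a minimal sketch under that assumption; the real computation lives in `tools/compute_softscore.py` and may differ in scaling and details:

    # Illustrative sketch of frequency-based margins per question type.
    # Assumes `answers` is an iterable of (question_type, answer) pairs;
    # the actual script (tools/compute_softscore.py) may scale differently.
    from collections import Counter, defaultdict

    def frequency_margins(answers):
        counts = defaultdict(Counter)
        for q_type, ans in answers:
            counts[q_type][ans] += 1
        margins = {}
        for q_type, ctr in counts.items():
            total = sum(ctr.values())
            # Frequent answers under a question type get a larger margin,
            # discouraging reliance on the language prior alone.
            margins[q_type] = {a: c / total for a, c in ctr.items()}
        return margins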
    

## Model training

    python main_arcface.py --name test-VQA --gpu 0
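
For intuition about the loss, the sketch below shows an ArcFace-style cos(θ + m) objective with one learnable margin per answer class. It is an illustrative simplification, not the repository's exact loss (see `main_arcface.py` for that); in particular it uses hard labels, whereas VQA training typically uses soft answer scores:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class LearnableMarginArcLoss(nn.Module):
        """Illustrative ArcFace-style loss with a learnable per-class margin."""
        def __init__(self, num_classes, scale=16.0, init_margin=0.2):
            super().__init__()
            self.scale = scale
            # One trainable additive angular margin per answer class.
            self.margins = nn.Parameter(torch.full((num_classes,), init_margin))

        def forward(self, cos_theta, labels):
            # cos_theta: (batch, num_classes) cosines between L2-normalized
            # features and L2-normalized class weights.
            theta = torch.acos(cos_theta.clamp(-1 + 1e-7, 1 - 1e-7))
            m = self.margins[labels].unsqueeze(1)  # each sample's target margin
            one_hot = F.one_hot(labels, cos_theta.size(1)).bool()
            # Apply cos(theta + m) only at the target class position.
            logits = torch.where(one_hot, torch.cos(theta + m), cos_theta)
            return F.cross_entropy(self.scale * logits, labels)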

## Model evaluation

    python main_arcface.py --name test-VQA --eval-only

Running this command creates a new JSON file (e.g., abc.json) containing test question IDs and the answers predicted by the model.
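
A quick way to inspect the predictions (assuming the standard VQA submission format, a list of records with `question_id` and `answer` keys; adjust if the actual file differs):

    import json

    # Load the predictions written by the evaluation run.
    with open('abc.json') as f:
        preds = json.load(f)

    # Assumed format: [{"question_id": ..., "answer": ...}, ...]
    for p in preds[:5]:
        print(p['question_id'], '->', p['answer'])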

## Category-wise evaluation

    python acc_per_type.py abc.json

In the training and evaluation commands above, the `--name` argument specifies the name of the file in which the model weights are stored.

## Results of AdaArc and AdaArc-LM evaluated on VQA-CP v2

| Model                         | Accuracy (%) |
| ----------------------------- | ------------ |
| AdaArc                        | 57.24        |
| + Randomization               | 57.97        |
| + Bias-injection              | 59.44        |
| + Learnable margins           | 59.87        |
| + Supervised Contrastive Loss | 60.41        |