
MIRTT

This repository (forked from IIGROUP/MIRTT) is the implementation of the EMNLP 2021 paper MIRTT: Learning Multimodal Interaction Representations from Trilinear Transformers for Visual Question Answering.

Data Sources

VQA 2.0, GQA: from LXMERT

Visual7W, TDIUC: from CTI

VQA 1.0: from the official VQA website

Pretrain

Under ./pretrain:

bash run.bash exp_name gpuid

Parameters can be adjusted by editing run.bash.
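
For example, a concrete pretraining run might look like the following; the experiment name and GPU id are placeholder values, not names from the repository:

# placeholder values: any experiment name and a visible GPU id
bash run.bash pretrain_v1 0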

MC VQA

To fine-tune on multiple-choice (MC) VQA, under ./main:

bash run.bash exp_name gpuid
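
For example (again with placeholder values for the experiment name and GPU id):

# placeholder values
bash run.bash mc_vqa_v1 0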

FFOE VQA

Free-form open-ended (FFOE) VQA is handled with a two-stage workflow.

Stage one: train a bilinear model (BAN, SAN, or MLP).

Under ./bilinear_method:

bash run.bash exp_name gpuid mod dataset model

After training, we generate an answer list for each dataset; in this way, FFOE VQA is reduced to MC VQA. An illustrative stage-one invocation is sketched below.
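
All argument values in this sketch are placeholders: "train" is a guess for mod, while TDIUC and BAN are taken from the dataset and model options listed above; the accepted tokens are defined in run.bash.

# illustrative values only: valid tokens for mod, dataset, and model
# are defined in ./bilinear_method/run.bash
bash run.bash ffoe_stage1 0 train TDIUC BAN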

Stage two: MIRTT. Under ./main, train MIRTT as in MC VQA above (see the sketch below).
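
A minimal sketch, assuming stage two reuses the same run.bash entry point and exp_name/gpuid arguments as MC VQA:

# assumption: stage two uses the same run.bash as MC VQA under ./main
bash run.bash ffoe_stage2 0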


This repository is still being updated.
