
KM-BART: Knowledge Enhanced Multimodal BART for Visual Commonsense Generation (ACL 2021)

Yiran Xing*, Zai Shi*, Zhao Meng*, Gerhard Lakemeyer, Yunpu Ma, Roger Wattenhofer

*The first three authors contributed equally to this work.

[Paper] [Supplementary]


How to Cite Our Work

@inproceedings{KM-BART,
    title = "{KM}-{BART}: Knowledge Enhanced Multimodal {BART} for Visual Commonsense Generation",
    author = "Xing, Yiran  and
      Shi, Zai  and
      Meng, Zhao  and
      Lakemeyer, Gerhard  and
      Ma, Yunpu  and
      Wattenhofer, Roger",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
    year = "2021",
    publisher = "Association for Computational Linguistics",
    pages = "525--535"
}

Installation

  1. Clone the repository recursively

    git clone --recursive https://github.com/FomalhautB/KM-BART-ACL.git
    
  2. Create the conda environment

    conda env create -f environment.yaml
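
Before running anything below, activate the environment. A minimal sketch, assuming the environment defined in environment.yaml is named kmbart (check the name: field of the YAML file for the actual name):

    # activate the conda environment created from environment.yaml
    # NOTE: "kmbart" is an assumed name -- use the name given in environment.yaml
    conda activate kmbart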
    

The following steps are only required for feature extraction.

  1. Install bottom-up-attention.pytorch. Please refer to bottom-up-attention.pytorch for more details.

    cd bottom-up-attention.pytorch
    # install detectron2
    cd detectron2
    pip install -e .
    cd ..
    # install the remaining modules
    python setup.py build develop
    cd ..
  2. Install comet-commonsense. Please refer to comet-commonsense for more details.

    cd comet-commonsense
    # download data
    bash scripts/setup/get_atomic_data.sh
    bash scripts/setup/get_model_files.sh
    # install dependencies
    pip install tensorflow
    pip install ftfy==5.1
    conda install -c conda-forge spacy
    python -m spacy download en
    pip install tensorboardX
    pip install tqdm
    pip install pandas
    pip install ipython
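
As an optional sanity check, the key Python dependencies can be imported directly from the command line. This is only a sketch and assumes a spaCy 2.x installation, where the en shortcut link created by python -m spacy download en is available:

    # verify that detectron2 was built and installed correctly
    python -c "import detectron2; print(detectron2.__version__)"
    # verify that TensorFlow is importable
    python -c "import tensorflow as tf; print(tf.__version__)"
    # verify that the spaCy English model resolves (spaCy 2.x shortcut link)
    python -c "import spacy; spacy.load('en')"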

Data Preparation

VCG

  1. Download the images from here and decompress the images into $VCR_DATASET
  2. Download the annotations from here and decompress the annotations into $VCG_ANNOTATION
  3. Extract features and save the features in $VCG_DATA:
    python -m scripts.prepare_vcg \
        --data_dir $VCR_DATASET \
        --output_dir $VCG_DATA \
        --annot_dir $VCG_ANNOTATION \
        --gpu_num 4
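
$VCR_DATASET, $VCG_ANNOTATION and $VCG_DATA above (and the analogous variables in the sections below) are placeholders for local paths, not variables defined by the repository. A minimal sketch with hypothetical locations:

    # example placeholder paths -- adjust to wherever you keep the data
    export VCR_DATASET=/data/vcr/images
    export VCG_ANNOTATION=/data/vcg/annotations
    export VCG_DATA=/data/vcg/features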

COCO

  1. Download the train images from here and decompress the images into $COCO_TRAIN
  2. Download the validation images from here and decompress the images into $COCO_VAL
  3. Download the annotations from here and decompress the annotations into $COCO_ANNOTATION
  4. Extract features and save the features in $COCO_DATA:
    python -m scripts.prepare_coco \
        --train_dir $COCO_TRAIN \
        --val_dir $COCO_VAL \
        --annot_dir $COCO_ANNOTATION  \
        --output_dir $COCO_DATA \
        --gpu_num 4

SBU and CC

  1. Download the JSON files for the image URLs and captions from here and decompress the two files into $SBU_ANNOTATION
  2. Extract the features, bounding boxes and labels, build the image annotations, and save them into $OUTPUT_DATA (this will download the images first and save them in $SBU_DATA):
    python -m scripts.prepare_sbu \
        --download \
        --data_dir $SBU_DATA \
        --output_dir $OUTPUT_DATA \
        --annot_dir $SBU_ANNOTATION \
        --gpu_num 4 \
        --n_jobs 8

VG

  1. Download the objects, relationships, region descriptions, attributes and image metadata from here and decompress them into $VG_ANNOTATION
  2. Download the images from the same link above and decompress them into $VG_IMAGES
  3. Extract features and save the features in $VG_DATA:
    python -m scripts.prepare_vg \
        --annot_dir $VG_ANNOTATION \
        --output_dir $VG_DATA \
        --data_dir $VG_IMAGES \
        --gpu_num 4

Reasoning (SBU and COCO)

  1. Download the pretrained COMET weights atomic_pretrained_model.pickle from comet-commonsense
    • Save it to $LOAD_PATH.
    • Follow the instructions in comet-commonsense to build the data loader of COMET.
  2. Download the JSON files for the image URLs and captions from here and decompress the two files into $SBU_ANNOTATION.
  3. Download the SBU dataset and save the images in $SBU_DATA, then decompress the features, bounding boxes and labels of the images into $SBU_DATA as well.
  4. Generate inferences and save the inferences in $REASON_DATA.
    python -m scripts.prepare_sbu_reason \
         --output_dir $REASON_DATA \
         --annot_dir  $SBU_ANNOTATION \
         --model_file $LOAD_PATH/COMET \
         --gpu_num 2 \
         --sampling_algorithm topk-3
    
    # rename the output file
    mv $REASON_DATA/train.json $SBU_DATA/reason_train.json
  5. Filter the newly generated inferences with a KM-BART model pretrained on VCG (also in $LOAD_PATH) and save the final results in $OUTPUT_DATA.
    python -m scripts.filter_reason  \
         --data_dir $SBU_DATA \
         --output_dir $OUTPUT_DATA \
         --checkpoint $LOAD_PATH/KM-BART

Training

Pretrain from scratch

  • Example of pretraining on COCO + SBU from scratch (no pretrained weights) with 1 GPU and 4 data-loading workers:
    python pretrain \
        --dataset coco_train $COCO_DATA \
        --dataset coco_val $COCO_DATA \
        --dataset sbu_train $SBU_DATA \
        --checkpoint_dir $CHECKPOINT_DIR \
        --gpu_num 1 \
        --batch_size 32 \
        --master_port 12345 \
        --log_dir $LOG_DIR \
        --amp \
        --num_workers 4 \
        --model_config config/pretrain_base.json
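
Training progress can presumably be monitored with TensorBoard, since logs are written to the directory passed via --log_dir and tensorboardX is among the dependencies:

    # point TensorBoard at the log directory used above
    tensorboard --logdir $LOG_DIR --port 6006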

Pretrain from facebook/bart-base

  • Example of loading pretrained weights from facebook/bart-base and training on COCO:
    python pretrain \
        --dataset coco_train $COCO_DATA \
        --checkpoint_dir $CHECKPOINT_DIR \
        --model_config config/pretrain_base.json \
        --checkpoint facebook/bart-base

Continue pretraining

  • Example of loading pretrained weights from a previous checkpoint and continuing training on COCO:
    python pretrain \
        --dataset coco_train $COCO_DATA \
        --checkpoint_dir $CHECKPOINT_DIR \
        --model_config config/pretrain_base.json \
        --checkpoint $CHECKPOINT \
        --continue_training

Train VCG

  • Example of loading weights from a pretrained checkpoint and fine-tuning on VCG. Validation of the loss and score will be done at the end of each epoch:
    python vcg_train \
        --data_dir $VCG_DATA \
        --checkpoint_dir $CHECKPOINT_DIR \
        --validate_loss \
        --validate_score \
        --model_config config/vcg_base.json \
        --checkpoint $CHECKPOINT

Generate and evaluate VCG

  • Example of generating sentences for VCG:

    python vcg_generate \
        --data_dir $VCG_DATA \
        --checkpoint $CHECKPOINT \
        --output_file $GENERATED_FILE
  • Example of evaluating the generated file for the VCG validation set:

    python vcg_eval \
        --generation $GENERATED_FILE \
        --reference $VCG_DATA/val_ref.json
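
To eyeball the generated inferences before scoring them, the output file can be pretty-printed, assuming it is JSON like the reference file val_ref.json:

    # pretty-print the first lines of the generated file (assumes JSON output)
    python -m json.tool $GENERATED_FILE | head -n 40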

Pretrained Weights
