# Running ClipBERT on SQA3D

## Data preparation for egocentric videos

The data preparation pipeline uses the original ScanNet repo; please also refer to it.

1. Clone the original ScanNet repo:

   ```shell
   git clone https://github.com/ScanNet/ScanNet.git
   ```

2. Download the ScanNetV2 dataset and put (or link) `scans/` under (or to) `../assets/data/scannet/scans/`. Please follow the ScanNet instructions for downloading the dataset.

3. Downsample videos from the original ScanNet recordings:

   ```shell
   cd ../utils
   python ScanNetVideo.py --meta_all <META_FILE_1> --meta_test <META_FILE_2> --output_folder <OUTPUT_IMAGE_FOLDER> --scan_file_path <SCANS_FILE> --sens_path <SENS_EXEC>
   python jpg2mp4.py --output <OUTPUT_FINAL_DIR> --file <IMAGES_DIR>
   ```

   Here `<META_FILE_1>` should be `scannetv2.txt`, `<META_FILE_2>` should be `scannetv2_test.txt`, `<OUTPUT_IMAGE_FOLDER>` and `<IMAGES_DIR>` should both be `../assets/data/Video_img`, `<SCANS_FILE>` should be the `scans` folder from step 2, `<SENS_EXEC>` should be the sens folder in the ScanNet repo (for details, refer to ScanNet Video), and `<OUTPUT_FINAL_DIR>` should be `../assets/data/Video`.
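To make the placeholder substitutions concrete, here is a small sketch that assembles the two commands with the paths suggested above. `<SENS_EXEC>` is deliberately left symbolic, since its location depends on where you cloned the ScanNet repo:

```python
# Sketch: fill in the downsampling commands with the paths suggested
# above. SENS_EXEC stays a placeholder on purpose -- it depends on
# where the ScanNet repo was cloned.
args = {
    "META_FILE_1": "scannetv2.txt",
    "META_FILE_2": "scannetv2_test.txt",
    "OUTPUT_IMAGE_FOLDER": "../assets/data/Video_img",
    "SCANS_FILE": "../assets/data/scannet/scans",
    "SENS_EXEC": "<SENS_EXEC>",
    "OUTPUT_FINAL_DIR": "../assets/data/Video",
}

cmd1 = ("python ScanNetVideo.py"
        f" --meta_all {args['META_FILE_1']}"
        f" --meta_test {args['META_FILE_2']}"
        f" --output_folder {args['OUTPUT_IMAGE_FOLDER']}"
        f" --scan_file_path {args['SCANS_FILE']}"
        f" --sens_path {args['SENS_EXEC']}")
# <IMAGES_DIR> is the same directory as <OUTPUT_IMAGE_FOLDER>.
cmd2 = (f"python jpg2mp4.py --output {args['OUTPUT_FINAL_DIR']}"
        f" --file {args['OUTPUT_IMAGE_FOLDER']}")
print(cmd1)
print(cmd2)
```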

For convenience, you can download the videos we generated from here.

4. Follow the guidance in ClipBERT to transform the `.mp4` files into an `.mdb` file; the `.mdb` file should be saved in `./data/vis_db/sqa`.

5. Transform the annotations into ClipBERT format:

   ```shell
   cd ../utils
   python sqa_data_2_ClipBERT.py
   ```

   Afterwards, you will find the annotation `.jsonl` file in `./data/txt_db/sqa`.
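If you want to sanity-check the generated annotation file, it is plain JSON Lines (one JSON object per line). A minimal sketch, with illustrative records — the actual field names in the SQA3D/ClipBERT schema may differ:

```python
import json

# Illustrative records -- the real fields in ./data/txt_db/sqa/*.jsonl
# may differ from these assumptions.
sample = "\n".join([
    json.dumps({"question": "What is behind me?", "answer": "a chair",
                "vid_id": "scene0000_00"}),
    json.dumps({"question": "How many desks are there?", "answer": "two",
                "vid_id": "scene0001_00"}),
])

# JSON Lines: parse each non-empty line independently.
records = [json.loads(line) for line in sample.splitlines() if line.strip()]
print(f"loaded {len(records)} annotations")
```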

6. Download the pretrained models:

   ```shell
   bash scripts/download_pretrained.sh ./data
   ```

## Training

1. Launch the Docker container for running the experiments:

   ```shell
   source launch_container.sh ./data/txt_db ./data/vis_db ./data/finetune ./data/pretrained
   ```

2. Inside Docker, run the experiments with:

   ```shell
   python src/tasks/run_video_qa.py --config src/configs/sqa_video_base_resnet50.json --output_dir /storage
   ```

## Evaluation

1. Launch the Docker container for running the experiments:

   ```shell
   source launch_container.sh ./data/txt_db ./data/vis_db ./data/finetune ./data/pretrained
   ```

2. Inside Docker, run inference on the test set:

   ```shell
   python src/tasks/run_video_qa.py --config src/configs/sqa_video_base_resnet50.json --output_dir /storage --do_inference 1 --inference_split test --inference_model_name clipbert --inference_txt_db $TXT_DB --inference_img_db $IMG_DB
   ```

`$TXT_DB` and `$IMG_DB` are the paths to the annotation file and the video data, respectively. You can use `TXT_DB=/txt/sqa/test.jsonl` and `IMG_DB=/img/sqa` for inference on the SQA3D test split.
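A missing mount is a common cause of inference failures, so it can help to check the two paths before launching the job. A minimal sketch — here we build a throwaway mirror of the expected layout for illustration; inside the real container you would point `root` at `/` so that the paths resolve to `/txt` and `/img` as mounted by `launch_container.sh`:

```python
import tempfile
from pathlib import Path

# Build a throwaway mirror of the container layout for illustration.
# In the real container, /txt and /img are the mounts created by
# launch_container.sh from ./data/txt_db and ./data/vis_db.
root = Path(tempfile.mkdtemp())
(root / "txt" / "sqa").mkdir(parents=True)
(root / "txt" / "sqa" / "test.jsonl").write_text("")
(root / "img" / "sqa").mkdir(parents=True)

txt_db = root / "txt" / "sqa" / "test.jsonl"   # annotation file
img_db = root / "img" / "sqa"                  # video LMDB directory
assert txt_db.is_file() and img_db.is_dir(), "fix mounts before inference"
print("annotation and video DB paths look sane")
```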

## Pretrained models

- Pretrained models can be downloaded here. Put the file into `./data/finetune/ckpt/` to test it. The correspondence between the models and the results in the paper is as follows:

  | Model file    | Model in the paper | Result |
  |---------------|--------------------|--------|
  | `clipbert.pt` | ClipBERT           | 43.31  |

Please also refer to the original ClipBERT repo for more details on training and evaluation.