# Running ClipBERT on SQA3D

## Data preparation for egocentric videos

The data preparation pipeline uses the original ScanNet repo; please also refer to it.

1. Clone the original ScanNet repo:

   ```shell
   git clone https://github.com/ScanNet/ScanNet.git
   ```

2. Download the ScanNetV2 dataset and put (or link) `scans/` under (or to) `../assets/data/scannet/scans/`. Please follow the ScanNet instructions for downloading the dataset.

3. Downsample videos from the original ScanNet recordings:

   ```shell
   cd ../utils
   python ScanNetVideo.py --meta_all <META_FILE_1> --meta_test <META_FILE_2> --output_folder <OUTPUT_IMAGE_FOLDER> --scan_file_path <SCANS_FILE> --sens_path <SENS_EXEC>
   python jpg2mp4.py --output <OUTPUT_FINAL_DIR> --file <IMAGES_DIR>
   ```

   Here `<META_FILE_1>` should be `scannetv2.txt`, `<META_FILE_2>` should be `scannetv2_test.txt`, `<OUTPUT_IMAGE_FOLDER>` and `<IMAGES_DIR>` should both be `../assets/data/Video_img`, `<SCANS_FILE>` should be the `scans` folder from step 2, `<SENS_EXEC>` should be the sens folder in the ScanNet repo (for details, refer to ScanNet Video), and `<OUTPUT_FINAL_DIR>` should be `../assets/data/Video`.
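To make the placeholder substitutions concrete, here is a small sketch that assembles the two commands with the paths suggested above. `<SENS_EXEC>` is deliberately left symbolic, since its location depends on where you cloned the ScanNet repo:

```python
# Sketch: fill in the downsampling commands with the paths suggested
# above. SENS_EXEC stays a placeholder on purpose -- it depends on
# where the ScanNet repo was cloned.
args = {
    "META_FILE_1": "scannetv2.txt",
    "META_FILE_2": "scannetv2_test.txt",
    "OUTPUT_IMAGE_FOLDER": "../assets/data/Video_img",
    "SCANS_FILE": "../assets/data/scannet/scans",
    "SENS_EXEC": "<SENS_EXEC>",
    "OUTPUT_FINAL_DIR": "../assets/data/Video",
}

cmd1 = ("python ScanNetVideo.py"
        f" --meta_all {args['META_FILE_1']}"
        f" --meta_test {args['META_FILE_2']}"
        f" --output_folder {args['OUTPUT_IMAGE_FOLDER']}"
        f" --scan_file_path {args['SCANS_FILE']}"
        f" --sens_path {args['SENS_EXEC']}")
# <IMAGES_DIR> is the same directory as <OUTPUT_IMAGE_FOLDER>.
cmd2 = (f"python jpg2mp4.py --output {args['OUTPUT_FINAL_DIR']}"
        f" --file {args['OUTPUT_IMAGE_FOLDER']}")
print(cmd1)
print(cmd2)
```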

For convenience, you can download the videos we generated from here.

4. Follow the guidance in ClipBERT to transform the `.mp4` files into an `.mdb` file; the `.mdb` file should be saved in `./data/vis_db/sqa`.

5. Transform the annotations into ClipBERT format:

   ```shell
   cd ../utils
   python sqa_data_2_ClipBERT.py
   ```

   Afterwards, you will find the annotation `.jsonl` file in `./data/txt_db/sqa`.
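If you want to sanity-check the generated annotation file, it is plain JSON Lines (one JSON object per line). A minimal sketch, with illustrative records — the actual field names in the SQA3D/ClipBERT schema may differ:

```python
import json

# Illustrative records -- the real fields in ./data/txt_db/sqa/*.jsonl
# may differ from these assumptions.
sample = "\n".join([
    json.dumps({"question": "What is behind me?", "answer": "a chair",
                "vid_id": "scene0000_00"}),
    json.dumps({"question": "How many desks are there?", "answer": "two",
                "vid_id": "scene0001_00"}),
])

# JSON Lines: parse each non-empty line independently.
records = [json.loads(line) for line in sample.splitlines() if line.strip()]
print(f"loaded {len(records)} annotations")
```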

6. Download the pretrained models:

   ```shell
   bash scripts/download_pretrained.sh ./data
   ```

## Training

1. Launch the Docker container for running the experiments:

   ```shell
   source launch_container.sh ./data/txt_db ./data/vis_db ./data/finetune ./data/pretrained
   ```

2. Inside Docker, run the experiments with:

   ```shell
   python src/tasks/run_video_qa.py --config src/configs/sqa_video_base_resnet50.json --output_dir /storage
   ```

## Evaluation

1. Launch the Docker container for running the experiments:

   ```shell
   source launch_container.sh ./data/txt_db ./data/vis_db ./data/finetune ./data/pretrained
   ```

2. Inside Docker, run inference on the test set:

   ```shell
   python src/tasks/run_video_qa.py --config src/configs/sqa_video_base_resnet50.json --output_dir /storage --do_inference 1 --inference_split test --inference_model_name clipbert --inference_txt_db $TXT_DB --inference_img_db $IMG_DB
   ```

`$TXT_DB` and `$IMG_DB` are the paths to the annotation file and the video data, respectively. You can use `TXT_DB=/txt/sqa/test.jsonl` and `IMG_DB=/img/sqa` for inference on the SQA3D test split.
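A missing mount is a common cause of inference failures, so it can help to check the two paths before launching the job. A minimal sketch — here we build a throwaway mirror of the expected layout for illustration; inside the real container you would point `root` at `/` so that the paths resolve to `/txt` and `/img` as mounted by `launch_container.sh`:

```python
import tempfile
from pathlib import Path

# Build a throwaway mirror of the container layout for illustration.
# In the real container, /txt and /img are the mounts created by
# launch_container.sh from ./data/txt_db and ./data/vis_db.
root = Path(tempfile.mkdtemp())
(root / "txt" / "sqa").mkdir(parents=True)
(root / "txt" / "sqa" / "test.jsonl").write_text("")
(root / "img" / "sqa").mkdir(parents=True)

txt_db = root / "txt" / "sqa" / "test.jsonl"   # annotation file
img_db = root / "img" / "sqa"                  # video LMDB directory
assert txt_db.is_file() and img_db.is_dir(), "fix mounts before inference"
print("annotation and video DB paths look sane")
```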

## Pretrained models

- Pretrained models can be downloaded here. Put the file into `./data/finetune/ckpt/` to test it. The correspondence between the models and the results in the paper is as follows:

  | Model file    | Model in the paper | Result |
  |---------------|--------------------|--------|
  | `clipbert.pt` | ClipBERT           | 43.31  |

Please also refer to the original ClipBERT repo for more details on training and evaluation.