The data preparation pipeline relies on the original ScanNet repo; please also refer to it for details.
- Use the following command to clone the original ScanNet repo:

```shell
git clone https://github.com/ScanNet/ScanNet.git
```
- Download the ScanNetV2 dataset and put (or link) `scans/` under `../assets/data/scannet/scans/`. (Please follow the ScanNet instructions for downloading the dataset.)
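If you prefer to link the downloaded data instead of copying it, a minimal Python sketch could look like the following. The source path is a hypothetical example; adjust it to wherever you actually downloaded the dataset.

```python
# Link an existing ScanNet scans/ folder into the location the pipeline expects.
import os
from pathlib import Path

def link_scans(src: str, dst: str) -> None:
    """Create dst as a symlink pointing at the downloaded scans/ folder."""
    src_path = Path(src).expanduser().resolve()
    dst_path = Path(dst)
    # Make sure the parent (e.g. ../assets/data/scannet/) exists first.
    dst_path.parent.mkdir(parents=True, exist_ok=True)
    if not dst_path.exists():
        os.symlink(src_path, dst_path, target_is_directory=True)

# Usage (paths are illustrative):
# link_scans("/path/to/downloaded/scans", "../assets/data/scannet/scans")
```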
- Use the following commands to downsample videos from the original ScanNet video data:

```shell
cd ../utils
python ScanNetVideo.py --meta_all <META_FILE_1> --meta_test <META_FILE_2> --output_folder <OUTPUT_IMAGE_FOLDER> --scan_file_path <SCANS_FILE> --sens_path <SENS_EXEC>
python jpg2mp4.py --output <OUTPUT_FINAL_DIR> --file <IMAGES_DIR>
```

  Here:
  - `<META_FILE_1>` should be `scannetv2.txt`;
  - `<META_FILE_2>` should be `scannetv2_test.txt`;
  - `<OUTPUT_IMAGE_FOLDER>` and `<IMAGES_DIR>` should be the same folder, `../assets/data/Video_img`;
  - `<SCANS_FILE>` should be the `scans` folder from the previous step;
  - `<SENS_EXEC>` should be the `sens` folder in the ScanNet repo (please refer to ScanNet Video for details);
  - `<OUTPUT_FINAL_DIR>` should be `../assets/data/Video`.
For convenience, you can download the videos generated by us from here
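Conceptually, the downsampling step keeps only a subset of each scan's RGB frames, e.g. every N-th frame. A minimal sketch of that selection logic (the function name and stride value are illustrative, not taken from `ScanNetVideo.py`):

```python
def subsample_frames(num_frames: int, stride: int) -> list:
    """Return the indices of the frames kept when sampling every `stride`-th frame."""
    if stride <= 0:
        raise ValueError("stride must be positive")
    return list(range(0, num_frames, stride))

# e.g. a 100-frame stream sampled with stride 20 keeps frames 0, 20, 40, 60, 80
```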
- Please follow the guidance in ClipBERT to transform the `.mp4` files into an `.mdb` file; the `.mdb` file should be saved in `./data/vis_db/sqa`.
- To transform the annotation into ClipBERT format, use the following command:

```shell
cd ../utils
python sqa_data_2_ClipBERT.py
```

  After doing so, you can find the annotation `.jsonl` file in `./data/txt_db/sqa`.
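The resulting `.jsonl` annotation file is in JSON Lines format: one JSON object per line. A minimal sketch of writing such a file (the field names below are illustrative, not the actual schema produced by `sqa_data_2_ClipBERT.py`):

```python
import json
from pathlib import Path

def write_jsonl(records, path) -> None:
    """Write one JSON object per line, as expected for a .jsonl annotation file."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")

# Illustrative records; the real schema is produced by sqa_data_2_ClipBERT.py.
example = [
    {"question_id": 0, "question": "What is behind me?", "answer": "table"},
]
```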
- Use the following command to download pretrained models:

```shell
bash scripts/download_pretrained.sh ./data
```
- Launch the Docker container for running the experiments:

```shell
source launch_container.sh ./data/txt_db ./data/vis_db ./data/finetune ./data/pretrained
```
- In Docker, use the following command to run the experiments:

```shell
python src/tasks/run_video_qa.py --config src/configs/sqa_video_base_resnet50.json --output_dir /storage
```
- Launch the Docker container for running the experiments:

```shell
source launch_container.sh ./data/txt_db ./data/vis_db ./data/finetune ./data/pretrained
```
- In Docker, use the following command to run inference on the test set:

```shell
python src/tasks/run_video_qa.py --config src/configs/sqa_video_base_resnet50.json --output_dir /storage --do_inference 1 --inference_split test --inference_model_name clipbert --inference_txt_db $TXT_DB --inference_img_db $IMG_DB
```

  `$TXT_DB` and `$IMG_DB` are the paths to the annotation file and the video data, respectively. You can use `TXT_DB=/txt/sqa/test.jsonl` and `IMG_DB=/img/sqa` for inference on the SQA3D test split.
- Pretrained models can be downloaded here. You should put the file into `./data/finetune/ckpt/` to test it. The correspondence between the models and the results in the paper is as follows:

  | Model | Model in the paper | Results |
  | --- | --- | --- |
  | `clipbert.pt` | ClipBERT | 43.31 |
Please also refer to the original ClipBERT repo for more details on training and evaluation.