First, create a new conda environment. We suggest installing PyTorch 1.8.
conda create -n obj2seq python=3.7
conda activate obj2seq
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge
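The install commands go into whichever conda environment is currently active, so it can help to confirm that obj2seq is the active one before installing. The helper below, require_conda_env, is hypothetical (not part of the repository); it relies on `conda activate` exporting CONDA_DEFAULT_ENV with the environment's name (for an environment activated by path, the variable holds the path instead).

```shell
# Hypothetical helper: fail early if the expected conda environment
# is not the active one.
require_conda_env() {
  expected="$1"
  # `conda activate NAME` exports CONDA_DEFAULT_ENV=NAME.
  if [ "${CONDA_DEFAULT_ENV:-}" != "$expected" ]; then
    echo "active conda env is '${CONDA_DEFAULT_ENV:-none}', expected '$expected'" >&2
    return 1
  fi
}
```

For example, `require_conda_env obj2seq || exit 1` at the top of a setup script stops it before anything is installed into the wrong environment.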
Then, clone the repository locally and install dependencies:
git clone
pip install -r requirements.txt
Compile MultiScaleDeformableAttention from Deformable-DETR. (If you have already compiled it in another repository, you can skip this step.)
cd models/ops
bash ./
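One way to check that the build succeeded is to try importing the compiled extension. This sketch assumes the extension keeps the module name MultiScaleDeformableAttention, as in Deformable-DETR; py_has_module is a hypothetical helper that imports its arguments in order, so torch can be loaded before the extension that links against it.

```shell
# Hypothetical helper: try to import each named Python module, in order.
# Returns 0 only if every import succeeds; Python's traceback stays on stderr.
py_has_module() {
  if python3 -c 'import importlib, sys; list(map(importlib.import_module, sys.argv[1:]))' "$@"; then
    echo "import OK: $*"
  else
    echo "import failed: $*" >&2
    return 1
  fi
}
```

Inside the obj2seq environment, `py_has_module torch MultiScaleDeformableAttention` should report success once the ops are built.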
Link your COCO 2017 directory to data/coco:
mkdir data
ln -s /path/to/coco data/coco
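A wrong symlink target is easy to catch by checking that the expected files are reachable through data/coco. The sketch assumes the standard COCO 2017 layout (train2017/, val2017/ and annotations/instances_{train,val}2017.json); the helper name check_coco_layout is hypothetical, so adjust the list if your copy is organized differently.

```shell
# Hypothetical helper: verify a directory (or symlink) has the standard
# COCO 2017 layout. Prints each missing entry and returns 1 if any is absent.
check_coco_layout() {
  root="${1:-data/coco}"
  missing=0
  for p in train2017 val2017 \
           annotations/instances_train2017.json \
           annotations/instances_val2017.json; do
    # -e follows symlinks, so this works through data/coco -> /path/to/coco.
    if [ ! -e "${root}/${p}" ]; then
      echo "missing: ${root}/${p}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_coco_layout data/coco` right after `ln -s` confirms the link points at the dataset root rather than, say, its parent directory.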
Or modify the data paths in the config files:
coco_path: /path/to/coco
anno_train: /path/to/coco_train_json_file
anno_val: /path/to/coco_val_json_file
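If you prefer to edit these keys from the command line, a small sed wrapper can rewrite them while keeping their indentation. set_yaml_value is a hypothetical helper: it rewrites every line that starts with `key:` (after optional whitespace), so it assumes each key appears only once in the file, and values containing `|` or `&` would need escaping.

```shell
# Hypothetical helper: replace the value of `key:` in a YAML file in place,
# preserving leading indentation. Returns 1 if the key is not present.
set_yaml_value() {
  file="$1"; key="$2"; value="$3"
  if ! grep -Eq "^[[:space:]]*${key}:" "$file"; then
    echo "key '${key}' not found in ${file}" >&2
    return 1
  fi
  # \1 keeps the original indentation; -i.bak works with GNU and BSD sed.
  sed -E -i.bak "s|^([[:space:]]*)${key}:.*$|\1${key}: ${value}|" "$file" &&
    rm -f "${file}.bak"
}
```

For example, `set_yaml_value configs/detection_r50.yaml coco_path /path/to/coco` updates the path if that file defines coco_path, and reports an error otherwise.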
We provide CLIP-initialized class prompts here. If you need prompts for other sets of categories, follow the steps in this section.
Prepare a COCO-like JSON file.
Run the command below to generate the class prompt embeddings.
python scripts/ --ann JSON_FILE_PATH --out_path OUTPUT_PATH
For example:
python scripts/ \
--ann data/coco/annotations/instances_val2017.json \
--out_path word_arrays/coco_clip_v2.npy
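For the first step, a COCO-style file with a categories list (an id and a name per class) is the part that defines which class prompts get generated. The sketch below writes such a file; the helper name write_min_coco_json and the two category names are placeholders, and which fields the prompt-generation script actually reads is an assumption.

```shell
# Hypothetical example: write a minimal COCO-format annotation file whose
# "categories" list holds a custom set of classes (names are placeholders).
write_min_coco_json() {
  out="$1"
  cat > "$out" <<'EOF'
{
  "images": [],
  "annotations": [],
  "categories": [
    {"id": 1, "name": "forklift", "supercategory": "vehicle"},
    {"id": 2, "name": "pallet", "supercategory": "object"}
  ]
}
EOF
}
```

The resulting path is what you would pass as --ann in the command above.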
To train with Slurm on multiple nodes:
bash scripts/ NUM_NODES /path/to/config /path/to/output/dir [OTHER_ARGS]
For example, to train Obj2Seq on 2 nodes:
bash scripts/ 2 configs/detection_r50.yaml checkpoints/detection_r50
We also provide scripts for PyTorch distributed training:
bash /path/to/config /path/to/output/dir
Before running, you may need to adjust DATA.batch_size (the number of images per GPU) in the config file to fit your GPUs.
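Because DATA.batch_size counts images per GPU, the number of images processed per iteration grows with the number of GPUs. Assuming one training process per GPU (standard data-parallel training), the global batch size is nodes × GPUs per node × DATA.batch_size; the hypothetical helper below only does that arithmetic.

```shell
# Global batch size for data-parallel training with one process per GPU:
#   num_nodes * gpus_per_node * DATA.batch_size
effective_batch_size() {
  nodes="$1"; gpus_per_node="$2"; per_gpu="$3"
  echo $(( nodes * gpus_per_node * per_gpu ))
}
```

For instance, `effective_batch_size 2 8 2` prints 32, so halving DATA.batch_size to fit GPU memory also halves the global batch size.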
To evaluate Obj2Seq on a single node with 8 GPUs:
bash /path/to/config /path/to/output/dir --eval [--resume /path/to/checkpoint.pth]
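Evaluating without --resume would not load your trained weights, so it may be worth failing early when the checkpoint path is wrong rather than after launching jobs on 8 GPUs. The hypothetical helper eval_flags prints the flags to append, adding --resume only when the checkpoint file exists.

```shell
# Hypothetical helper: print evaluation flags. With no argument, prints
# only --eval; with an existing checkpoint, adds --resume; otherwise fails.
eval_flags() {
  ckpt="${1:-}"
  if [ -z "$ckpt" ]; then
    echo "--eval"
  elif [ -f "$ckpt" ]; then
    echo "--eval --resume ${ckpt}"
  else
    echo "checkpoint not found: ${ckpt}" >&2
    return 1
  fi
}
```

Capture the flags first so that a missing checkpoint stops the script, e.g. `flags=$(eval_flags /path/to/checkpoint.pth) || exit 1`, then append `$flags` unquoted to the evaluation command above (so the checkpoint path must not contain spaces).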