Visual Grounding Models

ScanRefer

  1. Follow ScanRefer to set up the environment. For data preparation, you do not need to download the datasets; only download the preprocessed GLoVE embeddings (~990MB) and put them under data/ (a setup sketch follows this list).

  2. Install MMScan API.

  3. Set CONF.PATH.OUTPUT in lib/config.py to your desired output directory.

  4. Run the following command to train ScanRefer (single GPU):

    python -u scripts/train.py --use_color --epoch {10/25/50}
  5. Run the following command to evaluate ScanRefer (single GPU):

    python -u scripts/train.py --use_color --eval_only --use_checkpoint "path/to/pth"
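
The data and config steps above (1–3) reduce to placing the GloVe file, installing the MMScan API, and pointing the output path at a writable directory. The sketch below is a rough outline under those assumptions; the embedding filename (glove.p) and the MMScan install command are guesses, so check the ScanRefer and MMScan repositories for the exact names.

    # Rough setup sketch for ScanRefer; filenames, paths, and the MMScan install
    # command are assumptions; follow the upstream repositories for specifics.
    mkdir -p data
    mv ~/Downloads/glove.p data/        # preprocessed GLoVE embeddings (~990MB); name may differ
    pip install -e /path/to/MMScan      # MMScan API (assumed editable install from source)
    # then, in lib/config.py, set the output directory, e.g.:
    # CONF.PATH.OUTPUT = "/path/to/scanrefer_outputs"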

EmbodiedScan

  1. Follow EmbodiedScan to set up the environment. You do not need to download the datasets.

  2. Install MMScan API.

  3. Run the following command to train EmbodiedScan (single or multiple GPUs; a full multi-GPU launch sketch follows this list):

    # Single GPU training
    python tools/train.py configs/grounding/pcd_vg_mmscan.py --work-dir=path/to/save
    
    # Multiple GPU training
    python tools/train.py configs/grounding/pcd_vg_mmscan.py --work-dir=path/to/save --launcher="pytorch"
  4. Run the following command to evaluate EmbodiedScan (single or multiple GPUs):

    # Single GPU testing
    python tools/test.py configs/grounding/pcd_vg_mmscan.py path/to/load_pth
    
    # Multiple GPU testing
    python tools/test.py configs/grounding/pcd_vg_mmscan.py path/to/load_pth --launcher="pytorch"
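
The --launcher="pytorch" flag assumes the script is started through a PyTorch distributed launcher. Below is a hedged sketch of a typical multi-GPU launch; the torchrun wrapper and the GPU count of 4 are assumptions, not part of the upstream instructions.

    # Hedged sketch: run the same training command through torchrun
    # (the launcher wrapper and the --nproc_per_node value are assumptions).
    torchrun --nproc_per_node=4 tools/train.py \
        configs/grounding/pcd_vg_mmscan.py \
        --work-dir=path/to/save \
        --launcher="pytorch"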

Question Answering Models

LL3DA

  1. Follow LL3DA to set up the environment. For data preparation, you do not need to download the datasets; you only need to do the following (a preparation sketch follows this list):

    (1) Download the released pre-trained weights and put them under ./pretrained.

    (2) Download the pre-processed BERT embedding weights and store them under the ./bert-base-embedding folder.

  2. Install MMScan API.

  3. Edit the configuration in ./scripts/opt-1.3b/eval.mmscanqa.sh and ./scripts/opt-1.3b/tuning.mmscanqa.sh.

  4. Run the following command to train LL3DA (4 GPUs):

    bash scripts/opt-1.3b/tuning.mmscanqa.sh     
  5. Run the following command to evaluate LL3DA (4 GPUs):

    bash scripts/opt-1.3b/eval.mmscanqa.sh 

    Optional: after evaluation you can run the GPT evaluator on the result. A file named 'qa_pred_gt_val.json' is generated under the checkpoint folder, and --tmp_path is used for temporary storage.

    python eval_utils/evaluate_gpt.py --file path/to/qa_pred_gt_val.json \
        --tmp_path path/to/tmp --api_key your_api_key --eval_size -1 \
        --nproc 4
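
A rough sketch of the preparation in step 1 follows; the weight filenames and the MMScan install command are assumptions, so take the exact names from the LL3DA and MMScan repositories.

    # Hedged sketch of the LL3DA preparation; filenames and paths are placeholders.
    mkdir -p pretrained bert-base-embedding
    mv ~/Downloads/ll3da-pretrained.pth pretrained/            # released pre-trained weights (name may differ)
    mv ~/Downloads/bert-base-embedding/* bert-base-embedding/  # pre-processed BERT embedding weights
    pip install -e /path/to/MMScan                             # MMScan API (assumed editable install)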
    

LEO

  1. Follow LEO to set up the environment. For data preparation, you do not need to download the datasets; you only need to do the following (a preparation sketch follows this list):

    (1) Download Vicuna-7B and update cfg_path in configs/llm/*.yaml.

    (2) Download sft_noact.pth and store it under the ./weights folder.

  2. Install MMScan API.

  3. Edit the configuration in scripts/train_tuning_mmscan.sh and scripts/test_tuning_mmscan.sh.

  4. Run the following command to train LEO (4 GPUs):

    bash scripts/train_tuning_mmscan.sh  
  5. Run the following command to evaluate LEO (4 GPUs):

    bash scripts/test_tuning_mmscan.sh

    Optional: after evaluation you can run the GPT evaluator on the result. A file named 'test_embodied_scan_l_complete.json' is generated under the checkpoint folder, and --tmp_path is used for temporary storage.

    python evaluator/GPT_eval.py --file path/to/test_embodied_scan_l_complete.json \
        --tmp_path path/to/tmp --api_key your_api_key --eval_size -1 \
        --nproc 4
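
A rough sketch of the preparation in step 1 follows; the local paths and the MMScan install command are assumptions, so follow the LEO and MMScan repositories for the exact steps.

    # Hedged sketch of the LEO preparation; local paths are placeholders.
    mkdir -p weights
    mv ~/Downloads/sft_noact.pth weights/    # pre-trained LEO checkpoint
    # point cfg_path in configs/llm/*.yaml at the local Vicuna-7B directory, e.g.:
    # cfg_path: /path/to/vicuna-7b
    pip install -e /path/to/MMScan           # MMScan API (assumed editable install)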
    

PS: LEO may encounter a NaN error in the MultiHeadAttentionSpatial module when training for more epochs, due to the training setup (training for one epoch on 4 GPUs is fine).