- Follow ScanRefer to set up the environment. For data preparation, you do not need to load the datasets; only download the preprocessed GLoVE embeddings (~990MB) and put them under `data/`.
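  A minimal sketch of the expected placement, assuming the embeddings were downloaded to `~/Downloads` and keep the `glove.p` file name used by the upstream ScanRefer repo:

  ```bash
  # Put the preprocessed GLoVE embeddings under data/ (file name and download
  # location are assumptions; adjust to wherever you saved the file).
  mkdir -p data
  mv ~/Downloads/glove.p data/
  ```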
- Install MMScan API.
- Overwrite `CONF.PATH.OUTPUT` in `lib/config.py` to point to your desired output directory.
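  If you prefer a one-liner, something like the sketch below works; the pattern for the existing assignment in `lib/config.py` is an assumption, so editing the file by hand is just as good:

  ```bash
  # Hypothetical sed edit; adjust the pattern if the assignment in lib/config.py differs.
  sed -i 's|^CONF.PATH.OUTPUT.*|CONF.PATH.OUTPUT = "/path/to/your/output_dir"|' lib/config.py
  ```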
- Run the following command to train ScanRefer (one GPU):

  ```bash
  python -u scripts/train.py --use_color --epoch {10/25/50}
  ```
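  The braces mean "pick one of the listed epoch budgets"; for example:

  ```bash
  # 50-epoch training run with RGB color features
  python -u scripts/train.py --use_color --epoch 50
  ```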
- Run the following command to evaluate ScanRefer (one GPU):

  ```bash
  python -u scripts/train.py --use_color --eval_only --use_checkpoint "path/to/pth"
  ```
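  For example, with an illustrative checkpoint path (replace it with the `.pth` file produced by your own training run):

  ```bash
  python -u scripts/train.py --use_color --eval_only --use_checkpoint "outputs/2024-01-01_00-00-00/model.pth"
  ```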
- Follow EmbodiedScan to set up the environment. You do not need to load the datasets!
- Install MMScan API.
- Run the following command to train EmbodiedScan (single / multiple GPU):

  ```bash
  # Single GPU training
  python tools/train.py configs/grounding/pcd_vg_mmscan.py --work-dir=path/to/save

  # Multiple GPU training
  python tools/train.py configs/grounding/pcd_vg_mmscan.py --work-dir=path/to/save --launcher="pytorch"
  ```
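  With `--launcher="pytorch"` the script expects to be started by a distributed launcher; a sketch assuming the usual torchrun convention of mmengine-based repos (GPU count illustrative):

  ```bash
  # 4-GPU training via torchrun (launch convention assumed, not taken from this repo's docs)
  torchrun --nproc_per_node=4 tools/train.py configs/grounding/pcd_vg_mmscan.py \
      --work-dir=path/to/save --launcher="pytorch"
  ```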
- Run the following command to evaluate EmbodiedScan (single / multiple GPU):

  ```bash
  # Single GPU testing
  python tools/test.py configs/grounding/pcd_vg_mmscan.py path/to/load_pth

  # Multiple GPU testing
  python tools/test.py configs/grounding/pcd_vg_mmscan.py path/to/load_pth --launcher="pytorch"
  ```
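  The same assumed torchrun convention applies to multi-GPU testing:

  ```bash
  # 4-GPU testing via torchrun (launch convention assumed, as above)
  torchrun --nproc_per_node=4 tools/test.py configs/grounding/pcd_vg_mmscan.py path/to/load_pth \
      --launcher="pytorch"
  ```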
- Follow LL3DA to set up the environment. For data preparation, you do not need to load the datasets; you only need to:
  1. download the released pre-trained weights and put them under `./pretrained`;
  2. download the pre-processed BERT embedding weights and store them under the `./bert-base-embedding` folder.
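  A sketch of the resulting layout (the actual file names depend on the downloads):

  ```bash
  # Create the two folders expected by the scripts and place the downloads inside them:
  #   ./pretrained/           <- released LL3DA pre-trained weights
  #   ./bert-base-embedding/  <- pre-processed BERT embedding weights
  mkdir -p pretrained bert-base-embedding
  ```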
- Install MMScan API.
- Edit the configs under `./scripts/opt-1.3b/eval.mmscanqa.sh` and `./scripts/opt-1.3b/tuning.mmscanqa.sh`.
- Run the following command to train LL3DA (4 GPU):

  ```bash
  bash scripts/opt-1.3b/tuning.mmscanqa.sh
  ```
- Run the following command to evaluate LL3DA (4 GPU):

  ```bash
  bash scripts/opt-1.3b/eval.mmscanqa.sh
  ```

  Optional: you can additionally score the results with the GPT evaluator. After evaluation, `qa_pred_gt_val.json` is generated under the checkpoint folder, and `tmp_path` is used for temporary storage:

  ```bash
  python eval_utils/evaluate_gpt.py --file path/to/qa_pred_gt_val.json --tmp_path path/to/tmp --api_key your_api_key --eval_size -1 --nproc 4
  ```
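  For instance, with illustrative paths and the API key taken from an environment variable:

  ```bash
  python eval_utils/evaluate_gpt.py --file ./ckpts/opt-1.3b/mmscanqa/qa_pred_gt_val.json \
      --tmp_path /tmp/gpt_eval --api_key "$OPENAI_API_KEY" --eval_size -1 --nproc 4
  ```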
- Follow LEO to set up the environment. For data preparation, you do not need to load the datasets; you only need to:
  1. download Vicuna-7B and update `cfg_path` in `configs/llm/*.yaml`;
  2. download `sft_noact.pth` and store it under the `./weights` folder.
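  A sketch of step (1), assuming the weights are pulled from the Hugging Face Hub (the exact Vicuna repo/version is an assumption; use whichever release LEO expects):

  ```bash
  # Download Vicuna-7B with git-lfs (repo id illustrative) and note the local path.
  git lfs install
  git clone https://huggingface.co/lmsys/vicuna-7b-v1.5 /path/to/vicuna-7b
  # Then set cfg_path in configs/llm/*.yaml to /path/to/vicuna-7b,
  # and place sft_noact.pth under ./weights:
  mkdir -p weights
  ```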
- Install MMScan API.
- Edit the configs under `scripts/train_tuning_mmscan.sh` and `scripts/test_tuning_mmscan.sh`.
- Run the following command to train LEO (4 GPU):

  ```bash
  bash scripts/train_tuning_mmscan.sh
  ```
- Run the following command to evaluate LEO (4 GPU):

  ```bash
  bash scripts/test_tuning_mmscan.sh
  ```

  Optional: you can additionally score the results with the GPT evaluator. After evaluation, `test_embodied_scan_l_complete.json` is generated under the checkpoint folder, and `tmp_path` is used for temporary storage:

  ```bash
  python evaluator/GPT_eval.py --file path/to/test_embodied_scan_l_complete.json --tmp_path path/to/tmp --api_key your_api_key --eval_size -1 --nproc 4
  ```
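  For instance, with illustrative paths and the API key taken from an environment variable:

  ```bash
  python evaluator/GPT_eval.py --file ./logs/leo_tuning/test_embodied_scan_l_complete.json \
      --tmp_path /tmp/gpt_eval --api_key "$OPENAI_API_KEY" --eval_size -1 --nproc 4
  ```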
PS: LEO may encounter a NaN error in the MultiHeadAttentionSpatial module due to the training setup when training for more epochs (this does not happen in the 4-GPU, one-epoch setting).