
Picking the Cream of the Crop: Visual-Centric Data Selection with Collaborative Agents

🚀 Welcome to the repo of ViSA!

ViSA (Visual-Centric Data Selection with Collaborative Agents) is an open-source project designed to enhance visual data selection through collaborative agents.

  • Paper
  • Model Release
  • Data Release
  • Code Release

⚡️ Installation

To ensure smooth integration with external dependencies, we recommend setting up separate virtual environments for different components of the project.

Setting up the vLLM Environment

conda create -n vllm python=3.11
conda activate vllm
pip install -r vllm_requirements.txt

Note: Due to known bugs in the current vLLM main branch when running Qwen2-VL, we recommend using the vLLM dev branch instead, installed in its own environment:

conda create -n qwen_vllm python=3.11
conda activate qwen_vllm
pip install -r qwen_vllm_requirements.txt

Setting up the SAM2 Environment

conda create -n sam python=3.11
conda activate sam
pip install -r sam_requirements.txt

Setting up the Training Environment

We provide a simple training environment for running experiments, but we also encourage using more efficient training frameworks such as LLaMA-Factory.

pip install -r training_requirements.txt
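
If you opt for LLaMA-Factory instead, fine-tuning is typically launched through its CLI. A minimal sketch, assuming you have installed LLaMA-Factory, registered the ViSA data in its dataset config, and written an SFT recipe (the YAML path below is a hypothetical placeholder, not a file we ship):

# hypothetical recipe path -- adapt to your own LLaMA-Factory config
llamafactory-cli train configs/qwen2vl_visa_sft.yaml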

🌈 Quick Start

📥 Model Download

We use the following large vision-language models as visual agents. Please download them manually before running the experiments.
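
For example, checkpoints can be fetched with huggingface-cli. A minimal sketch, where Qwen/Qwen2-VL-7B-Instruct stands in for whichever agent model you use and ./models is an arbitrary local directory:

# substitute the agent model(s) you actually use
huggingface-cli download Qwen/Qwen2-VL-7B-Instruct \
    --local-dir ./models/Qwen2-VL-7B-Instruct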

🔗 Repo Download

We rely on the following open-source projects. Please install them according to their official guidelines:

conda activate sam

# install sam2
git clone https://github.com/facebookresearch/sam2.git && cd sam2
pip install -e .

# install grounded-sam2
git clone https://github.com/IDEA-Research/Grounded-SAM-2.git && cd Grounded-SAM-2
pip install -e .
pip install --no-build-isolation -e grounding_dino
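
Both projects also need the SAM 2 model weights on disk. A minimal sketch, assuming the download_ckpts.sh helper that ships in the sam2 repository's checkpoints directory:

# run from inside the cloned sam2 repository
cd checkpoints && bash download_ckpts.sh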

🚀 Running Experiments

We provide five reference scripts for data selection. Before running them, please ensure that all necessary parameters (e.g., model paths, save directories) are correctly specified.
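
The exact variable names differ per script, so treat the following as an illustrative sketch only (MODEL_PATH, DATA_PATH, and SAVE_DIR are hypothetical placeholders, not the scripts' real names):

# hypothetical placeholders -- open each script and edit its own variables
MODEL_PATH=/path/to/agent-model      # downloaded visual agent checkpoint
DATA_PATH=/path/to/image-text-data   # samples to be scored
SAVE_DIR=./outputs/scores            # where per-sample scores are written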

Segmentation Complexity Score (SC Score)

conda activate sam
bash Scrpit/SC_score.sh

Object Alignment Score (OA Score)

conda activate sam
bash Scrpit/OA_score.sh

Diversity Perspective Score (DP Score)

conda activate vllm # use qwen_vllm for Qwen2-VL
bash Scrpit/DP_score.sh

Prior Token Perplexity Score (PT Score) & Image-Text Mutual Information Score (IM Score)

conda activate vllm # use qwen_vllm for Qwen2-VL
bash Scrpit/PT_IM_score.sh

🗝️ Dataset

You can download our dataset here. We provide two versions of the data: ViSA-LlavaOV-80K and ViSA-LlavaOV-700K.

The 80K dataset can be used for small-scale multimodal model alignment or for replicating the experiments in our paper, while the 700K dataset is suitable for large-scale multimodal model alignment.

Due to storage limits on new Hugging Face accounts, we are temporarily unable to upload the data containing images. To obtain the image data, please download the original LLaVA-OneVision dataset.
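
As a sketch, the original images can be pulled with huggingface-cli, assuming the lmms-lab/LLaVA-OneVision-Data repository that hosts the official release:

# assumes the official release at lmms-lab/LLaVA-OneVision-Data
huggingface-cli download lmms-lab/LLaVA-OneVision-Data \
    --repo-type dataset --local-dir ./data/llava-onevision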

💫 Models

Our visual-semantic alignment models, built on the Qwen2-VL-2B architecture, are available for academic research:

  • Qwen2-VL-2B-ViSA-80K: trained on the ViSA-LlavaOV-80K dataset and calibrated for reproducing the experimental results in our paper.
  • Qwen2-VL-2B-Instruction-ViSA-700K: trained on ViSA-LlavaOV-700K, showing stronger multimodal reasoning than its base instruction model.

(WIP) We will publish detailed evaluation results soon.

📢 Stay Connected

For any questions, issues, or contributions, feel free to open an issue or submit a pull request.

Citation

If you find our model/code/paper helpful, please consider citing our paper 📝 and starring the repo ⭐️!

@article{liu2025picking,
  title={Picking the Cream of the Crop: Visual-Centric Data Selection with Collaborative Agents},
  author={Liu, Zhenyu and Li, Yunxin and Hu, Baotian and Luo, Wenhan and Wang, Yaowei and Zhang, Min},
  journal={arXiv preprint arXiv:2502.19917},
  year={2025}
}
