Code release for "ShortV: Efficient Multimodal Large Language Models by Freezing Visual Tokens in Ineffective Layers"
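The core idea: in layers identified as ineffective for visual tokens, the visual tokens' hidden states are frozen (passed through unchanged) while text tokens are still processed. Below is a minimal sketch of that semantics, assuming a generic transformer block and a boolean visual-token mask; it is illustrative only, not the repo's implementation.

```python
# Illustrative sketch of the ShortV idea: in a "replaced" layer, visual tokens
# keep their incoming hidden states (frozen) while text tokens are updated.
# Names, shapes, and the masking logic here are assumptions for illustration.
import torch
import torch.nn as nn

def shortv_layer_forward(layer: nn.Module,
                         hidden: torch.Tensor,       # [batch, seq, dim]
                         visual_mask: torch.Tensor,  # [batch, seq], True = visual token
                         replaced: bool) -> torch.Tensor:
    out = layer(hidden)
    if replaced:
        # Freeze visual tokens: keep their incoming hidden states instead of the
        # layer output. (The actual method skips the computation for visual tokens
        # entirely, which is where the FLOPs saving comes from; this sketch only
        # illustrates the freezing semantics.)
        out = torch.where(visual_mask.unsqueeze(-1), hidden, out)
    return out

# Tiny usage example with a stand-in "layer".
layer = nn.Linear(8, 8)                                   # stand-in for a transformer block
h = torch.randn(1, 5, 8)
mask = torch.tensor([[True, True, True, False, False]])   # first 3 tokens are "visual"
out = shortv_layer_forward(layer, h, mask, replaced=True)
assert torch.equal(out[:, :3], h[:, :3])                  # visual tokens unchanged
```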
- Clone this repository and navigate to the ShortV folder

```bash
git clone https://github.com/icip-cas/ShortV.git
cd ShortV
```

- Install the package

```bash
conda create -n shortv python=3.10 -y
conda activate shortv
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
```

- Install additional packages for evaluation with lmms-eval

```bash
cd lmms-eval
pip install -e .
```
The layer IDs of the replaced layers for each model are provided below.
| Model | Checkpoint | Replaced Layers |
|---|---|---|
| LLaVA-1.5-7B | liuhaotian/llava-v1.5-7b | 31,29,30,28,0,26,27,25,24,22,23,21,2,3,20,18,17,12,19 |
| LLaVA-1.5-13B | liuhaotian/llava-v1.5-13b | 39,32,28,36,27,37,29,30,1,38,25,31,2,26,23,34,0,33,35,22,24,21,20,17 |
| LLaVA-NeXT-7B | liuhaotian/llava-v1.6-vicuna-7b | 31,29,30,28,26,27,22,24,21,23,25,20,19,17,18,15,12,0,2 |
| LLaVA-NeXT-13B | liuhaotian/llava-v1.6-vicuna-13b | 39,32,29,36,27,30,37,23,25,31,26,2,28,22,33,35,34,24,38,21,20,18,1,17 |
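If you want a different speed/accuracy trade-off, a smaller set of replaced layers can be built by truncating one of the lists above. A minimal sketch, assuming the lists are ordered by replacement priority (which is how the LC-score procedure below produces them):

```python
# Sketch: build a shorter REPLACED_LAYERS value from the full ordering.
# Assumption: the lists in the table are ordered by replacement priority,
# so the first K entries are the K most "ineffective" layers.
FULL_ORDER = "31,29,30,28,0,26,27,25,24,22,23,21,2,3,20,18,17,12,19".split(",")  # LLaVA-1.5-7B

def replaced_layers(k: int) -> str:
    """Comma-separated IDs of the first k layers to replace."""
    return ",".join(FULL_ORDER[:k])

print(replaced_layers(12))  # export this string as REPLACED_LAYERS
```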
Chat about images using ShortV:

```bash
export REPLACED_LAYERS="31,29,30,28,0,26,27,25,24,22,23,21,2,3,20,18,17,12,19"
python -m llava.serve.cli \
    --model-path liuhaotian/llava-v1.5-7b \
    --image-file "https://llava-vl.github.io/static/images/view.jpg"
```
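The commands above pass the replaced layers through the `REPLACED_LAYERS` environment variable. A minimal sketch of how such a value can be parsed into layer indices on the Python side (illustrative only; the actual handling inside ShortV may differ):

```python
# Sketch: parse REPLACED_LAYERS into a set of layer indices.
# Illustrative assumption, not the repo's code.
import os

def parse_replaced_layers(var: str = "REPLACED_LAYERS") -> set[int]:
    raw = os.environ.get(var, "")
    return {int(tok) for tok in raw.split(",") if tok.strip()}

print(sorted(parse_replaced_layers()))
```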
LMMs-Eval is an evaluation framework meticulously crafted for consistent and efficient evaluation of LMMs.

```bash
export MODEL_PATH="liuhaotian/llava-v1.5-7b"
export MODEL_NAME="llava_7b"
export CONV_MODE="v1"
export REPLACED_LAYERS="31,29,30,28,0,26,27,25,24,22,23,21,2,3,20,18,17,12,19"
accelerate launch --num_processes=1 --main_process_port=12346 -m lmms_eval \
    --model llava \
    --model_args pretrained=${MODEL_PATH},conv_template=${CONV_MODE} \
    --tasks mmmu_val \
    --batch_size 1 \
    --log_samples_suffix ${MODEL_NAME} \
    --output_path ./logs/
```
More evaluation details are provided in Evaluation.md.
To identify which layers are ineffective, we calculate visual LC scores for all MLLM layers.
```bash
cd lmms-eval
export MODEL_PATH="liuhaotian/llava-v1.5-7b"
export MODEL_NAME="llava_7b"
export CONV_MODE="v1"
accelerate launch --num_processes=1 --main_process_port=12346 -m lmms_eval \
    --model llava \
    --model_args pretrained=${MODEL_PATH},conv_template=${CONV_MODE} \
    --tasks gqa,flickr30k_test \
    --batch_size 1 \
    --log_samples_suffix ${MODEL_NAME} \
    --output_path ./logs/ \
    --limit 20 \
    --cal_lc
```
This produces visual LC scores for each layer, along with the resulting order of layer replacement.
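The replacement order follows from these scores: layers whose computation contributes least to visual tokens (lowest visual LC scores) are treated as ineffective and replaced first. A minimal sketch of that ranking step, with placeholder scores:

```python
# Sketch: turn per-layer visual LC scores into a replacement order.
# Assumption: a lower LC score means the layer contributes less to visual
# tokens, so it is replaced earlier. The scores below are placeholders.
lc_scores = [0.12, 0.08, 0.03, 0.45, 0.02]  # one (placeholder) score per layer

order = sorted(range(len(lc_scores)), key=lambda i: lc_scores[i])
print(",".join(map(str, order)))  # "4,2,1,0,3" -> format used by REPLACED_LAYERS
```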
This work is built upon LLaVA, lmms-eval, and VTW.
If you find ShortV useful for your research and applications, please cite using this BibTeX:
```bibtex
@article{yuan2025shortv,
  title={ShortV: Efficient Multimodal Large Language Models by Freezing Visual Tokens in Ineffective Layers},
  author={Yuan, Qianhao and Zhang, Qingyu and Liu, Yanjiang and Chen, Jiawei and Lu, Yaojie and Lin, Hongyu and Zheng, Jia and Han, Xianpei and Sun, Le},
  journal={arXiv preprint arXiv:2504.00502},
  year={2025}
}
```