Heima: Efficient Reasoning with Hidden Thinking
This repository provides an overview of all resources for the paper "Efficient Reasoning with Hidden Thinking".
Example:

Question: Which automotive brand does this car belong to, and what visual cues or badges indicate that?

Response (each reasoning stage is compressed into a single hidden thinking token):

<SUMMARY> <THINKING_OF_SUMMARY> </SUMMARY>
<CAPTION> <THINKING_OF_CAPTION> </CAPTION>
<REASONING> <THINKING_OF_REASONING> </REASONING>
<CONCLUSION> The image shows a black BMW M3 driving down a road. </CONCLUSION>
The three Heima Decoders reconstruct the hidden thinking tokens back into readable text:

Summary:
Below is the sequence of thought used for the summary:
I will identify the car brand by examining visual cues such as logos,
color schemes, and design elements present in the image.
Caption:
The step-by-step thinking process for the caption can be described as:
The image shows a sleek, modern sports car with a black exterior.
It has a distinct logo on the side, which resembles a cross with a circle.
Reasoning:
The thinking process for the reasoning of the given question is illustrated as follows:
The key to identifying the brand lies in the visible badge.
The badge on the front grille of the car is crucial for determining the brand.
In this image, the badge on the car is "BMW," which is a common symbol for the BMW brand.
BMW is known for its distinctive badge, and the presence of this badge confirms the brand.
Installation:
- Go to `torchtune_pkg/torchtune` and install with `pip install -e .`.
- Go to `zero-shot-evaluation/VLMEvalKit` and install with `pip install -e .`.
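
For convenience, both editable installs can also be run directly from the repository root, e.g.:

```bash
# Install the bundled torchtune package in editable mode
pip install -e torchtune_pkg/torchtune

# Install the evaluation toolkit in editable mode
pip install -e zero-shot-evaluation/VLMEvalKit
```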
Data Generation:
- Download the LLaVA-CoT-100k dataset.
- Go to `heima/scripts/`.
- Set the data path in `run-1_1-....sh` and `run-1_2-....sh`.
- Run the scripts with `sh` to generate the data (see the commands below).
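
As commands (the script suffixes are elided in this README; `<...>` is a placeholder for them):

```bash
cd heima/scripts/
# Set the data path inside both scripts first, then run:
sh run-1_1-<...>.sh   # <...> stands for the elided script suffix
sh run-1_2-<...>.sh
```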
Checkpoints:
- We provide the checkpoints on HuggingFace: shawnricecake/Heima.
- The release includes the Heima Encoder and three Heima Decoders, one each for summary, caption, and reasoning.
- We also provide the training code.
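
One way to fetch the checkpoints locally is the `huggingface-cli download` command from `huggingface_hub` (a sketch, assuming shawnricecake/Heima is a single model repository; the `--local-dir` target is an arbitrary choice):

```bash
# Download the Heima checkpoints from the HuggingFace Hub
huggingface-cli download shawnricecake/Heima --local-dir ./checkpoints/heima
```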
Training:
- Set the correct checkpoint and data paths for `LLaVA-CoT` and `Llama3.1-8B-Instruct` in `heima/configs`, from `2_1....yaml` to `2_5....yaml`.
- Go to `heima/scripts/` and run with `sh run-2-....sh`.
- You will get the final Heima Encoder after step 4 and the three decoders after step 5.
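
As commands (stage suffixes elided; `<...>` is a placeholder):

```bash
# Point configs 2_1....yaml through 2_5....yaml at your local LLaVA-CoT and
# Llama3.1-8B-Instruct checkpoints and data paths, then launch each stage:
cd heima/scripts/
sh run-2-<...>.sh   # <...> stands for the elided stage suffix
```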
Zero-Shot Evaluation:
- Set the checkpoint path in `zero-shot-evaluation/VLMEvalKit/configs/3-...-lora.yaml`.
- Go to `zero-shot-evaluation/VLMEvalKit/` and run `sh run-eval.sh`.
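
As commands from the repository root:

```bash
# Set the checkpoint path in configs/3-...-lora.yaml first
cd zero-shot-evaluation/VLMEvalKit/
sh run-eval.sh
```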
Evaluation of CoT Reconstruction:
- Set the correct checkpoint and data paths for `LLaVA-CoT` and `Llama3.1-8B-Instruct` in `heima/configs`, in `4_1....yaml`.
- Generate the CoT reconstruction results: go to `heima/scripts` and run with `sh run-4_1-....sh`.
- You can split the job across 8 GPUs for parallel running by revising (see the sketch after this list):
  GPU_split_num: 0 # 0,1,2,3,4,5,6,7
  GPU_total_split_num: 8
- Compute the evaluation metrics: go to `heima/scripts` and run `sh run-4_2-....sh`.
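
A hypothetical sketch of the 8-way parallel launch. It assumes one config copy per shard, each setting `GPU_split_num` to a distinct value in 0-7 with `GPU_total_split_num: 8`; the per-shard script names below are placeholders, not files in the repository:

```bash
# Launch one shard per GPU; each hypothetical run-4_1-shard<i>.sh points at a
# config whose GPU_split_num is set to <i>
for i in 0 1 2 3 4 5 6 7; do
  CUDA_VISIBLE_DEVICES=$i sh "run-4_1-shard${i}.sh" &
done
wait  # block until all 8 shards finish
```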
Evaluation of Token Counts:
- Go to `zero-shot-evaluation/VLMEvalKit/vlmeval/inference.py`.
- Uncomment line 139 and run the evaluation.
- Evaluate the Heima Encoder again.
- Compute the average number of tokens with:
  python3 compute_avg_num_token.py
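
The whole procedure as commands (assumption: `compute_avg_num_token.py` lives in the VLMEvalKit directory):

```bash
# After uncommenting line 139 in vlmeval/inference.py:
cd zero-shot-evaluation/VLMEvalKit/
sh run-eval.sh                    # re-run the Heima Encoder evaluation
python3 compute_avg_num_token.py  # report the average number of generated tokens
```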
Demo:
- Set the checkpoint path, your question, and your image in `heima/configs/5-....yaml`.
- Go to `heima/scripts/` and run with `sh run-5-....sh`.
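
As commands (script suffix elided; `<...>` is a placeholder):

```bash
# Fill in heima/configs/5-....yaml (checkpoint path, question, image) first
cd heima/scripts/
sh run-5-<...>.sh   # <...> stands for the elided script suffix
```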