[182] Calibrated Self-Rewarding Vision Language Models #201

long8v opened this issue Oct 10, 2024 · 0 comments

[figure]

paper

TL;DR

  • I read this because.. : VLM self-rewarding
  • task : LVLM
  • problem : LVLMs suffer badly from object hallucination, because attention is concentrated too heavily on the text tokens
  • idea : combine self-rewarding with a CLIPScore-based image-relevance score so that the reward pushes responses to be image-dependent
  • architecture : LLaVA 1.5 7B / 13B
  • objective : DPO loss
  • baseline : LLaVA, RLHF-V, VLFeedback, ...
  • data : generated over the iterations; the seed is a random 13K subset of the 150K LLaVA-Instruct data
  • evaluation : VLM bench (MME, SEED, LLaVA_w, MMBench, ...), VQA (SQA, VizWiz, GQA), hallucination bench (POPE, CHAIR)
  • result : improvements across VLM bench, VQA, and hallucination bench
  • contribution :
  • etc. :

Details

Preliminary

LARGE LANGUAGE MODELS CAN SELF-IMPROVE https://arxiv.org/abs/2210.11610

Proposed

[figures]

Generate samples with the VLM (beam-search decoding), assign a reward to each sentence, and score the whole sequence as the sum of its sentence rewards.
Pick good / bad responses from these samples and train with DPO.
Generate samples again with the trained VLM, and so on; this is repeated three times.
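The loop above can be sketched roughly as follows. All function names here are stand-ins I made up for illustration, not the paper's actual API; `generate_candidates` and `sentence_reward` would wrap the real LVLM decoding and the calibrated reward.

```python
import random

def generate_candidates(prompt, n=4):
    """Stand-in for beam-search decoding n candidate responses from the LVLM."""
    return [f"{prompt} -> candidate {i}" for i in range(n)]

def sentence_reward(sentence):
    """Stand-in for the per-sentence calibrated reward (text + image score)."""
    return random.random()

def sequence_reward(response):
    # The paper scores each sentence and sums them to score the full sequence.
    sentences = response.split(". ")
    return sum(sentence_reward(s) for s in sentences)

def build_dpo_pairs(prompts):
    """One round: sample candidates, score them, keep (chosen, rejected) pairs."""
    pairs = []
    for p in prompts:
        ranked = sorted(generate_candidates(p), key=sequence_reward, reverse=True)
        pairs.append({"prompt": p, "chosen": ranked[0], "rejected": ranked[-1]})
    return pairs

# Three rounds of generate -> score -> DPO, as described in the notes.
for _ in range(3):
    pairs = build_dpo_pairs(["what is in the image?"])
    # dpo_train(model, pairs)  # placeholder for the actual DPO update
```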

Reward

The sum of the text score and the image score.
[equation]

$\lambda$ is a hyperparameter, set to 0.9.
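A minimal sketch of how the two scores might be combined. The convex-combination form (λ weighting the text score against the image score) is my reading of the notes above, not a formula quoted from the paper.

```python
def calibrated_reward(text_score, image_score, lam=0.9):
    """Combine the instruction-following (text) score and the CLIPScore-based
    image-relevance score; lam = 0.9 per the notes. The exact weighting form
    is an assumption."""
    return lam * text_score + (1 - lam) * image_score
```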

  • text score
[equation]

$x$ : prompt
$r_i$ : $i$-th response token
$s$ : sentence
$R_t$ : the text-decoder part of the LVLM

Interestingly, only the sentence goes in: neither the image nor the preceding sentences are included. The paper calls this the instruction-following score.
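One plausible reading of this score: the likelihood the text decoder assigns to the sentence's tokens given the prompt alone. The length normalization below is an assumption on my part; the paper may use the raw sum or a product of probabilities instead.

```python
import math

def text_score(token_logprobs):
    """Length-normalized log-likelihood of one sentence.

    token_logprobs: values of log p(r_i | x, r_<i) from the LVLM's text
    decoder R_t, conditioned on the prompt only -- no image, no earlier
    sentences, matching the instruction-following score described above.
    """
    return sum(token_logprobs) / len(token_logprobs)
```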

  • image score
[equation]

CLIPScore.
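For reference, CLIPScore (Hessel et al., 2021) is a rescaled, clipped cosine similarity between CLIP embeddings of the image and the candidate text; whether CSR keeps the rescaling factor w = 2.5 is not stated in these notes. A sketch over precomputed embeddings:

```python
import numpy as np

def clip_score(image_emb, text_emb, w=2.5):
    """CLIPScore: w * max(cos(image_emb, text_emb), 0).

    image_emb / text_emb: CLIP embeddings of the image and one sentence,
    assumed already computed; w = 2.5 follows the original CLIPScore paper.
    """
    i = image_emb / np.linalg.norm(image_emb)
    t = text_emb / np.linalg.norm(text_emb)
    return w * max(float(i @ t), 0.0)
```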

Result

[figure]
  • comparison with other VLMs
[figure]
  • results across iterations
[figures]

Ablations

[figure]