[180] Phantom of Latent for Large Language and Vision Models #199

long8v commented Sep 30, 2024


paper, code, dataset

TL;DR

  • I read this because.. : a recent LVLM with strong reported performance
  • task : LVLM
  • problem : efficient LVLM
  • idea : use the [sos] token's representation to temporarily widen the intermediate (latent) dimension and then shrink it back; also proposes a DPO-like phantom optimization
  • input/output : image, question -> answer
  • architecture : VE(Intern-ViT 300M), Projector MLP, LLM(Qwen2-0.5B, InternLM2-1.8B, Phi-3-mini-3.8B, InternLM2.5-7B); see the pipeline sketch after this list
  • objective : SFT loss + SimPO
  • baseline : closed and open LVLM models
  • data : ShareGPT4o-Images(57K), ShareGPT4V(755K), ALLaVA-VFLAN/Text(548K), MiniGemini(DocVQA, ChartQA, DVQA, AI2D), Science and Mathematical Reasoning(SMR -- Arxiv-QA, TextBookQA), GLLaVA, MathVision, MathInstruct, MathPlus
  • evaluation : Science QA, AI2D, ChartQA, SEED, POPE, HallB, MME, MathVista, MMB, MM-Vet, LLaVA-w
  • result : strong performance among models of similar scale
  • contribution :
  • etc. :
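
To keep the input/output and architecture bullets concrete, here is a minimal PyTorch sketch of the VE -> MLP projector -> LLM data flow. This is not the official Phantom code; the hidden sizes, module interfaces, and the way visual tokens are prepended to the question embeddings are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class LVLMPipelineSketch(nn.Module):
    """Hypothetical VE -> MLP projector -> LLM wiring (shapes and interfaces assumed)."""

    def __init__(self, vision_encoder, llm, vis_dim=1024, llm_dim=2048):
        super().__init__()
        self.vision_encoder = vision_encoder      # e.g. an InternViT-300M backbone (not loaded here)
        self.projector = nn.Sequential(           # simple 2-layer MLP projector (assumed shape)
            nn.Linear(vis_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )
        self.llm = llm                            # e.g. Qwen2-0.5B ... InternLM2.5-7B

    def forward(self, pixel_values, text_embeds):
        vis = self.vision_encoder(pixel_values)          # (B, N_vis, vis_dim) visual tokens
        vis = self.projector(vis)                        # (B, N_vis, llm_dim)
        inputs = torch.cat([vis, text_embeds], dim=1)    # prepend visual tokens to the question
        return self.llm(inputs_embeds=inputs)            # answer logits from the LLM
```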

Details

proposed

  • Phantom Dimension (shown as a figure in the paper; see the sketch below)
  • Phantom Optimization (shown as a figure in the paper)
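
My rough sketch of how I read the phantom dimension idea: derive an extra "phantom" vector from the [sos] token's representation, temporarily widen the latent with it, then project back to the original width. This is a guess from the note above, not the authors' implementation; every module name and shape here is an assumption.

```python
import torch
import torch.nn as nn

class PhantomDimensionSketch(nn.Module):
    """Guess at the idea: widen the hidden state with an [sos]-derived vector, then shrink back."""

    def __init__(self, d_model=2048, d_phantom=512):
        super().__init__()
        self.up = nn.Linear(d_model, d_phantom)              # expand from the [sos] representation
        self.down = nn.Linear(d_model + d_phantom, d_model)  # project the widened state back down

    def forward(self, hidden_states, sos_index=0):
        # hidden_states: (B, T, d_model); take the [sos] token's representation
        sos = hidden_states[:, sos_index]                      # (B, d_model)
        phantom = self.up(sos).unsqueeze(1)                    # (B, 1, d_phantom)
        phantom = phantom.expand(-1, hidden_states.size(1), -1)
        widened = torch.cat([hidden_states, phantom], dim=-1)  # temporarily larger latent dim
        return self.down(widened)                              # back to (B, T, d_model)
```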

The phantom optimization objective seems to be exactly the same as the SimPO objective? (shown as an equation figure in the paper)
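
For reference, the SimPO objective it appears to match (Meng et al., 2024), where $\pi_\theta$ is the policy, $y_w$/$y_l$ are the chosen/rejected responses, $\beta$ scales the length-normalized log-likelihood reward, and $\gamma$ is a target reward margin:

$$
\mathcal{L}_{\mathrm{SimPO}} = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\left[ \log \sigma\!\left( \frac{\beta}{|y_w|}\log \pi_\theta(y_w \mid x) - \frac{\beta}{|y_l|}\log \pi_\theta(y_l \mid x) - \gamma \right) \right]
$$

My reading of the TL;DR objective ("SFT loss + SimPO") is that this preference term is simply added to the usual SFT cross-entropy.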

The {question, chosen, rejected} triplets are generated with GPT-4o-mini and then validated with GPT-4o.
e.g. (example triplets are shown as figures in the paper)
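
A hedged sketch of the generate-then-validate loop described above, using the OpenAI Python client. The prompts and the validation criterion are placeholders of my own; the paper's actual prompts and filtering rules are not reproduced here.

```python
from openai import OpenAI

client = OpenAI()

def generate_triplet(context: str) -> str:
    """Ask the smaller model (gpt-4o-mini) to draft a {question, chosen, rejected} triplet."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": (
                "From the following context, write one question, a correct (chosen) answer, "
                "and a plausible but wrong (rejected) answer as JSON with keys "
                "question/chosen/rejected.\n\n" + context
            ),
        }],
    )
    return resp.choices[0].message.content  # JSON string; parsing omitted in this sketch

def validate_triplet(triplet_json: str) -> bool:
    """Have the stronger model (gpt-4o) accept or reject the drafted triplet."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": (
                "Answer YES or NO: is the 'chosen' answer correct and the 'rejected' "
                "answer genuinely wrong for this triplet?\n\n" + triplet_json
            ),
        }],
    )
    return resp.choices[0].message.content.strip().upper().startswith("YES")
```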

result

(benchmark result tables are shown as figures in the paper)

ChartQA,

data links

https://github.com/ByungKwanLee/Phantom/tree/master?tab=readme-ov-file#-download-training-datasets
