CXR VL research group

We're a group of doctoral students from KAIST's AI Graduate School, focused on multi-modal (vision-language) research in the medical field. We aim to keep expanding our knowledge and experience beyond traditional boundaries by closely studying both the fundamentals of AI and the unique characteristics of the medical domain.

Every Thursday, we get together to review papers on multi-modal research from both the general and medical domains, exploring the possibilities of AI through analysis and discussion. If you're interested in our study group, especially if you have a background in medicine or AI, we'd love for you to join us and grow together. (contact: [email protected])

We upload recordings of the sessions to a personal YouTube account. Please check the links below.

Objective:

Paper reading and discussion on vision-language (VL) models. Topics are not limited to the medical domain (md); we alternate between medical-domain and general-domain (gd) papers: md -> gd -> md -> gd ...

Time:

Fri. 10:30 AM - 11:30 AM

Participants and presentation order:

(KAIST-Edlab, joined 2023-04-06) Jonghak (종학), Hyungyung (현경), Seongsu (성수)

(KAIST-MLIlab, joined 2023-07-27) Hangyul (한결)

(KAIST-Edlab, joined 2024-06-08) Daeun (다은)

Presentation order

Jonghak -> Hyungyung -> Seongsu -> Hangyul -> Daeun

Paper-Review:

Date	Week	Presenter	Topic	Paper	Material	Link
2023.04.06	Week01	Jonghak	parametric model	BioViL-T	Slides
2023.04.13	Week02	Hyungyung	Consistency based MLM	EPIC	Slides
2023.04.20	Week03	Seongsu	Textual inversion on medical domain	Medical diffusion on a budget: textual inversion for medical image generation	Paper	-
2023.04.27	Week04	Jonghak	Zero convolution	ControlNet	None
2023.05.04	Week05	Hyungyung	CXR Generation	Cheff	None
2023.05.11	Week06	Seongsu	PEFT, multi-modal	LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention, LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model	Paper1 Paper2	-
2023.05.18	Week07	Jonghak	Region-guided generation	(CVPR23) RGRG	Slides
2023.05.25	Week08	Hyungyung	Compositionality	MosaiCLIP	None
~~2023.06.01~~	~~None~~	~~None~~	~~None~~	~~None~~	~~None~~
~~2023.06.08~~	~~None~~	~~None~~	~~None~~	~~None~~	~~None~~
2023.06.15	Week09	Seongsu	Benchmark and evaluation	VisualGPTScore: Visio-Linguistic Reasoning with Multimodal Generative Pre-Training Scores	Paper	-
2023.06.22	Week10	Jonghak	Open-set detection in General & Medical	Recent 6 papers (ViLD, GLIP/GLIP-v2, ...)	Slides
2023.06.29	Week11	Hyungyung	Machine Word Learning Benchmark	MEWL: Few-shot multimodal word learning with referential uncertainty	None
2023.07.06	Week12	Seongsu	Evaluation on RRG	Evaluating Progress in Automatic Chest X-Ray Radiology Report Generation	Paper	-
2023.07.13	Week13	Jonghak	Open-set detection with LLM	GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest	None
2023.07.20	Week14	Hyungyung	Attention & Retrieval based RRG	Reading Radiology Imaging Like The Radiologists	None
2023.07.27	Week15	Seongsu	RAG for RRG	Retrieval Augmented Chest X-Ray Report Generation using OpenAI GPT models	Paper	-
~~2023.08.03~~	~~None~~	~~None~~	~~None~~	~~None~~	~~None~~
2023.08.10	Week16	Jonghak	In-context learning in medical	MedFlamingo	None
2023.08.16	Week17	Hyungyung	Reasoning Segmentation with Large Multimodal Model	(CVPR 24) LISA & (ICCV 23) SAM	Slide
2023.08.24	Week18	Hangyul	Graph Construction for Ophthalmologic Report Generation	(CVPR 22) Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation	Slides	Video
2023.08.31	Week19	Seongsu	IE benchmark on radiology reports	RadGraph2: Modeling Disease Progression in Radiology Reports via Hierarchical Information Extraction	Paper	-
2023.09.08	Week20	Jonghak
2023.09.15	Week21	Hyungyung	Anomaly detection + LLM	AnomalyGPT: Detecting Industrial Anomalies using Large Vision-Language Models	Slides
2023.09.22	Week22	Hangyul	Image Paragraph Captioning	(NeurIPS 22) Visual Clues: Bridging Vision and Language Foundations for Image Paragraph Captioning	Slides
2023.10.05	Week23	Seongsu	Exploiting LLMs as visual explainers	Learning Concise and Descriptive Attributes for Visual Recognition	Paper	-
2023.10.12	Week24	Jonghak	zero-shot VQA & GPT4 in radiograph	1. Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language 2. Exploring the Boundaries of GPT-4 in Radiology	paper1 paper2	Video
2023.10.19	Week25	Hyungyung	Refinement strategy for VLLM	Rephrase, Augment, Reason: Visual Grounding of Questions for Vision-Language Models	Slides
2023.10.26	Week26	Hangyul	Segmentation w/o annotation using vision-language model	(CVPR 22) GroupViT: Semantic Segmentation Emerges from Text Supervision	Slides	Video
2023.11.02	Week27	Seongsu	InstructPix2Pix adaptable for sequential CXR exams	BiomedJourney: Counterfactual Biomedical Image Generation by Instruction-Learning from Multimodal Patient Journeys	Paper	-
2023.11.09	Week28	Jonghak		World-to-Words: Grounded Open Vocabulary Acquisition through Fast Mapping in Vision-Language Models
2023.11.16	Week29	Hyungyung	Benchmark for VLLM	HALLUSIONBENCH: You See What You Think? Or You Think What You See?
2023.11.23	Week30	Hangyul	Model Customization w/ retrieval	(CVPR 23) Learning Customized Visual Models with Retrieval-Augmented Knowledge	Slides	Video
2023.11.30	Week31	Seongsu	Benchmark integration, multi-task & multi-modal learning	Learning A Multi-Task Transformer Via Unified And Customized Instruction Tuning For Chest Radiograph Interpretation	Paper	-
2023.12.21	Week32	Jonghak		Image Captioners Are Scalable Vision Learners Too
2023.12.28	Week33	Hyungyung		See, Say, and Segment: Teaching LMMs to Overcome False Premises
2024.01.02	Week34	Hangyul	Masked Representation Learning in medical VL	(ICLR 23) Advancing Radiograph Representation Learning with Masked Record Modeling	Slides	Video
2024.01.11	Week35	Seongsu	Identifying and resolving artifact phenomena in feature maps of ViTs	Vision Transformers Need Registers	Paper	-
2024.01.18	Week36	Jonghak		A Vision Check-up for Language Models		-
2024.01.25	Week37	Hyungyung		Incorporating Visual Experts to Resolve the Information Loss in Multimodal Large Language Models		-
2024.02.02	Week38	Hangyul	Multimodal CoT	Multimodal Chain-of-Thought Reasoning in Language Models	Slides	Video
2024.02.08	Week39	Seongsu	Vision Backbones for the Radiology Domain	RAD-DINO: Exploring Scalable Medical Image Encoders Beyond Text Supervision	Paper	-
2024.02.15	Week40	Jonghak				-
2024.02.22	Week41	Hyungyung	Chain-of-Reasoning with Question Generation	Advancing Large Multi-modal Models with Explicit Chain-of-Reasoning and Visual Question Generation	Slide	-
2024.03.07	Week42	Seongsu	Benchmark and Toolkit for Evaluating Medical Vision-Language Models	MultiMedEval: A Benchmark and a Toolkit for Evaluating Medical Vision-Language Models	Paper	-
2024.03.22	Week43	Hangyul	CLIP-Based Zero-Shot Anomaly Detection	(ICLR 24) AnomalyCLIP: Object-agnostic Prompt Learning for Zero-shot Anomaly Detection	Slides	Video
2024.03.29	Week44	Jonghak		MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
2024.04.04	Week45	Hyungyung		Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters
2024.04.11	Week46	Seongsu	LLM-as-Judge in Radiology Report Generation	LLM-RadJudge: Achieving Radiologist-Level Evaluation for X-Ray Report Generation	Paper	-
2024.04.18	Week47	Hangyul	LLM for Multimodal Learning of CXR	(ICLR 24) LLM-CXR: Instruction-Finetuned LLM for CXR Image Understanding and Generation	Slides	Video
2024.04.25	Week48	Jonghak		Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine
2024.05.02	Week49	Hyungyung		BLINK: Multimodal Large Language Models Can See but Not Perceive
2024.05.09	Week50	Seongsu	LLM-as-Judge in Radiology Report Generation	GREEN: Generative Radiology Report Evaluation and Error Notation	Paper	-
2024.05.23	Week51	Hangyul	MiniGPT4 for CXR	(AAAI 24) Bootstrapping Large Language Models for Radiology Report Generation	Slides	Video
2024.05.30	Week52	Jonghak	Dense captioning	(CVPR 24) Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation	Slides
2024.06.13	Week53	Hyungyung		Why are Visually-Grounded Language Models Bad at Image Classification?
2024.06.20	Week54	Seongsu	Generation of Digitally Reconstructed Radiographs from CT images	Shadow and Light: Digitally Reconstructed Radiographs for Disease Classification	Paper	-
2024.06.27	Week55	Hangyul	Chatting for CXR	WoLF: Wide-scope Large Language Model Framework for CXR Understanding	Slides
2024.07.05	Week56	Daeun	Doctor LLM evaluation	Towards Automatic Evaluation for LLMs’ Clinical Capabilities: Metric, Data, and Algorithm	Slides
2024.07.11	Week57	Jonghak	Symbolic representation (RL)	Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding	Slides
2024.07.18	Week58	Hyungyung
2024.08.01	Week59	Seongsu	Encoder-free Vision-Language Model	Unveiling Encoder-Free Vision-Language Models	Paper	-
2024.08.08	Week60	Hangyul	Chatting-based image retrieval	(NeurIPS 23) Chatting Makes Perfect: Chat-based Image Retrieval	Slides
2024.08.22	Week61	Daeun	MLLMs	Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs	Slides
2024.08.29	Week62	Jonghak	Knowledge Graph for CXR	Uncovering Knowledge Gaps in Radiology Report Generation Models through Knowledge Graphs	Slides
2024.09.12	Week63	Hyungyung		Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning
2024.09.19	Week64	Seongsu		Law of Vision Representation in MLLMs
2024.09.27	Week65	Hangyul	Reasoning Segmentation with Large Multimodal Model	(CVPR 24) GSVA: Generalized Segmentation via Multimodal LLMs	Slides	Video
2024.10.03	Week66	Daeun	Multi-modal medical consultation	Med-PMC: Medical Personalized Multi-modal Consultation with a Proactive Ask-First-Observe-Next Paradigm	Slides
2024.10.18	Week67	Jonghak		(ECCV 24) HERGen: Elevating Radiology Report Generation with Longitudinal Data	Slides
2024.10.25	Week68	Hyungyung		CoVT-CXR: Building Chain of Visual Thought for Interpretable Chest X-Ray Diagnosis
2024.11.01	Week69	Seongsu
2024.11.08	Week70	Hangyul	Counterfactual learning for report generation	(ECCV 24) Contrastive Learning with Counterfactual Explanations for Radiology Report Generation	Slides	Video
2024.11.15	Week71	Daeun		Can Medical Vision-Language Pre-training Succeed with Purely Synthetic Data?
2024.11.22	Week72	Jonghak		(EMNLP 24) RaTEScore: A Metric for Radiology Report Generation	Slides
2024.11.29	Week73	Hyungyung
2024.12.06	Week74	Hangyul	Eye-gazed data incorporation for CXR pretraining	(NeurIPS 24) Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning	Slides	Video
2024.12.13	Week75	Seongsu
