We're a group of doctoral students from KAIST's AI Graduate School, and we're all about multi-modal (vision-language) research in the medical field. Our aim is to continuously expand our knowledge and experience beyond traditional boundaries by deeply analyzing the essence of AI and the unique characteristics of the medical domain.
Every Thursday, we get together to review papers on the multi-modal research conducted in both general and medical fields, actively exploring the endless possibilities of AI through analysis and discussion. If you're interested in our study, especially if you have a background in medical or AI fields, we'd love for you to join us and grow together. (contact: [email protected])
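Our group, made up of doctoral students from the KAIST AI Graduate School, is dedicated to multi-modal (vision-language) research in the medical field. By studying the essence of AI and the characteristics of the medical domain in depth, we aim to continuously expand our knowledge and experience beyond existing boundaries.
Each week we select and review papers on multi-modal research in both the general and medical fields, actively exploring the possibilities of AI through analysis and discussion. Anyone who would like to join our group and grow with us is always welcome! (contact: [email protected])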
We upload recorded videos of the sessions to a personal YouTube account. Please check the links below.
Paper reading/discussion on VL models (not limited to the medical domain (md); topics alternate md -> gd (general domain) -> md -> gd ...)
Fri. 10:30 AM - 11:30 AM
(KAIST-Edlab, 2023-04-06 Joined) Jonghak, Hyungyung, Seongsu
(KAIST-MLIlab, 2023-07-27 Joined) Hangyul
(KAIST-Edlab, 2024-06-08 Joined) Daeun
Jonghak -> Hyungyung -> Seongsu -> Hangyul -> Daeun
Date | Week | Presenter | Topic | Paper | Material | Link |
---|---|---|---|---|---|---|
2023.04.06 | Week01 | Jonghak | parametric model | BioViL-T | Slides | |
2023.04.13 | Week02 | Hyungyung | Consistency based MLM | EPIC | Slides | |
2023.04.20 | Week03 | Seongsu | Textual inversion on medical domain | Medical diffusion on a budget: textual inversion for medical image generation | Paper | - |
2023.04.27 | Week04 | Jonghak | Zero convolution | ControlNet | None | |
2023.05.04 | Week05 | Hyungyung | CXR Generation | Cheff | None | |
2023.05.11 | Week06 | Seongsu | PEFT, multi-modal | LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention, LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model | Paper1 Paper2 | - |
2023.05.18 | Week07 | Jonghak | Region-guided generation | (CVPR23) RGRG | Slides | |
2023.05.25 | Week08 | Hyungyung | Compositionality | MosaiCLIP | None | |
2023.06.15 | Week09 | Seongsu | Benchmark and evaluation | VisualGPTScore: Visio-Linguistic Reasoning with Multimodal Generative Pre-Training Scores | Paper | - |
2023.06.22 | Week10 | Jonghak | Open-set detection in General & Medical | Recent 6 papers (ViLD, GLIP/GLIP-v2, ...) | Slides | |
2023.06.29 | Week11 | Hyungyung | Machine Word Learning Benchmark | MEWL: Few-shot multimodal word learning with referential uncertainty | None | |
2023.07.06 | Week12 | Seongsu | Evaluation on RRG | Evaluating Progress in Automatic Chest X-Ray Radiology Report Generation | Paper | - |
2023.07.13 | Week13 | Jonghak | Openset detection with LLM | GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest | None | |
2023.07.20 | Week14 | Hyungyung | Attention & Retrieval based RRG | Reading Radiology Imaging Like The Radiologists | None | |
2023.07.27 | Week15 | Seongsu | RAG for RRG | Retrieval Augmented Chest X-Ray Report Generation using OpenAI GPT models | Paper | - |
2023.08.10 | Week16 | Jonghak | In-context learning in medical | MedFlamingo | None | |
2023.08.16 | Week17 | Hyungyung | Reasoning Segmentation with Large Multimodal Model | (CVPR 24) LISA & (ICCV 23) SAM | Slide | |
2023.08.24 | Week18 | Hangyul | Graph Construction for Ophthalmologic Report Generation | (CVPR 22) Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation | Slides | Video |
2023.08.31 | Week19 | Seongsu | IE benchmark on radiology reports | RadGraph2: Modeling Disease Progression in Radiology Reports via Hierarchical Information Extraction | Paper | - |
2023.09.08 | Week20 | Jonghak | ||||
2023.09.15 | Week21 | Hyungyung | Anomaly detection + LLM | AnomalyGPT: Detecting Industrial Anomalies using Large Vision-Language Models | Slides | |
2023.09.22 | Week22 | Hangyul | Image Paragraph Captioning | (NeurIPS 22) Visual Clues: Bridging Vision and Language Foundations for Image Paragraph Captioning | Slides | |
2023.10.05 | Week23 | Seongsu | Exploiting LLMs as visual explainers | Learning Concise and Descriptive Attributes for Visual Recognition | Paper | - |
2023.10.12 | Week24 | Jonghak | zero-shot VQA & GPT4 in radiograph | 1. Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language 2. Exploring the Boundaries of GPT-4 in Radiology | paper1 paper2 | Video |
2023.10.19 | Week25 | Hyungyung | Refinement strategy for VLLM | Rephrase, Augment, Reason: Visual Grounding of Questions for Vision-Language Models | Slides | |
2023.10.26 | Week26 | Hangyul | Segmentation w/o annotation using vision-language model | (CVPR 22) GroupViT: Semantic Segmentation Emerges from Text Supervision | Slides | Video |
2023.11.02 | Week27 | Seongsu | InstructPix2Pix adaptable for sequential CXR exams | BiomedJourney: Counterfactual Biomedical Image Generation by Instruction-Learning from Multimodal Patient Journeys | Paper | - |
2023.11.09 | Week28 | Jonghak | | World-to-Words: Grounded Open Vocabulary Acquisition through Fast Mapping in Vision-Language Models | | |
2023.11.16 | Week29 | Hyungyung | Benchmark for VLLM | HALLUSIONBENCH: You See What You Think? Or You Think What You See? | ||
2023.11.23 | Week30 | Hangyul | Model Customization w/ retrieval | (CVPR 23) Learning Customized Visual Models with Retrieval-Augmented Knowledge | Slides | Video |
2023.11.30 | Week31 | Seongsu | Benchmark integration, multi-task & multi-modal learning | Learning A Multi-Task Transformer Via Unified And Customized Instruction Tuning For Chest Radiograph Interpretation | Paper | - |
2023.12.21 | Week32 | Jonghak | | Image Captioners Are Scalable Vision Learners Too | | |
2023.12.28 | Week33 | Hyungyung | | See, Say, and Segment: Teaching LMMs to Overcome False Premises | | |
2024.01.02 | Week34 | Hangyul | Masked Representation Learning in medical VL | (ICLR 23) Advancing Radiograph Representation Learning with Masked Record Modeling | Slides | Video |
2024.01.11 | Week35 | Seongsu | Identifying and resolving artifact phenomena in feature maps of ViTs | Vision Transformers Need Registers | Paper | - |
2024.01.18 | Week36 | Jonghak | | A Vision Check-up for Language Models | - | |
2024.01.25 | Week37 | Hyungyung | | Incorporating Visual Experts to Resolve the Information Loss in Multimodal Large Language Models | - | |
2024.02.02 | Week38 | Hangyul | Multimodal CoT | Multimodal Chain-of-Thought Reasoning in Language Models | Slides | Video |
2024.02.08 | Week39 | Seongsu | Vision Backbones for the Radiology Domain | RAD-DINO: Exploring Scalable Medical Image Encoders Beyond Text Supervision | Paper | - |
2024.02.15 | Week40 | Jonghak | - | |||
2024.02.22 | Week41 | Hyungyung | Chain-of-Reasoning with Question Generation | Advancing Large Multi-modal Models with Explicit Chain-of-Reasoning and Visual Question Generation | Slide | - |
2024.03.07 | Week42 | Seongsu | Benchmark and Toolkit for Evaluating Medical Vision-Language Models | MultiMedEval: A Benchmark and a Toolkit for Evaluating Medical Vision-Language Models | Paper | - |
2024.03.22 | Week43 | Hangyul | CLIP-Based Zero-Shot Anomaly Detection | (ICLR 24) AnomalyCLIP: Object-agnostic Prompt Learning for Zero-shot Anomaly Detection | Slides | Video |
2024.03.29 | Week44 | Jonghak | | MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training | | |
2024.04.04 | Week45 | Hyungyung | | Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters | | |
2024.04.11 | Week46 | Seongsu | LLM-as-Judge in Radiology Report Generation | LLM-RadJudge: Achieving Radiologist-Level Evaluation for X-Ray Report Generation | Paper | - |
2024.04.18 | Week47 | Hangyul | LLM for Multimodal Learning of CXR | (ICLR 24) LLM-CXR: Instruction-Finetuned LLM for CXR Image Understanding and Generation | Slides | Video |
2024.04.25 | Week48 | Jonghak | | Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine | | |
2024.05.02 | Week49 | Hyungyung | | BLINK: Multimodal Large Language Models Can See but Not Perceive | | |
2024.05.09 | Week50 | Seongsu | LLM-as-Judge in Radiology Report Generation | GREEN: Generative Radiology Report Evaluation and Error Notation | Paper | - |
2024.05.23 | Week51 | Hangyul | MiniGPT4 for CXR | (AAAI 24) Bootstrapping Large Language Models for Radiology Report Generation | Slides | Video |
2024.05.30 | Week52 | Jonghak | Dense captioning | (CVPR 24) Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation | Slides | |
2024.06.13 | Week53 | Hyungyung | | Why are Visually-Grounded Language Models Bad at Image Classification? | | |
2024.06.20 | Week54 | Seongsu | Generation of Digitally Reconstructed Radiographs from CT images | Shadow and Light: Digitally Reconstructed Radiographs for Disease Classification | Paper | - |
2024.06.27 | Week55 | Hangyul | Chatting for CXR | WoLF: Wide-scope Large Language Model Framework for CXR Understanding | Slides | |
2024.07.05 | Week56 | Daeun | Doctor LLM evaluation | Towards Automatic Evaluation for LLMs’ Clinical Capabilities: Metric, Data, and Algorithm | Slides | |
2024.07.11 | Week57 | Jonghak | Symbolic representation (RL) | Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding | Slides | |
2024.07.18 | Week58 | Hyungyung | ||||
2024.08.01 | Week59 | Seongsu | Encoder-free Vision-Language Model | Unveiling Encoder-Free Vision-Language Models | Paper | - |
2024.08.08 | Week60 | Hangyul | Chatting-based image retrieval | (NeurIPS 23) Chatting Makes Perfect: Chat-based Image Retrieval | Slides | |
2024.08.22 | Week61 | Daeun | MLLMs | Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs | Slides | |
2024.08.29 | Week62 | Jonghak | Knowledge Graph for CXR | Uncovering Knowledge Gaps in Radiology Report Generation Models through Knowledge Graphs | slides | |
2024.09.12 | Week63 | Hyungyung | | Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning | | |
2024.09.19 | Week64 | Seongsu | | Law of Vision Representation in MLLMs | | |
2024.09.27 | Week65 | Hangyul | Reasoning Segmentation with Large Multimodal Model | (CVPR 24) GSVA: Generalized Segmentation via Multimodal LLMs | Slides | Video |
2024.10.03 | Week66 | Daeun | Multi-modal medical consultation | Med-PMC: Medical Personalized Multi-modal Consultation with a Proactive Ask-First-Observe-Next Paradigm | Slides | |
2024.10.18 | Week67 | Jonghak | | (ECCV 24) HERGen: Elevating Radiology Report Generation with Longitudinal Data | Slides | |
2024.10.25 | Week68 | Hyungyung | | CoVT-CXR: Building Chain of Visual Thought for Interpretable Chest X-Ray Diagnosis | | |
2024.11.01 | Week69 | Seongsu | ||||
2024.11.08 | Week70 | Hangyul | Counterfactual learning for report generation | (ECCV 24) Contrastive Learning with Counterfactual Explanations for Radiology Report Generation | Slides | Video |
2024.11.15 | Week71 | Daeun | | Can Medical Vision-Language Pre-training Succeed with Purely Synthetic Data? | | |
2024.11.22 | Week72 | Jonghak | | (EMNLP 24) RaTEScore: A Metric for Radiology Report Generation | Slides | |
2024.11.29 | Week73 | Hyungyung | ||||
2024.12.06 | Week74 | Hangyul | Eye-gazed data incorporation for CXR pretraining | (NeurIPS 24) Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning | Slides | Video |
2024.12.13 | Week75 | Seongsu |