Commit: Add recent talks
kevinwjin committed Nov 6, 2024
1 parent e0f0a82 commit 4e6128c
Showing 6 changed files with 83 additions and 0 deletions.
(Three image files added; preview not available.)
content/nlp-llm-ig/nlp-llm-ig-240919.md: 33 additions, 0 deletions
Title: Seventeenth Meeting of the Yale NLP/LLM Interest Group
Category: nlp-llm-ig
Date: 2024-09-19
Slug: seventeenth-nlp-llm-ig
Tags: NLP,LLM
Summary: "Recent work on improving interpretability of large language models" by Kevin Jin

**Speaker**: Kevin Jin, Ph.D. Student in Computational Biology and Biomedical Informatics at Yale University

**Title of Talk**: Recent work on improving interpretability of large language models

**When**: Thursday, September 19, 4:30pm-5:30pm

**Location**: 100 College Street, 11th Floor, Workshop 1167

**Recording Link**: <https://www.youtube.com/watch?v=SA9NbWeHbQs>

### Talk summary:

Most people rarely consider the inner workings of their computer or automobile; utility is prized over transparency. Large language models (LLMs) have defined the frontier of artificial intelligence and vaulted it into public consciousness, with their impressive capabilities inspiring myriad applications. Yet much of the reasoning behind their behavior, both beneficial and harmful, remains unknown. Opening the black box of LLMs is a crucial pursuit that can help mitigate harm, build trust, ensure safety and compliance, and drive further innovation. This talk will motivate the interpretability of LLMs, place the ongoing push for interpretability in historical context, and cover three recent papers that advance our mechanistic understanding of LLMs.

### Speaker bio:

Kevin Jin is a second-year PhD student in the interdepartmental program in Computational Biology and Biomedical Informatics at Yale University. He is a member of the Clinical NLP Lab, a group in the Department of Biomedical Informatics and Data Science at the Yale School of Medicine, and is advised by Hua Xu. His research interests are diversely distributed but concentrate on digital psychiatry: characterizing mental health disorders with natural language processing and wearable biosensors. Kevin completed his undergraduate work at Johns Hopkins University, receiving a B.S. in Molecular and Cellular Biology in 2020, and subsequently made a career transition from pre-med to biomedical informatics. Outside of the lab, he loves reading news, studying languages, baking desserts, and sparring at the Yale Kendo Club.

<img style="width: 85%;" src="../image/news/20240919-nlp-llm-ig-kevin_2.JPG">
<img style="width: 85%;" src="../image/news/20240919-nlp-llm-ig-kevin_3.JPG">
<img style="width: 85%;" src="../image/news/20240919-nlp-llm-ig-kevin_1.JPG">

### Get Involved!

We invite all members to actively participate in the activities of the Yale NLP/LLM Interest Group. Whether you're a seasoned NLP practitioner or just starting to explore the field, there's a place for you in our community. Stay tuned for updates on upcoming events and initiatives!
[**Join our mailing list**](https://mailman.yale.edu/mailman/listinfo/nlp-llm-ig) to stay informed about future meetings and events.
content/nlp-llm-ig/nlp-llm-ig-241002.md: 25 additions, 0 deletions
Title: Nineteenth Meeting of the Yale NLP/LLM Interest Group
Category: nlp-llm-ig
Date: 2024-10-03
Slug: nineteenth-nlp-llm-ig
Tags: NLP,LLM
Summary: "Combining Rule-based NLP-lite with Rapid Iterative Chart Adjudication for Creation of a Large Gold Standard Cohort from EHR data for a Clinical Trial Emulation" by Dr. Pradeep Mutalik

**Speaker**: Pradeep Mutalik, MD, Associate Research Scientist in Biomedical Informatics and Data Science at Yale University

**Title of Talk**: Combining Rule-based NLP-lite with Rapid Iterative Chart Adjudication for Creation of a Large Gold Standard Cohort from EHR data for a Clinical Trial Emulation

**When**: Thursday, October 3, 4:30pm-5:30pm

**Location**: 100 College Street, 11th Floor, Workshop 1167

**Recording Link**: <https://www.youtube.com/watch?v=jwrcviGdOdk>

### Talk summary:

The aim of this work was to create a gold-standard curated cohort of 10,000+ cases from the Veterans Affairs (VA) corporate data warehouse (CDW) for virtual emulation of a randomized clinical trial (CSP#592). The trial had six inclusion/exclusion criteria lacking adequate structured data. We therefore used a hybrid computer/human approach to extract information from clinical notes. Rule-based NLP output was iteratively adjudicated by a panel of trained non-clinician content experts and non-experts using an easy-to-use, spreadsheet-based rapid adjudication display. This group-adjudication process iteratively sharpened both the computer algorithm and the clinical decision criteria, while simultaneously training the non-experts. The cohort was successfully created, with each inclusion/exclusion decision backed by a source document. Less than 0.5% of cases required referral to specialist clinicians. It is likely that such curated datasets, capturing specialist reasoning through a process-supervised approach, will acquire greater importance as training tools for future clinical AI applications.
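
As a concrete illustration of this kind of workflow, the sketch below shows a rule-based screening pass that writes its evidence into a spreadsheet for group adjudication. It is a minimal, hypothetical example: the regex rules, criterion names, and note format are invented for illustration and are not the actual CSP#592 logic.

```python
# Hypothetical sketch of a rule-based "NLP-lite" pass feeding an adjudication
# spreadsheet; patterns and criterion names are illustrative only.
import csv
import re

# Toy inclusion/exclusion rules: criterion name -> regex over note text.
RULES = {
    "prior_chemotherapy": re.compile(r"\b(chemo(therapy)?|FOLFOX)\b", re.I),
    "metastatic_disease": re.compile(r"\bmetasta(sis|ses|tic)\b", re.I),
}

def screen_note(note_id: str, text: str) -> list[dict]:
    """Emit one row per rule, with the matched snippet as the
    source-document evidence an adjudicator will review."""
    rows = []
    for criterion, pattern in RULES.items():
        m = pattern.search(text)
        rows.append({
            "note_id": note_id,
            "criterion": criterion,
            "nlp_flag": bool(m),
            "evidence": m.group(0) if m else "",
            "adjudication": "",  # filled in by the human panel
        })
    return rows

def write_adjudication_sheet(notes: dict[str, str], path: str) -> None:
    """Dump all rule decisions to a spreadsheet for rapid group review;
    disagreements flow back as refinements to RULES on the next round."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(
            f, fieldnames=["note_id", "criterion", "nlp_flag",
                           "evidence", "adjudication"])
        writer.writeheader()
        for note_id, text in notes.items():
            writer.writerows(screen_note(note_id, text))

write_adjudication_sheet(
    {"note-001": "Pt started FOLFOX in 2019; no evidence of metastasis."},
    "adjudication_round1.csv")
```

The key design choice mirrored here is that every automated flag carries its evidence snippet, so adjudicators can verify decisions against source text and push corrections back into the rules on the next iteration.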

### Get Involved!

We invite all members to actively participate in the activities of the Yale NLP/LLM Interest Group. Whether you're a seasoned NLP practitioner or just starting to explore the field, there's a place for you in our community. Stay tuned for updates on upcoming events and initiatives!
[**Join our mailing list**](https://mailman.yale.edu/mailman/listinfo/nlp-llm-ig) to stay informed about future meetings and events.
content/nlp-llm-ig/nlp-llm-ig-241031.md: 25 additions, 0 deletions
Title: Twentieth Meeting of the Yale NLP/LLM Interest Group
Category: nlp-llm-ig
Date: 2024-10-31
Slug: twentieth-nlp-llm-ig
Tags: NLP,LLM
Summary: "Exploring Memorization in Medical Large Language Models" by Dr. Anran Li

**Speaker**: Anran Li, PhD, Postdoctoral Fellow in Biomedical Informatics and Data Science at Yale University

**Title of Talk**: Exploring Memorization in Medical Large Language Models

**When**: Thursday, October 31, 4:30pm-5:30pm

**Location**: 100 College Street, 11th Floor, Workshop 1167

**Recording Link**: <https://www.youtube.com/watch?v=ezm5GTQNDAY>

### Talk summary:

Medical large language models (MedLLMs) have demonstrated advanced capabilities across various medical tasks, e.g., medical question answering and text summarization. Despite these remarkable advancements, MedLLMs face several privacy and copyright challenges. For example, they memorize and regenerate large segments of their training data at inference time, which can leak patients' protected health information (PHI) to third parties. These concerns underscore the importance of exploring the capability of MedLLMs to memorize their training data, especially given that medical training data are highly sensitive and often copyrighted. In this project, we demonstrate that MedLLMs memorize and leak individual training samples. We propose an effective method for extracting memorized content and conduct a comprehensive evaluation of memorization in medical LLMs. We aim to achieve the following objectives. First, we systematically investigate and quantify memorization in large medical models, highlighting its potential risks. Second, we analyze the factors that influence memorization in MedLLMs. Third, we explore whether memorization can be used to identify copyrighted data. Finally, we aim to mitigate memorization issues in the model development and release process.
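
For readers unfamiliar with how memorization is typically probed, below is a minimal sketch of a common prefix-continuation test: prompt the model with the opening tokens of a suspected training sample and count how many tokens of the true continuation it reproduces verbatim under greedy decoding. The model name, token counts, and sample note are placeholder assumptions, not the method presented in the talk.

```python
# Minimal prefix-continuation memorization probe (illustrative only).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; a MedLLM would be probed the same way
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def memorized_prefix_len(sample: str, prompt_tokens: int = 50,
                         gen_tokens: int = 50) -> int:
    """Return how many greedily generated tokens match the true
    continuation exactly: a crude per-sample memorization score."""
    ids = tokenizer(sample, return_tensors="pt").input_ids[0]
    prompt = ids[:prompt_tokens]
    target = ids[prompt_tokens:prompt_tokens + gen_tokens]
    with torch.no_grad():
        out = model.generate(prompt.unsqueeze(0),
                             max_new_tokens=len(target),
                             do_sample=False)  # greedy decoding
    generated = out[0, len(prompt):]
    match = 0
    for g, t in zip(generated, target):
        if g.item() != t.item():
            break
        match += 1
    return match

# Hypothetical note text; in practice this would be a suspected
# training sample drawn from the model's training corpus.
note = ("Patient is a 67-year-old male presenting with chest pain "
        "radiating to the left arm, onset two hours prior to arrival.")
score = memorized_prefix_len(note, prompt_tokens=10, gen_tokens=20)
print(f"{score} tokens reproduced verbatim")
```

Aggregating such scores over many candidate samples is one way to quantify memorization and to flag content, such as copyrighted or PHI-bearing text, that a model can regurgitate.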

### Get Involved!

We invite all members to actively participate in the activities of the Yale NLP/LLM Interest Group. Whether you're a seasoned NLP practitioner or just starting to explore the field, there's a place for you in our community. Stay tuned for updates on upcoming events and initiatives!
[**Join our mailing list**](https://mailman.yale.edu/mailman/listinfo/nlp-llm-ig) to stay informed about future meetings and events.
