A collection of papers and resources about Machine Unlearning on LLMs.
Another collection of Vision Language Models and Vision Generative models can be found here.
Large language models (LLMs) have demonstrated remarkable capabilities across various tasks, but their training typically requires vast amounts of data, raising concerns in legal and ethical domains. Issues such as potential copyright disputes, data authenticity, and privacy concerns have been brought to the forefront. Machine unlearning offers a potential solution to these challenges, even though it presents new hurdles when applied to LLMs. In this repository, we aim to collect and organize surveys, datasets, approaches, and evaluation metrics pertaining to machine unlearning on LLMs, with the hope of providing valuable insights for researchers in this field.
Paper Title | Venue | Year |
---|---|---|
Knowledge unlearning for LLMs: Tasks, methods, and challenges | ArXiv | 2023.11 |
Machine Unlearning of Pre-trained Large Language Models | ArXiv | 2024.02 |
Rethinking Machine Unlearning for Large Language Models | ArXiv | 2024.02 |
Machine Unlearning: Taxonomy, Metrics, Applications, Challenges, and Prospects | ArXiv | 2024.03 |
The Frontier of Data Erasure: Machine Unlearning for Large Language Models | ArXiv | 2024.03 |
Title | Key Words | Year |
---|---|---|
The EU general data protection regulation (GDPR) | GDPR | 2017 |
Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence | | 2023 |
Scalable Extraction of Training Data from (Production) Language Models | Privacy Concerns | 2023 |
Paper Title | Author | Paper with code | Key words | Venue | Time |
---|---|---|---|---|---|
Composing Parameter-Efficient Modules with Arithmetic Operations | Zhang et al. | Github | use LoRA to create task vectors and accomplish unlearning by negating tasks under these task vectors. | NeurIPS 2023 | 2023-06 |
Knowledge Unlearning for Mitigating Privacy Risks in Language Models | Jang et al. | Github | update the model parameters by maximizing the likelihood of mis-prediction for samples within the forget set (see the gradient-ascent sketch after this table) | ACL 2023 | 2023-07 |
Unlearning Bias in Language Models by Partitioning Gradients | Yu et al. | Github | aims to minimize the likelihood of predictions on relabeled forgetting data | ACL 2023 | 2023-07 |
Who’s Harry Potter? Approximate Unlearning in LLMs | Eldan et al. | HuggingFace | gradient descent-based fine-tuning over relabeled or randomly labeled forgetting data, where generic translations are used to replace the unlearned texts. | ICLR 2024 | 2023-10
Unlearn What You Want to Forget: Efficient Unlearning for LLMs | Chen and Yang | Github | fine-tune an adapter over the unlearning objective that acts as an unlearning layer within the LLM. | EMNLP 2023 | 2023-12
Machine Unlearning of Pre-trained Large Language Models | Yao et al. | Github | incorporate random labeling to augment the unlearning objective and ensure utility preservation on the retain set | ArXiv | 2024-02
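
Several of the rows above (e.g., Jang et al. and Yao et al.) follow the same gradient-ascent recipe: raise the language-modeling loss on the forget set while keeping it low on a retain set. Below is a minimal, hedged sketch of that recipe; the model interface, batch layout, and `retain_weight` are illustrative assumptions, not the exact setup of any listed paper.

```python
# Hedged sketch of gradient-ascent unlearning with a retain-set utility term.
# The model/batch interfaces and hyperparameters are assumptions for illustration.
import torch
import torch.nn.functional as F


def unlearning_step(model, forget_batch, retain_batch, optimizer, retain_weight=1.0):
    """One update: gradient ascent on the forget set, standard LM loss on the retain set.

    Each batch is a dict with `input_ids` and `labels` of shape (B, T); the model
    is assumed to return next-token logits of shape (B, T, vocab_size).
    """
    optimizer.zero_grad()

    # Forget term: the usual cross-entropy, later negated so the optimizer increases it.
    forget_logits = model(forget_batch["input_ids"])
    forget_loss = F.cross_entropy(forget_logits.flatten(0, 1),
                                  forget_batch["labels"].flatten())

    # Retain term: ordinary language-modeling loss to preserve utility.
    retain_logits = model(retain_batch["input_ids"])
    retain_loss = F.cross_entropy(retain_logits.flatten(0, 1),
                                  retain_batch["labels"].flatten())

    (-forget_loss + retain_weight * retain_loss).backward()
    optimizer.step()
    return forget_loss.item(), retain_loss.item()
```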
Paper Title | Author | Paper with code | Key words | Venue | Time |
---|---|---|---|---|---|
Locating and Editing Factual Associations in GPT | Meng et al. | Github | the process of localization can be accomplished through representation denoising, also known as causal tracing, focusing on the unit of model layers. | ArXiv | 2022-02 |
Unlearning Bias in Language Models by Partitioning Gradients | Yu et al. | Github | gradient-based saliency is employed to identify the crucial weights that need to be fine-tuned to achieve the unlearning objective (see the sketch after this table). | ACL 2023 | 2023-07
DEPN: Detecting and Editing Privacy Neurons in Pretrained Language Models | Wu et al. | Github | neurons that respond to unlearning targets are identified within the feed-forward network and subsequently selected for knowledge unlearning. | EMNLP 2023 | 2023-10 |
Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks | Patil et al. | Github | it is important to delete information about unlearning targets wherever it is represented in models in order to protect against attacks | ArXiv | 2023-09 |
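
The gradient-based saliency idea in the Yu et al. row can be pictured as follows: score each weight by the magnitude of its gradient on the forget objective and restrict updates to the most salient fraction. The sketch below shows that general pattern only; `top_fraction` and the plain SGD step are placeholders, not the exact partitioning procedure of any paper above.

```python
# Hedged sketch of localization via gradient saliency: only the weights whose
# forget-set gradients are largest get updated. Threshold and update rule are
# illustrative assumptions.
import torch


def forget_saliency_masks(model, top_fraction=0.05):
    """Per-tensor boolean masks marking the weights most salient to forgetting.

    Call after `forget_loss.backward()` so that `p.grad` holds forget-set gradients.
    """
    masks = []
    for p in model.parameters():
        if p.grad is None:
            masks.append(torch.zeros_like(p, dtype=torch.bool))
            continue
        scores = p.grad.abs()
        k = max(1, int(top_fraction * scores.numel()))
        threshold = torch.topk(scores.flatten(), k).values.min()
        masks.append(scores >= threshold)
    return masks


def masked_sgd_step(model, masks, lr=1e-4):
    """Apply a plain SGD step only to the weights selected by the saliency masks."""
    with torch.no_grad():
        for p, mask in zip(model.parameters(), masks):
            if p.grad is not None:
                p -= lr * p.grad * mask
```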
Paper Title | Author | Paper with code | Key words | Venue | Time |
---|---|---|---|---|---|
Studying Large Language Model Generalization with Influence Functions | Grosse et al. | No Code Available | the potential of influence functions in LLM unlearning may be underestimated: scalability issues and approximation errors can be mitigated by focusing on localized weights that are salient to unlearning. | ArXiv | 2023-08
Paper Title | Author | Paper with code | Key words | Venue | Time |
---|---|---|---|---|---|
Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks | Hase et al. | Github | Defending against extraction attacks. | ICLR 2024 | 2023-09
Learning and Forgetting Unsafe Examples in Large Language Models | Zhao et al. | No Code Available | Fine-tuning based. | ArXiv | 2023-12
Second-Order Information Matters: Revisiting Machine Unlearning for Large Language Models | Gu et al. | No Code Available | sequential editing of LLMs may compromise their general capabilities. | ArXiv | 2024-03
Towards Efficient and Effective Unlearning of Large Language Models for Recommendation | Wang et al. | Github | Using LLM Unlearning in Recommendation | ArXiv | 2024-03 |
The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning | Li et al. | Homepage | controls the model towards a novice-like level of hazardous knowledge using a loss with a forget term and a retain term: the forget loss bends the model's representations towards those of a novice, while the retain loss limits the amount of general capabilities removed (see the sketch after this table). | ArXiv | 2024-03
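
The WMDP row describes a two-term objective: a forget loss that bends internal representations on hazardous inputs toward those of a novice, and a retain loss that preserves general capability. Below is a hedged sketch of a representation-level loss of that shape; the chosen layer, the random control vector, and `retain_weight` are illustrative assumptions, not the paper's exact recipe.

```python
# Hedged sketch of a two-term representation loss: push hidden states on hazardous
# inputs toward a fixed random direction, keep hidden states on benign inputs close
# to a frozen reference model. All shapes and constants are illustrative.
import torch
import torch.nn.functional as F


def representation_unlearning_loss(h_forget, h_retain, h_retain_frozen,
                                   control_vector, retain_weight=100.0):
    """Hidden states come from one chosen layer and have shape (B, T, d)."""
    forget_loss = F.mse_loss(h_forget, control_vector.expand_as(h_forget))
    retain_loss = F.mse_loss(h_retain, h_retain_frozen.detach())
    return forget_loss + retain_weight * retain_loss


# Illustrative usage with random tensors standing in for real activations.
d = 64
control = torch.randn(d)  # fixed random "novice" direction (an assumption)
loss = representation_unlearning_loss(torch.randn(2, 8, d), torch.randn(2, 8, d),
                                      torch.randn(2, 8, d), control)
```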
Paper Title | Author | Paper with code | Key words | Venue | Time |
---|---|---|---|---|---|
Memory-assisted prompt editing to improve gpt-3 after deployment | Madaan et al. | Github | input-based prompt editing that shows promise in addressing the restricted access to black-box LLMs and achieving parameter efficiency in LLM unlearning. | EMNLP 2022 | 2022-01
Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations | Achintalwar et al. | No Code Available | aligning a company's internal-facing enterprise chatbot to its business conduct guidelines | ArXiv | 2024-03 |
Large Language Model Unlearning via Embedding-Corrupted Prompts | Chris Yuhao Liu et al. | No Code Available | enforce an unlearned state during inference by employing a prompt classifier to identify and safeguard prompts to forget (sketched below). | ArXiv | 2024-06
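
The rows above share a pattern: leave the model weights alone and intervene on the prompt at inference time. The sketch below illustrates one such intervention under assumed interfaces: a separate prompt classifier (not shown) flags prompts in the forget scope, and the token embeddings of flagged prompts are perturbed with noise before generation. The classifier, `noise_scale`, and embedding interface are placeholders.

```python
# Hedged sketch of an input-based guard: corrupt the embeddings of prompts that a
# (hypothetical) classifier flags as falling within the forget scope.
import torch


def guarded_prompt_embeddings(embed, prompt_ids, in_forget_scope, noise_scale=1.0):
    """`embed` maps token ids (B, T) to embeddings (B, T, d);
    `in_forget_scope` is a per-example boolean flag from a prompt classifier."""
    emb = embed(prompt_ids)
    noise = noise_scale * torch.randn_like(emb)
    flag = in_forget_scope.view(-1, 1, 1)
    # Only flagged prompts receive corrupted embeddings; others pass through unchanged.
    return torch.where(flag, emb + noise, emb)
```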
Paper Title | Author | Paper with code | Key words | Venue | Time |
---|---|---|---|---|---|
Offset Unlearning for Large Language Models | James Y. Huang et al. | | propose an offset unlearning framework that adapts the output logits of the target LLM without modifying its parameters (see the sketch after this table) | ArXiv | 2024-04 |
Reversing the Forget-Retain Objectives: An Efficient LLM Unlearning Framework from Logit Difference | Jiabao Ji et al. | | introduce an assistant LLM that aims to achieve the opposite of the unlearning goals, and then derive the unlearned LLM by computing the logit difference between the target and the assistant LLMs. | ArXiv | 2024-06
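
Both rows in this table adjust the output distribution of the deployed LLM using smaller auxiliary models instead of editing the deployed weights. A hedged sketch of that kind of logit composition follows; the sign convention, the scaling factor `alpha`, and which auxiliary model plays which role differ between the two papers and are assumptions here.

```python
# Hedged sketch of inference-time logit composition: correct the deployed model's
# logits by the difference between two auxiliary models. Roles and scaling are
# illustrative assumptions.
import torch


def composed_logits(target_logits, assistant_logits, reference_logits, alpha=1.0):
    """All inputs are (B, T, vocab) logits computed on the same prompts."""
    # Subtract the direction in which the assistant differs from its reference,
    # i.e., the component associated with the forget data.
    return target_logits - alpha * (assistant_logits - reference_logits)
```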
Paper Title | Author | Paper with code | Key words | Venue | Year |
---|---|---|---|---|---|
Can Sensitive Information be Deleted from LLMs? Objectives for Defending Against Extraction Attacks | Patil et al. | Github | | ArXiv | 2023.09 |
Detecting Pretraining Data from Large Language Models | Shi et al. | Github | pretrain data detection | ArXiv | 2023.10 |
Practical Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration | Fu et al. | No Code Available | fine-tuning data detection | ArXiv | 2023.11
Tensor trust: Interpretable prompt injection attacks from an online game | Toyer et al. | Github | input-based methods may not yield genuinely unlearned models and tend to be weaker than model-based methods, since modifying the inputs of LLMs alone may not be sufficient to completely erase the influence of unlearning targets | ArXiv | 2023-11
Paper Title | Author | Paper with code | Key words | Venue | Year |
---|---|---|---|---|---|
TOFU: A Task of Fictitious Unlearning for LLMs | Maini et al. | Homepage | | ArXiv | 2024.01 |
Machine Unlearning of Pre-trained Large Language Models | Yao et al. | Github | | ArXiv | 2024.02 |
Eight Methods to Evaluate Robust Unlearning in LLMs | Lynch et al. | | | ArXiv | 2024.02 |
The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning | Li et al. | Homepage | Biology, Cyber and Chemical | ArXiv | 2024.03 |
Name | Description | Used By |
---|---|---|
BBQ (Bias Benchmark for QA) | a dataset of question sets constructed by the authors that highlight attested social biases against people belonging to protected classes along nine social dimensions relevant for U.S. English-speaking contexts. | Zhao et al.
HarmfulQA | a ChatGPT-distilled dataset constructed using the Chain of Utterances (CoU) prompt. | Zhao et al. |
CategoricalHarmfulQA | a dataset of 550 harmful questions organized into harm categories and sub-categories. | Bhardwaj et al.
Pile | an 825 GiB English text corpus targeted at training large-scale language models. | Zhao et al. |
Detoxify | Detoxify is a simple, easy to use, python library to detect hateful or offensive language. It was built to help researchers and practitioners identify potential toxic comments. | Zhao et al. |
Enron Email Dataset | a publicly released corpus of emails from Enron Corporation employees, commonly used to study memorization and privacy leakage. | Wu et al.
Training Data Extraction Challenge | a benchmark for extracting memorized training examples from language models. | Jang et al.
Harry Potter book series dataset | the copyrighted Harry Potter novels, used as an unlearning target (e.g., in Who's Harry Potter?). | Eldan et al., Shi et al.
Real Toxicity Prompts | a dataset of naturally occurring sentence-level prompts from web text, paired with toxicity scores, for evaluating toxic generation. | Lu et al., Liu et al.