LLMs have taken the world by storm, showing outstanding capabilities in several NLP-related domains. They have been proven to have astonishing emergent capabilities, and unfortunately it has become painfully obvious that memorization is one of them. While this is not a problem for models dealing with public data, when the task at hand requires handling sensitive data this issue cannot be overlooked. This is why, stemming from our research survey, we present here a curated list of papers on LLM data memorization, the privacy attacks it enables, and potential solutions, including data anonymization, Differential Privacy and Machine Unlearning.
- Awesome-Privacy-Preserving-LLMs
  - Data Extraction
  - Membership Inference Attacks
  - Model Inversion
  - Re-Identification from Anonymized Data
  - Attacks against Synthetic Data Generators
  - Data anonymization
  - Data anonymization with Differential Privacy
  - Pre-training with Differential Privacy
  - Fine-tuning with Differential Privacy
  - Parameter-Efficient Fine-Tuning with Differential Privacy
  - Reinforcement Learning with Differential Privacy
  - Inference with Differential Privacy
  - Federated Learning with Differential Privacy
  - Machine Unlearning
  - Tools and Frameworks
## Data Extraction
- Quantifying Memorization Across Neural Language Models - Shows that it is possible to reconstruct training data with black-box access.
- Extracting Training Data from Large Language Models - Queries GPT-2 to extract training data (a minimal sketch of this prefix-and-rank recipe follows this list).
- Are Large Pre-Trained Language Models Leaking Your Personal Information? - Queries LMs for email addresses and names, finding that the models are prone to leaking.
- Scalable extraction of training data from (production) language models - Studies data extraction without prior knowledge about the dataset.
- DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models - Explores the leakage of training data in GPT models.
- Analyzing Leakage of Personally Identifiable Information in Language Models - Proposes solving a masked language modeling task to reconstruct masked personal information from a sentence.
- Dataset reconstruction attack against language models - Reconstructs the training data of a fine-tuned GPT-2.
- Scalable extraction of training data from (production) language models - Verifies the success of Carlini et al.'s attack procedure by querying open-source models, using access to the training data only to confirm the attack.
- Quantifying Association Capabilities of Large Language Models and Its Implications on Privacy Leakage - Shows that it is possible to recover 3% of the training emails from a 20-billion-parameter model with attacks that exploit association capabilities.
- ETHICIST: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confidence Estimation - Proposes an attack to recover a specific suffix given a prefix known to be in the pre-training data.
- Controlling the Extraction of Memorized Data from Large Language Models via Prompt-Tuning - Similar approach to ETHICIST, but based on soft prompt tuning.
- ProPILE: Probing Privacy Leakage in Large Language Models - Tool designed to test models with black-box and white-box attacks against their tendency to release PII.
- Ignore Previous Prompt: Attack Techniques For Language Models - Prompt injection: the original goal of a prompt is changed with malicious text.
- "Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models - Characterizes a large set of jailbreak prompts and evaluates their effectiveness against different LLMs.
- ChatGPT_DAN - ChatGPT jailbreak that makes it behave like a fictional assistant.
- Multi-step Jailbreaking Privacy Attacks on ChatGPT - Proposes a multi-step jailbreak prompt based on Chain-of-Thought to extract personal information.
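To make the extraction recipe above concrete, here is a minimal sketch of prefix-based extraction in the spirit of Carlini et al.: sample many continuations for an attacker-chosen prefix and rank them by perplexity, since memorized sequences tend to score unusually low. The model name, prefix and sampling parameters are purely illustrative.

```python
# Minimal sketch of prefix-based training-data extraction:
# sample many continuations for a prefix, then rank them by perplexity,
# since memorized sequences tend to have unusually low perplexity.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # illustrative; real attacks target much larger models
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def perplexity(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean negative log-likelihood
    return torch.exp(loss).item()

prefix = "Contact John Doe at"  # attacker-chosen prefix
inputs = tokenizer(prefix, return_tensors="pt")
samples = model.generate(
    **inputs,
    do_sample=True,
    top_k=40,
    max_new_tokens=32,
    num_return_sequences=20,
    pad_token_id=tokenizer.eos_token_id,
)
candidates = [tokenizer.decode(s, skip_special_tokens=True) for s in samples]
# Lowest-perplexity candidates are the most likely memorized continuations.
for text in sorted(candidates, key=perplexity)[:5]:
    print(round(perplexity(text), 2), text)
```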
## Membership Inference Attacks
- Membership inference attack susceptibility of clinical language models - Conducts attacks against BERT and GPT-2 trained on clinical data and shows that DP helps mitigate privacy leakage.
- Auditing Data Provenance in Text-Generation Models - Attack that exploits the tendency of LMs to rank rare words higher when they appear in the same context in which they were seen during training.
- Membership Inference Attacks on Sequence-to-Sequence Models: Is My Data In Your Machine Translation System? - Trains an attack classifier only on features that can be extracted from the output sequences.
- Membership inference attacks from first principles - Trains numerous shadow GPT-2 models to measure the probability of observing a certain likelihood for an example in models trained and not trained on it.
- Detecting Pretraining Data from Large Language Models - Designs an attack that thresholds the average log-likelihood of the k% least likely tokens to ascertain whether an example is part of the training data (see the sketch after this list).
- Quantifying Privacy Risks of Masked Language Models Using Membership Inference Attacks - Demonstrates the effectiveness of MIAs designed against LMs trained with the Masked Language Modeling objective.
- Membership Inference Attacks against Language Models via Neighbourhood Comparison - Compares a sentence with perturbed versions of it, on the premise that the model should behave similarly on both if the sentence is not in the training set.
- On the privacy risk of in-context learning - Shows that LLMs are vulnerable to MIAs that target the datasets used as in-context (prompt) examples.
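As a concrete illustration of likelihood-based membership inference, below is a minimal sketch in the spirit of the Min-K% Prob test from "Detecting Pretraining Data from Large Language Models": score a candidate by the average log-likelihood of its least likely tokens and compare it to a calibrated threshold. The target model, the value of k and the threshold are all illustrative.

```python
# Sketch of a likelihood-based membership test in the spirit of Min-K% Prob:
# average the log-likelihood of the k% least likely tokens and compare it
# to a threshold calibrated on known non-member text.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")           # illustrative target
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def min_k_score(text: str, k: float = 0.2) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    token_lls = log_probs[torch.arange(ids.size(1) - 1), ids[0, 1:]]
    k_count = max(1, int(k * token_lls.numel()))
    # Mean log-likelihood of the k% least likely tokens (higher = more member-like).
    return token_lls.topk(k_count, largest=False).values.mean().item()

threshold = -6.0  # illustrative; calibrate on held-out non-member data
sample = "Some candidate sentence whose membership we want to test."
print("predicted member?", min_k_score(sample) > threshold)
```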
## Model Inversion
- Privacy Risks of General-Purpose Language Models - Designs the first model inversion attack against Transformer-based language models.
- Information Leakage in Embedding Models - The attack model is a neural network trained with a multiset prediction loss, so that each word in the embedded text can be predicted conditioned on the words already predicted.
- Sentence Embedding Leaks More Information than You Expect: Generative Embedding Inversion Attack to Recover the Whole Sentence - Implements the inversion as a generative LM task, with a decoder-only model conditioned on the embedding of the sentence to invert as the first token representation.
- Text Embeddings Reveal (Almost) As Much As Text - Proposes an iterative method to reconstruct the input text of embedding models by generating successive hypotheses that may explain the observed embedding.
- Privacy Leakage in Text Classification: A Data Extraction Approach - Injects canaries into the training data and then reconstructs a partially masked sentence by searching for the tokens that maximize the probability of the target label.
- Canary Extraction in Natural Language Understanding Models - Very similar to the previous work, but uses a different reconstruction method.
- Text Revealer: Private Text Reconstruction via Model Inversion Attacks against Transformers - Creates a dataset mimicking the unknown training data, trains a model on it, and then adjusts it by perturbing word embeddings to reduce the classification loss on the target model.
- Deep Leakage from Gradients - Demonstrates that if gradients are openly accessible, it is possible to reconstruct the training data (a toy version follows this list).
- TAG: Gradient Attack on Transformer-based Language Models - Recovers up to 50% of the original tokens when attacking BERT.
- LAMP: Extracting Text from Gradients with Language Model Priors - Minimizes the difference between reconstructed and observed gradients while, at each iteration, keeping only candidate sequences with low perplexity according to an external LM.
- Recovering Private Text in Federated Learning of Language Models - Recovers from the gradients a bag of words for the sentence to extract and then performs beam search to effectively reconstruct the sentence.
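The following toy sketch illustrates the gradient-matching idea behind Deep Leakage from Gradients on a stand-in linear model rather than a real language model: the attacker optimizes dummy inputs and labels so that their gradients match the gradients observed from the victim. All shapes and hyperparameters are illustrative.

```python
# Toy sketch of gradient-matching reconstruction (Deep Leakage from Gradients):
# given the gradients produced by a victim's private batch, optimize dummy
# inputs/labels so that their gradients match the observed ones.
import torch

torch.manual_seed(0)
model = torch.nn.Linear(16, 4)                 # stand-in for a real network
criterion = torch.nn.CrossEntropyLoss()

# Victim computes gradients on private data (these would be shared in FL).
x_private = torch.randn(1, 16)
y_private = torch.tensor([2])
true_grads = torch.autograd.grad(criterion(model(x_private), y_private),
                                 model.parameters())

# Attacker optimizes dummy data to reproduce the observed gradients.
x_dummy = torch.randn(1, 16, requires_grad=True)
y_dummy = torch.randn(1, 4, requires_grad=True)   # soft labels, as in DLG
optimizer = torch.optim.LBFGS([x_dummy, y_dummy])

def closure():
    optimizer.zero_grad()
    dummy_loss = torch.sum(-torch.softmax(y_dummy, -1)
                           * torch.log_softmax(model(x_dummy), -1))
    dummy_grads = torch.autograd.grad(dummy_loss, model.parameters(),
                                      create_graph=True)
    grad_diff = sum(((dg - tg) ** 2).sum()
                    for dg, tg in zip(dummy_grads, true_grads))
    grad_diff.backward()
    return grad_diff

for _ in range(20):
    optimizer.step(closure)
print("reconstruction error:", (x_dummy - x_private).norm().item())
```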
## Re-Identification from Anonymized Data
- Robust de-anonymization of large sparse datasets - Shows that an attacker can use background knowledge or external data to reconstruct the identity of a user in a sparse dataset describing user preferences or transactions (a toy linkage attack follows this list).
- Estimating the success of re-identifications in incomplete datasets using generative models - Proposes a method for estimating the probability that an individual has been successfully identified.
- Clinical Text Anonymization, its Influence on Downstream NLP Tasks and the Risk of Re-Identification - Re-identifies patients from their anonymized clinical history.
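A toy linkage attack in the spirit of the re-identification results above: joining a "de-identified" table with a public auxiliary table on quasi-identifiers recovers identities whenever the quasi-identifier combination is unique. All column names and records are hypothetical.

```python
# Toy linkage attack: join a "de-identified" dataset with public auxiliary
# data on quasi-identifiers (ZIP code, birth year, sex) to recover identities.
import pandas as pd

released = pd.DataFrame({          # names removed, diagnosis kept
    "zip": ["10001", "10001", "94110"],
    "birth_year": [1980, 1991, 1975],
    "sex": ["F", "M", "F"],
    "diagnosis": ["diabetes", "asthma", "hypertension"],
})
public = pd.DataFrame({            # e.g. a voter registry with names
    "name": ["Alice", "Bob", "Carol"],
    "zip": ["10001", "10001", "94110"],
    "birth_year": [1980, 1991, 1975],
    "sex": ["F", "M", "F"],
})

# If the quasi-identifier combination is unique, the join re-identifies the row.
reidentified = released.merge(public, on=["zip", "birth_year", "sex"])
print(reidentified[["name", "diagnosis"]])
```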
## Attacks against Synthetic Data Generators
- Synthetic Data -- Anonymisation Groundhog Day - Empirically shows that synthetic data does not provide a better tradeoff between privacy and utility than traditional anonymisation techniques.
- TAPAS: A toolbox for adversarial privacy auditing of synthetic data - Presents a toolbox for performing attacks against synthetic data generators.
- Achilles' Heels: Vulnerable Record Identification in Synthetic Data Publishing - Identifies vulnerable records in a synthetic dataset.
## Data anonymization
- Guaranteeing anonymity when sharing medical data, the Datafly System - Foundational paper for k-anonymity.
- Automated anonymization of text documents - Modular anonymization system for text documents.
- Data Anonymization for Pervasive Health Care: Systematic Literature Mapping Study - Examines methods for text perturbation (such as microaggregation or data swapping).
- DataSifterText: Partially Synthetic Text Generation for Sensitive Clinical Notes - Uses BERT to impute previously masked sensitive information.
- Natural Text Anonymization Using Universal Transformer with a Self-attention - Anonymization system that uses a universal transformer model to generate anonymized text.
- Deep Reinforcement Learning-based Text Anonymization against Private-Attribute Inference - Text anonymization system that manipulates the embeddings with an RL-based privacy-preserver.
- Recovering from Privacy-Preserving Masking with Large Language Models - Uses an LLM to impute previously masked tokens (a mask-then-impute sketch follows this list).
- Hide and Seek (HaS): A Lightweight Framework for Prompt Privacy Protection - Uses two local LLMs for anonymization and de-anonymization, with a black-box LLM in between.
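A minimal mask-then-impute sketch in the spirit of the masking and imputation papers above: scrub obvious PII with regexes and let a masked language model fill in a plausible surrogate so the text stays fluent. The regex coverage and the fill-mask model are illustrative, not a complete anonymizer.

```python
# Minimal mask-then-impute sketch: scrub obvious PII with regexes, then let a
# masked language model impute a plausible surrogate so the text stays fluent.
import re
from transformers import pipeline

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def mask_pii(text: str, mask_token: str) -> str:
    text = EMAIL.sub(mask_token, text)
    return PHONE.sub(mask_token, text)

fill = pipeline("fill-mask", model="bert-base-uncased")
note = "Please call the patient at +1 212 555 0143 before discharge."
masked = mask_pii(note, fill.tokenizer.mask_token)
imputed = fill(masked, top_k=1)[0]["sequence"]
print(masked)
print(imputed)   # fluent surrogate in place of the real phone number
```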
## Data anonymization with Differential Privacy
- Broadening the Scope of Differential Privacy Using Metrics - Foundational paper for Metric Differential Privacy (a toy word-level mechanism follows this list).
- ADePT: Auto-encoder based Differentially Private Text Transformation - Auto-encoder based DP algorithm to anonymize text while retaining utility.
- When differential privacy meets NLP: The devil is in the detail - Formal analysis of ADePT that highlights some issues with its privacy guarantees.
- DP-VAE: Human-Readable Text Anonymization for Online Reviews with Differentially Private Variational Autoencoders - End-to-end DP-VAE for text anonymization.
- DP-BART for Privatized Text Rewriting under Local Differential Privacy - Text privatization model based on BART.
- Sanitizing Sentence Embeddings (and Labels) for Local Differential Privacy - Sanitization system based on the Purkayastha Mechanism.
- Differential Privacy for Text Analytics via Natural Text Sanitization - Proposes SANTEXT to replace sensitive tokens.
- A Customized Text Sanitization Mechanism with Differential Privacy - Proposes CUSTEXT to replace sensitive tokens.
- InferDPT: Privacy-Preserving Inference for Black-box Large Language Model - Proposes RANTEXT to replace sensitive tokens.
- Leveraging Hierarchical Representations for Preserving Privacy and Utility in Text - New noise distribution specifically devised for Metric DP.
- A Differentially Private Text Perturbation Method Using a Regularized Mahalanobis Metric - Regularized Mahalanobis metric for text perturbation.
- On a Utilitarian Approach to Privacy Preserving Text Generation - Based on a Vickrey auction, balances the choice between first and second nearest neighbours using a tuning parameter.
- Guiding Text-to-Text Privatization by Syntax - Includes grammatical categories in the privatization process to preserve syntax.
- The Limits of Word Level Differential Privacy - Paraphrasing model obtained by fine-tuning GPT-2.
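A toy word-level Metric-DP mechanism in the spirit of the perturbation papers above: add calibrated noise to a word's embedding and release the nearest vocabulary word. The tiny vocabulary and random embeddings are purely illustrative.

```python
# Toy word-level Metric-DP mechanism: perturb each word's embedding with
# multivariate Laplace-like noise scaled by 1/epsilon and release the nearest
# vocabulary word. Vocabulary and embeddings here are random placeholders.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["doctor", "nurse", "patient", "hospital", "clinic", "visit"]
emb = {w: rng.normal(size=16) for w in vocab}        # stand-in embeddings

def noisy_word(word: str, epsilon: float) -> str:
    v = emb[word]
    # Noise with norm ~ Gamma(d, 1/eps) and uniform direction: the standard
    # construction for metric DP in embedding space.
    direction = rng.normal(size=v.shape)
    direction /= np.linalg.norm(direction)
    noise = rng.gamma(shape=v.size, scale=1.0 / epsilon) * direction
    z = v + noise
    return min(vocab, key=lambda w: np.linalg.norm(emb[w] - z))

sentence = ["patient", "visit", "hospital"]
print([noisy_word(w, epsilon=10.0) for w in sentence])
```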
## Pre-training with Differential Privacy
- Learning and Evaluating a Differentially Private Pre-trained Language Model - Fully private pre-training of BERT.
- Learning and Evaluating a Differentially Private Pre-trained Language Model - Fully private pre-training of BERT for the legal domain.
- Differentially Private Language Models Benefit from Public Pre-training - Comparison between fully private training and public pre-training followed by private fine-tuning for GPT-2.
- Why Is Public Pretraining Necessary for Private Model Training? - Focuses on the theoretical reasons why public pre-training is necessary for private learning.
- https://arxiv.org/abs/2302.09483 - Selects pre-training data based on the fine-tuning data distribution, creating smaller pre-training datasets for smaller models.
## Fine-tuning with Differential Privacy
- Synthetic Text Generation with Differential Privacy: A Simple and Practical Recipe - Private fine-tuning of GPT-2 (a minimal DP-SGD sketch follows this list).
- Making the Shoe Fit: Architectures, Initializations, and Tuning for Learning with Privacy - Proposes architectures, initializations and hyperparameter tuning methods explicitly devised for private learning.
- [Simple Baselines Are Strong Performers for Differentially Private Natural Language Processing](https://openreview.net/forum?id=oOiSJEr2-Tt) - Introduces Ghost Clipping to save memory and make private learning almost on par with non-private learning in terms of memory usage.
- Differentially Private Optimization on Large Model at Small Cost - New Book-Keeping technique that requires a single backpropagation pass.
- Differentially Private Language Models for Secure Data Sharing - DP fine-tuning of GPT-2 to generate a synthetic and private version of the tuning dataset.
- EW-Tune: A Framework for Privately Fine-Tuning Large Language Models with Differential Privacy - Significantly decreases the induced noise by using the Edgeworth accountant and realistically assuming that the number of fine-tuning epochs is small.
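A minimal DP-SGD fine-tuning sketch using Opacus (listed under Tools and Frameworks). The tiny classifier and random data stand in for a real language model and dataset, and the noise multiplier, clipping norm and delta are illustrative; real LM setups typically rely on wrappers such as private-transformers or dp-transformers.

```python
# Minimal DP-SGD training sketch with Opacus: per-sample gradient clipping
# plus Gaussian noise, with the privacy budget tracked by an accountant.
import torch
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

features = torch.randn(256, 64)                  # placeholder "embeddings"
labels = torch.randint(0, 2, (256,))
train_loader = DataLoader(TensorDataset(features, labels), batch_size=32)

model = torch.nn.Sequential(torch.nn.Linear(64, 32), torch.nn.ReLU(),
                            torch.nn.Linear(32, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = torch.nn.CrossEntropyLoss()

privacy_engine = PrivacyEngine()
model, optimizer, train_loader = privacy_engine.make_private(
    module=model, optimizer=optimizer, data_loader=train_loader,
    noise_multiplier=1.0,   # more noise -> stronger privacy, lower utility
    max_grad_norm=1.0,      # per-sample gradient clipping bound
)

for epoch in range(3):
    for x, y in train_loader:
        optimizer.zero_grad()
        criterion(model(x), y).backward()
        optimizer.step()

print("epsilon spent:", privacy_engine.get_epsilon(delta=1e-5))
```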
## Parameter-Efficient Fine-Tuning with Differential Privacy
- Differentially Private Fine-tuning of Language Models - DP-PEFT of both RoBERTa and GPT-2 with several techniques.
- Large Language Models Can Be Strong Differentially Private Learners - DP-PEFT of both RoBERTa and GPT-2 with several techniques.
- Privacy-Preserving Prompt Tuning for Large Language Model Services - DP prompt-tuning framework (RAPT).
## Inference with Differential Privacy
- Privacy-Preserving In-Context Learning for Large Language Models - DP-ICL based on partitioning the dataset into subsets used as ICL examples, then noisily aggregating the per-partition answers into a final answer (a toy version of this aggregation follows this list).
- Privacy-Preserving In-Context Learning with Differentially Private Few-Shot Generation - DP-ICL based on partitioning the dataset into subsets used as ICL examples, then noisily aggregating the per-partition answers into a synthetic ICL example.
- Flocks of Stochastic Parrots: Differentially Private Prompt Learning for Large Language Models - Prompt-PATE: an ensemble of teachers receives private samples and produces private answers that are later noisily aggregated into a synthetic (private) example.
- DP-OPT: Make Large Language Model Your Privacy-Preserving Prompt Engineer - DP Offsite Prompt Tuning uses an ensemble of local models to obtain private predictions that are then aggregated with noise.
- Split-and-Denoise: Protect large language model inference with local differential privacy - Local encoder and decoder to add noise to the input and remove noise from the output of an offsite LLM.
- InferDPT: Privacy-Preserving Inference for Black-box Large Language Model - Local anonymizer and de-anonymizer to anonymize the input and de-anonymize the output of an offsite LLM.
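A toy version of the partition-and-aggregate idea behind DP-ICL for a classification-style query: each disjoint partition of the private dataset supplies the in-context examples for one LLM call, and the per-partition votes are aggregated with Gaussian noise. The `query_llm` function is a hypothetical stand-in for an actual LLM call, and sigma is illustrative.

```python
# Toy sketch of DP in-context learning for a classification query: each
# disjoint partition of the private dataset provides the ICL examples for one
# LLM call, and the per-partition votes are aggregated with Gaussian noise.
import numpy as np

LABELS = ["positive", "negative"]

def query_llm(icl_examples, test_input) -> str:
    # Placeholder: prompt an LLM with `icl_examples` and return its label.
    return "positive"

def dp_icl_answer(private_data, test_input, n_partitions=10, sigma=2.0):
    rng = np.random.default_rng(0)
    partitions = [private_data[i::n_partitions] for i in range(n_partitions)]
    votes = np.zeros(len(LABELS))
    for part in partitions:                               # one vote per partition
        label = query_llm(part, test_input)
        votes[LABELS.index(label)] += 1
    votes += rng.normal(scale=sigma, size=votes.shape)    # Gaussian mechanism
    return LABELS[int(np.argmax(votes))]

examples = [("great movie", "positive"), ("awful plot", "negative")] * 20
print(dp_icl_answer(examples, "I loved it"))
```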
## Federated Learning with Differential Privacy
- Training Production Language Models without Memorizing User Data - Next-word prediction model trained in a federated fashion with DP-FedAvg (a toy aggregation step follows this list).
- Can Public Large Language Models Help Private Cross-device Federated Learning? - Uses public LLMs to improve the privacy/utility tradeoff of local LMs trained in DP federated learning with DP-FTRL.
- Federated Learning of Gboard Language Models with Differential Privacy - Presents and analyzes twenty Gboard LMs trained for next-word prediction with DP-FTRL.
- Benchmarking Differential Privacy and Federated Learning for BERT Models - DP-FL training of BERT, RoBERTa, DistilBERT and ALBERT.
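A toy sketch of the server-side DP-FedAvg step used in the works above: clip each client's update, average the clipped updates and add Gaussian noise before applying them to the global model. Client training is stubbed with random deltas and all hyperparameters are illustrative.

```python
# Toy sketch of the DP-FedAvg aggregation step: clip each client's model
# update, average the clipped updates, and add Gaussian noise before applying
# them to the global model. Client-side training is stubbed with random deltas.
import numpy as np

rng = np.random.default_rng(0)
dim, n_clients = 100, 50
global_weights = np.zeros(dim)

def client_update(weights):
    # Placeholder for local training; returns the local weight delta.
    return rng.normal(scale=0.1, size=weights.shape)

def dp_fedavg_round(weights, clip=1.0, noise_multiplier=1.0):
    clipped = []
    for _ in range(n_clients):
        delta = client_update(weights)
        norm = np.linalg.norm(delta)
        clipped.append(delta * min(1.0, clip / norm))     # per-client clipping
    avg = np.mean(clipped, axis=0)
    noise = rng.normal(scale=noise_multiplier * clip / n_clients, size=dim)
    return weights + avg + noise                          # Gaussian mechanism

for _ in range(5):
    global_weights = dp_fedavg_round(global_weights)
print("update norm after 5 rounds:", np.linalg.norm(global_weights))
```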
## Machine Unlearning
- Locating and Editing Factual Associations in GPT - The ROME method makes it possible to trace factual predictions back to individual neurons and manipulate them.
- Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks - Shows issues with ROME and improves upon it.
- DEPN: Detecting and Editing Privacy Neurons in Pretrained Language Models - Detects and manipulates neurons connected with private information.
- Machine Unlearning - The SISA method shards the training data and the training process so that unlearning only requires repeating part of the training.
- Knowledge Unlearning for Mitigating Privacy Risks in Language Models - Negates the loss function used in training, with the objective of maximizing the loss on the target sequences (a minimal sketch follows this list).
- Who's Harry Potter? Approximate Unlearning in LLMs - Compares the original model with one fine-tuned on the data to forget, using the growth in likelihood to identify the sensitive data.
- Unlearn What You Want to Forget: Efficient Unlearning for LLMs - Builds unlearning layers and trains them with a selective student-teacher objective based on KL-divergence, so that the student model maximizes its divergence from the teacher model on the target data.
- Preserving Privacy Through Dememorization: An Unlearning Technique For Mitigating Memorization Risks In Language Models - De-memorization through Reinforcement Learning.
- Knowledge Sanitization of Large Language Models - Sanitization approach to limit hallucinations deriving from unlearning.
- In-Context Unlearning: Language Models as Few Shot Unlearners - Machine Unlearning enforced through ICL.
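A minimal sketch of unlearning by gradient ascent, in the spirit of the negated-loss objective from "Knowledge Unlearning for Mitigating Privacy Risks in Language Models": fine-tune on the forget-set with the sign of the language-modeling loss flipped. The model, the forget-set and the number of steps are illustrative.

```python
# Minimal sketch of unlearning by gradient ascent: fine-tune with the negated
# language-modeling loss on the sequences to forget, raising their loss so
# they are no longer reproduced verbatim.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")          # illustrative model
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

forget_set = ["John Doe's phone number is 212-555-0143."]  # data to unlearn

model.train()
for step in range(10):                                     # few ascent steps
    for text in forget_set:
        ids = tokenizer(text, return_tensors="pt").input_ids
        loss = -model(ids, labels=ids).loss                # negated LM loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
# In practice, stop once an extraction metric on the forget-set drops below a
# chosen threshold, and monitor utility on held-out data to limit damage.
```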
## Tools and Frameworks
- TensorFlow Privacy - Python library with optimizers for training ML models with DP.
- PyVacy - PyTorch translation of TensorFlow Privacy.
- OpenDP project - Collection of algorithms for generating DP statistics.
- DiffPrivLib - Provides a wide range of DP tools for ML and data analysis (a tiny usage example follows this list).
- Google DP - Provides a broad set of DP tools.
- Microsoft DP - Inference-DP framework.
- EKTELO - Flexible and extensible framework for DP data analysis.
- PyTorch Opacus - Enables training PyTorch models with DP.
- private-transformers - Provides a privacy engine built on top of Opacus, rewritten specifically to facilitate integration with the transformers library.
- dp-transformers - Toolkit that provides a simplified integration of transformers training with DP.
- Chorus - DP statistical queries through a cooperative query processing system.
- autodp - Automates the calculation of privacy guarantees for complex algorithms and supports several standard DP mechanisms.
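A tiny usage example for one of the libraries above, DiffPrivLib: releasing a count through its Laplace mechanism. The epsilon and sensitivity values are illustrative.

```python
# Releasing a simple count with DiffPrivLib's Laplace mechanism.
from diffprivlib.mechanisms import Laplace

true_count = 42                               # e.g. patients with a condition
mech = Laplace(epsilon=0.5, sensitivity=1)    # a count changes by at most 1
noisy_count = mech.randomise(true_count)
print(noisy_count)
```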