Our Survey: Against The Achilles’ Heel: A Survey on Red Teaming for Generative Models [Paper]

To gain a comprehensive understanding of potential attacks on GenAI and to develop robust safeguards, we:

Survey over 120 papers, covering the pipeline from risk taxonomy, attack strategies, evaluation metrics, and benchmarks to defensive approaches.
Propose a comprehensive taxonomy of LLM attack strategies grounded in the inherent capabilities that models develop during pretraining and fine-tuning.
Implement more than 30 automated red teaming methods (which will not be open-sourced, to avoid malicious use).
Please review the complete list of papers for the latest research.

To stay updated or to try our RedTeaming tool, please subscribe to our newsletter on our website or join us on Discord!

Red teaming papers by type, from 2023 onwards.

Overview

Language Model Attack Strategy, Prompt Searcher, and Defense

Benchmarks, Multimodal Red Teaming, Agent Red Teaming

To learn more about these topics, please read our paper.

Citation

@article{lin2024achilles,
      title={Against The Achilles' Heel: A Survey on Red Teaming for Generative Models},
      author={Lizhi Lin and Honglin Mu and Zenan Zhai and Minghan Wang and Yuxia Wang and Renxi Wang and Junjie Gao and Yixuan Zhang and Wanxiang Che and Timothy Baldwin and Xudong Han and Haonan Li},
      year={2024},
      journal={arXiv preprint arXiv:2404.00629},
      primaryClass={cs.CL}
}
