Our Survey: Against The Achilles’ Heel: A Survey on Red Teaming for Generative Models [Paper]
To gain a comprehensive understanding of potential attacks on GenAI and to develop robust safeguards, we:
- Survey over 120 papers, covering the full pipeline from risk taxonomy, attack strategies, evaluation metrics, and benchmarks to defensive approaches.
- Propose a comprehensive taxonomy of LLM attack strategies grounded in the inherent capabilities of models developed during pretraining and fine-tuning.
- Implement more than 30 automatic red-teaming methods. (These will not be open-sourced, to avoid malicious use.)
- Maintain a complete list of papers, updated with the latest research.
To stay updated or try our RedTeaming tool, please subscribe to our newsletter at our website or join us on Discord!
To learn more, please read our paper.
@article{lin2024achilles,
title={Against The Achilles' Heel: A Survey on Red Teaming for Generative Models},
author={Lizhi Lin and Honglin Mu and Zenan Zhai and Minghan Wang and Yuxia Wang and Renxi Wang and Junjie Gao and Yixuan Zhang and Wanxiang Che and Timothy Baldwin and Xudong Han and Haonan Li},
year={2024},
journal={arXiv preprint arXiv:2404.00629},
primaryClass={cs.CL}
}