diff --git a/README.md b/README.md
index be97c07..f4a473d 100644
--- a/README.md
+++ b/README.md
@@ -2,6 +2,8 @@
 
 :point_right: Dataset coming soon!
 
+👉 [Read the paper](https://arxiv.org/abs/2406.11801)
+
 ## Table of Contents
 
 - [Installation](#installation)
@@ -49,3 +51,14 @@ Run Safety_Arithmetic_Edited.ipynb file by passing direct edited (without HDR).
 
 ## Citation
 If you find this useful in your research, please consider citing:
+
+```
+@misc{hazra2024safety,
+  title={Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models by Steering Parameters and Activations},
+  author={Rima Hazra and Sayan Layek and Somnath Banerjee and Soujanya Poria},
+  year={2024},
+  eprint={2406.11801},
+  archivePrefix={arXiv},
+  primaryClass={cs.CL}
+}
+```