This repo contains the attacker part of the papers below.
⬆️ commit id: 67fdf3500ec3ccd363cfefc884997d7e40169b82
⬆️ commit id: a17c605d44d53b222b0127f77643519ae33aefd9
We developed several Tibetan adversarial attack methods based on OpenAttack (OpenAttack: An Open-source Textual Adversarial Attack Toolkit (Zeng et al., ACL 2021)).
⬆️ commit id: 4df712e0a5aebc03daa9b1ef353da4b7ea0a1b23
- You need to put the fine-tuned LMs into the dirs (data/Victim.XLMROBERTA.CINO-SMALL-V2_TNCC-TITLE, data/Victim.XLMROBERTA.CINO-SMALL-V2_TUSA, data/Victim.XLMROBERTA.CINO-BASE-V2_TNCC-DOCUMENT, data/Victim.XLMROBERTA.CINO-BASE-V2_TNCC-TITLE, data/Victim.XLMROBERTA.CINO-BASE-V2_TUSA, data/Victim.XLMROBERTA.CINO-LARGE-V2_TNCC-DOCUMENT, data/Victim.XLMROBERTA.CINO-LARGE-V2_TNCC-TITLE, data/Victim.XLMROBERTA.CINO-LARGE-V2_TUSA, data/Victim.XLMROBERTA.TIBETAN-BERT_TNCC-TITLE, data/Victim.XLMROBERTA.TIBETAN-BERT_TUSA, etc.).
- You need to download and unzip the Tibetan word vectors (Learning Word Vectors for 157 Languages (Grave et al., LREC 2018)) into the dir (data/AttackAssist.TibetanWord2Vec).
- You need to put the pre-trained LMs: Tibetan-BERT (Research and Application of Tibetan Pre-training Language Model Based on BERT (Zhang et al., ICCIR 2022)), TiBERT (TiBERT: Tibetan Pre-trained Language Model (Liu et al., SMC 2022)), etc. into the dirs (data/AttackAssist.Tibetan_BERT, data/AttackAssist.TiBERT, etc.).
- You need to put the trained model: segbase.cpkt (link: https://pan.baidu.com/s/1j_60cDWVlfryikaP-1Nvbw password: 19pe) of TibetSegEYE (https://github.com/yjspho/TibetSegEYE) into the dir (data/AttackAssist.TibetSegEYE).
- You need to follow the OpenAttack README (OpenAttack: An Open-source Textual Adversarial Attack Toolkit (Zeng et al., ACL 2021)) to install the development environment.
- You can run the attack scripts in the dir (demo_tibetan).
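Before running the attack scripts, the `data/` layout described in the steps above can be sanity-checked with a small stdlib script. This is only a sketch, not part of the repo: `check_data_dirs` is a hypothetical helper, and `REQUIRED_DIRS` lists a subset of the directory names taken verbatim from the steps above.

```python
import os

# A subset of the dirs required by the setup steps above; extend as needed.
REQUIRED_DIRS = [
    "data/AttackAssist.TibetanWord2Vec",
    "data/AttackAssist.Tibetan_BERT",
    "data/AttackAssist.TiBERT",
    "data/AttackAssist.TibetSegEYE",
    "data/Victim.XLMROBERTA.TIBETAN-BERT_TUSA",
]

def check_data_dirs(root=".", required=REQUIRED_DIRS):
    """Return the required dirs that are missing under `root`."""
    return [d for d in required if not os.path.isdir(os.path.join(root, d))]

if __name__ == "__main__":
    for d in check_data_dirs():
        print(f"missing: {d}")
```

Running it from the repo root prints any directory that still needs to be populated before the scripts in `demo_tibetan` are launched.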
If you find our work useful, please kindly cite our papers.
@inproceedings{10.1145/3589335.3652503,
  author = {Cao, Xi and Qun, Nuo and Gesang, Quzong and Zhu, Yulei and Nyima, Trashi},
  title = {Multi-Granularity Tibetan Textual Adversarial Attack Method Based on Masked Language Model},
  year = {2024},
  isbn = {9798400701726},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3589335.3652503},
  doi = {10.1145/3589335.3652503},
  booktitle = {Companion Proceedings of the ACM on Web Conference 2024},
  pages = {1672--1680},
  numpages = {9},
  keywords = {language model, robustness, textual adversarial attack, tibetan},
  location = {Singapore, Singapore},
  series = {WWW '24}
}
@inproceedings{cao-etal-2023-pay-attention,
  title = "Pay Attention to the Robustness of {C}hinese Minority Language Models! Syllable-level Textual Adversarial Attack on {T}ibetan Script",
  author = "Cao, Xi and
    Dawa, Dolma and
    Qun, Nuo and
    Nyima, Trashi",
  booktitle = "Proceedings of the 3rd Workshop on Trustworthy Natural Language Processing (TrustNLP 2023)",
  month = jul,
  year = "2023",
  address = "Toronto, Canada",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2023.trustnlp-1.4",
  pages = "35--46"
}