
Tibetan Adversarial Attack

Introduction

This repo contains the attacker part of the papers below.

Multi-Granularity Tibetan Textual Adversarial Attack Method Based on Masked Language Model (Cao et al., WWW 2024 Workshop - SocialNLP)

⬆️ commit id: 67fdf3500ec3ccd363cfefc884997d7e40169b82

Pay Attention to the Robustness of Chinese Minority Language Models! Syllable-level Textual Adversarial Attack on Tibetan Script (Cao et al., ACL 2023 Workshop - TrustNLP)

⬆️ commit id: a17c605d44d53b222b0127f77643519ae33aefd9

We developed several Tibetan adversarial attack methods based on OpenAttack (OpenAttack: An Open-source Textual Adversarial Attack Toolkit (Zeng et al., ACL 2021)).

⬆️ commit id: 4df712e0a5aebc03daa9b1ef353da4b7ea0a1b23

Usage

  1. Put the fine-tuned victim LMs into the corresponding dirs (data/Victim.XLMROBERTA.CINO-SMALL-V2_TNCC-TITLE, data/Victim.XLMROBERTA.CINO-SMALL-V2_TUSA, data/Victim.XLMROBERTA.CINO-BASE-V2_TNCC-DOCUMENT, data/Victim.XLMROBERTA.CINO-BASE-V2_TNCC-TITLE, data/Victim.XLMROBERTA.CINO-BASE-V2_TUSA, data/Victim.XLMROBERTA.CINO-LARGE-V2_TNCC-DOCUMENT, data/Victim.XLMROBERTA.CINO-LARGE-V2_TNCC-TITLE, data/Victim.XLMROBERTA.CINO-LARGE-V2_TUSA, data/Victim.XLMROBERTA.TIBETAN-BERT_TNCC-TITLE, data/Victim.XLMROBERTA.TIBETAN-BERT_TUSA, etc.).
  2. Download and unzip the Tibetan word vectors (Learning Word Vectors for 157 Languages (Grave et al., LREC 2018)) into the dir data/AttackAssist.TibetanWord2Vec.
  3. Put the pre-trained LMs, e.g., Tibetan-BERT (Research and Application of Tibetan Pre-training Language Model Based on BERT (Zhang et al., ICCIR 2022)) and TiBERT (TiBERT: Tibetan Pre-trained Language Model (Liu et al., SMC 2022)), into the corresponding dirs (data/AttackAssist.Tibetan_BERT, data/AttackAssist.TiBERT, etc.).
  4. Put the trained segmentation model segbase.cpkt (link: https://pan.baidu.com/s/1j_60cDWVlfryikaP-1Nvbw password: 19pe) of TibetSegEYE (https://github.com/yjspho/TibetSegEYE) into the dir data/AttackAssist.TibetSegEYE.
  5. Follow the OpenAttack README (OpenAttack: An Open-source Textual Adversarial Attack Toolkit (Zeng et al., ACL 2021)) to set up the development environment.
  6. Run the attack scripts in the dir demo_tibetan; a minimal usage sketch follows this list.
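The sketch below shows how an OpenAttack-based run is typically wired together. It assumes the generic OpenAttack v2 API and uses an illustrative victim path, a toy dataset, and a placeholder attacker (PWWSAttacker); the Tibetan-specific attackers, victim configurations, and datasets from the papers are defined in the scripts under demo_tibetan.

    import OpenAttack as oa
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    # Victim: a fine-tuned classifier prepared in step 1 (path is illustrative).
    victim_dir = "data/Victim.XLMROBERTA.CINO-SMALL-V2_TUSA"
    tokenizer = AutoTokenizer.from_pretrained(victim_dir)
    model = AutoModelForSequenceClassification.from_pretrained(victim_dir)
    victim = oa.classifiers.TransformersClassifier(
        model, tokenizer, model.get_input_embeddings()
    )

    # Attacker: placeholder only; the Tibetan syllable-level and
    # multi-granularity attackers in demo_tibetan are plugged in here instead.
    attacker = oa.attackers.PWWSAttacker()

    # Evaluation data: an iterable of {"x": text, "y": label} examples.
    dataset = [{"x": "...", "y": 1}]

    attack_eval = oa.AttackEval(attacker, victim)
    attack_eval.eval(dataset, visualize=True)

The victim dirs from step 1 are assumed to be regular Hugging Face model directories, so loading them with transformers as above should apply to the CINO and Tibetan-BERT victims alike.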

Citation

If you find our work useful, please kindly cite our papers.

@inproceedings{10.1145/3589335.3652503,
    author = {Cao, Xi and Qun, Nuo and Gesang, Quzong and Zhu, Yulei and Nyima, Trashi},
    title = {Multi-Granularity Tibetan Textual Adversarial Attack Method Based on Masked Language Model},
    year = {2024},
    isbn = {9798400701726},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    url = {https://doi.org/10.1145/3589335.3652503},
    doi = {10.1145/3589335.3652503},
    booktitle = {Companion Proceedings of the ACM on Web Conference 2024},
    pages = {1672--1680},
    numpages = {9},
    keywords = {language model, robustness, textual adversarial attack, tibetan},
    location = {Singapore, Singapore},
    series = {WWW '24}
}
@inproceedings{cao-etal-2023-pay-attention,
    title = "Pay Attention to the Robustness of {C}hinese Minority Language Models! Syllable-level Textual Adversarial Attack on {T}ibetan Script",
    author = "Cao, Xi  and
      Dawa, Dolma  and
      Qun, Nuo  and
      Nyima, Trashi",
    booktitle = "Proceedings of the 3rd Workshop on Trustworthy Natural Language Processing (TrustNLP 2023)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.trustnlp-1.4",
    pages = "35--46"
}
