
arXiv: https://arxiv.org/abs/2403.16685

ToXCL: A Unified Framework for Toxic Speech Detection and Explanation

Nhat M. Hoang1*, Xuan Long Do2,3*, Duc Anh Do1, Duc Anh Vu1, Luu Anh Tuan1
1Nanyang Technological University 
2National University of Singapore 
3Institute for Infocomm Research (I2R), A*STAR 
*Equal Contribution

Setup Environment

This code was tested with Python 3.8 and CUDA 11.6.

conda create -n toxcl python=3.8
conda activate toxcl
pip install -r requirements.txt

Datasets

We provide the pre-processed datasets used in the paper:

  • Implicit Hate Corpus (IHC): IHC_train.csv, IHC_valid.csv
  • Social Bias Inference Corpus (SBIC): SBIC_train.csv, SBIC_valid.csv, SBIC_test.csv
  • HateXplain (used to train the Target Generator TG): TG_train.csv, TG_valid.csv

For IHC and SBIC:

  • Column for input:
    • Use raw_text for the original input, format: "{raw_text}"
    • Use text for input with target groups, format: "Target: {TG} Post: {raw_text}"
  • Column for output (see the sketch below):
    • Use explanations for the baseline group G2, format: "{explanations}"
    • Use output for E2E generation, format: "{class} <SEP> {explanations}"
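
For illustration, a minimal sketch of reading and parsing these columns (assuming pandas; column names as documented above):

import pandas as pd

df = pd.read_csv("data/IHC_train.csv")
row = df.iloc[0]

row["raw_text"]   # original post
row["text"]       # "Target: {TG} Post: {raw_text}"
row["output"]     # "{class} <SEP> {explanations}"

# Split the E2E output back into its parts
label, _, explanation = row["output"].partition(" <SEP> ")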

Baselines

Train Encoder-only model (HateBERT, BERT, ELECTRA, RoBERTa)
# `model_checkpoint` used in paper: GroNLP/hateBERT, bert-base-uncased, google/electra-base-discriminator, roberta-base
python -m baselines.train_encoder_arch \
    --model_name {model_checkpoint} \
    --output_dir {output_dir} \
    --dataset_name {IHC | SBIC}
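For example, a concrete run (the output directory name is illustrative):

python -m baselines.train_encoder_arch \
    --model_name bert-base-uncased \
    --output_dir saved/BERT_IHC \
    --dataset_name IHC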
Train Decoder-only model (GPT-2)
python -m baselines.train_decoder_arch \
    --model_name_or_path gpt2 \
    --output_dir {output_dir} \
    --dataset_name {IHC | SBIC} \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 8 \
    --max_train_steps 20000 \
    --learning_rate 1e-4 \
    --text_column {raw_text | text} \
    --summary_column {explanations | output}
Train Encoder-Decoder model (BART, T5, Flan-T5)
# `model_checkpoint` used in paper: facebook/bart-base, t5-base, google/flan-t5-base
python baselines/train_encoder_decoder_arch.py \
    --model_name_or_path {model_checkpoint} \
    --output_dir {output_dir} \
    --dataset_name {IHC | SBIC} \
    --text_column {raw_text | text} \
    --summary_column {explanations | output} \
    --do_train --do_eval \
    --source_prefix "summarize: " \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 16 \
    --gradient_accumulation_steps 1 \
    --predict_with_generate True \
    --max_source_length 256 \
    --learning_rate 0.00001 \
    --num_beams 4 \
    --max_steps 20000 \
    --save_steps 500 --eval_steps 500 \
    --evaluation_strategy steps \
    --load_best_model --report_to none
Zero-shot inference with LLMs (ChatGPT, Mistral-7B)
python baselines/test_llm.py mistral --test_data data/IHC_valid.csv --output_dir saved/llm

Argument notes:

  • text_column:
    • Use raw_text for the original input, format: "{raw_text}"
    • Use text for input with target groups, format: "Target: {TG} Post: {raw_text}"
  • summary_column:
    • Use explanations for group G2, format: "{explanations}"
    • Use output for E2E generation, format: "{class} <SEP> {explanations}"

ToXCL

Note: The train.py script does not incorporate the Target Group Generator (TG) during training. Instead, we pre-generate the target groups separately and store them in the dataset to accelerate the training process. The augmented dataset can be found in the data folder. For a complete inference pipeline, please refer to inference.ipynb.

# (1) Train Target Group Generator
python baselines/train_encoder_decoder_arch.py \
    --model_name_or_path t5-base \
    --output_dir saved/T5-TG \
    --dataset_name TG \
    --text_column raw_text \
    --summary_column target_groups \
    --do_train --do_eval \
    --source_prefix "summarize: " \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 16 \
    --gradient_accumulation_steps 1 \
    --predict_with_generate True \
    --max_source_length 256 \
    --learning_rate 0.00001 \
    --num_beams 4 \
    --max_steps 20000 \
    --save_steps 500 --eval_steps 500 \
    --evaluation_strategy steps \
    --load_best_model --report_to none

# (2) Train teacher model
python -m baselines.train_encoder_arch \
    --model_name roberta-large \
    --output_dir saved/RoBERTa-L_IHC \
    --dataset_name IHC \
    --text_column_num 1     # 1 is with Target Groups, 0 otherwise

# (3) Train ToXCL
# Remove the `--teacher_name_or_path` argument to train without the teacher model
python -m train \
    --model_name_or_path google/flan-t5-base \
    --teacher_name_or_path saved/RoBERTa-L_IHC \
    --output_dir saved/ToXCL \
    --dataset_name IHC

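# (4) Optional: multi-GPU training with accelerate (this example pins GPUs 2 and 3)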
CUDA_VISIBLE_DEVICES=2,3 accelerate launch -m train_acc \
    --model_name_or_path google/flan-t5-base \
    --teacher_name_or_path saved/RoBERTa-L_IHC \
    --output_dir saved/ToXCL_acc \
    --dataset_name IHC
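
For a quick end-to-end check, here is a minimal inference sketch. It assumes the checkpoints trained above, that the saved ToXCL checkpoint loads as a standard Hugging Face seq2seq model, and that <SEP> survives decoding as plain text; the full pipeline is in inference.ipynb.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Stage 1: Target Group Generator from step (1)
tg_tok = AutoTokenizer.from_pretrained("saved/T5-TG")
tg_model = AutoModelForSeq2SeqLM.from_pretrained("saved/T5-TG")

# Stage 2: ToXCL from step (3)
tox_tok = AutoTokenizer.from_pretrained("saved/ToXCL")
tox_model = AutoModelForSeq2SeqLM.from_pretrained("saved/ToXCL")

post = "an example post to analyze"

# Generate the target group(s); TG was trained with the "summarize: " prefix
ids = tg_model.generate(**tg_tok("summarize: " + post, return_tensors="pt"), num_beams=4)
target = tg_tok.decode(ids[0], skip_special_tokens=True)

# Detect and explain, using the documented input format "Target: {TG} Post: {raw_text}"
ids = tox_model.generate(**tox_tok(f"Target: {target} Post: {post}", return_tensors="pt"), num_beams=4)
decoded = tox_tok.decode(ids[0], skip_special_tokens=True)

# Output format: "{class} <SEP> {explanations}"
label, _, explanation = decoded.partition(" <SEP> ")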

Development

This is a research implementation and is not expected to be regularly updated or maintained after release.

Citation

If you find our work useful for your research and development, please consider citing the paper:

@misc{hoang2024toxcl,
      title={ToXCL: A Unified Framework for Toxic Speech Detection and Explanation}, 
      author={Nhat M. Hoang and Xuan Long Do and Duc Anh Do and Duc Anh Vu and Luu Anh Tuan},
      year={2024},
      eprint={2403.16685},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
