A codebase for ACL 2023 paper: Mitigating Label Biases for In-context Learning

fywalter/label-bias

Mitigating Label Biases for In-context Learning

This is the codebase for our ACL 2023 paper. It builds on the codebase by Tony Z. Zhao and performs few-shot in-context learning on text classification tasks with large language models such as GPT-3.

You can run this codebase with GPT-3, GPT-J, GPT-2, and potentially any other language model available in HuggingFace Transformers. To run the codebase with GPT-3, place your OpenAI API key in a file named openai_key.txt.

Running this codebase reports results for domain-context calibration (our proposed method), for contextual calibration, and for the model without any calibration.
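For readers new to calibration, here is a minimal sketch of the idea the two calibration methods share, as we understand it: estimate the model's label prior from bias-probing inputs, divide each test prediction by that prior, and renormalize. Contextual calibration probes with content-free inputs such as "N/A", while domain-context calibration probes with random in-domain words. The function names and the probabilities below are hypothetical and are not taken from run_classification.py.

```python
import numpy as np

def estimate_prior(probe_probs):
    """Average the label probabilities predicted on several bias-probing inputs."""
    probs = np.asarray(probe_probs, dtype=float)
    probs = probs / probs.sum(axis=-1, keepdims=True)  # normalize each row over the labels
    return probs.mean(axis=0)

def calibrate(label_probs, prior):
    """Divide predictions by the estimated prior, then renormalize over the labels."""
    scores = np.asarray(label_probs, dtype=float) / np.asarray(prior, dtype=float)
    return scores / scores.sum(axis=-1, keepdims=True)

# Hypothetical numbers: the model leans towards label 0 regardless of the input.
probe_probs = [[0.70, 0.30], [0.80, 0.20], [0.75, 0.25]]
prior = estimate_prior(probe_probs)

test_probs = [[0.60, 0.40]]  # uncalibrated prediction: label 0
calibrated = calibrate(test_probs, prior)
print(calibrated.argmax(axis=-1))  # prediction after removing the estimated bias
```

In this toy case the uncalibrated prediction is label 0, but it is weaker than the model's default preference for label 0, so the calibrated prediction flips to label 1.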

Dependencies

This code is written using PyTorch and the HuggingFace Transformers library. Running models locally requires GPUs (OpenAI models such as GPT-3 are queried through the API and do not need one), but with some minor modifications the code can be adapted to CPU-only devices.

Installation

The easiest way to install the code is to create a fresh Anaconda environment:

conda create -n labelbias python=3.9
conda activate labelbias
pip install -r requirements.txt

Data

All the datasets we used are publicly available text classification datasets. Running the code automatically downloads each dataset from HuggingFace, except for those that already exist in the data folder.

Reproducing Our Results

Here is how to replicate the results from our paper for GPT-3. To replicate the results on SST-2 (8-shot, 5 different sets of in-context examples, 500 samples subsampled from the original test set):

CUDA_VISIBLE_DEVICES=0 python run_classification.py \
--model="davinci" \
--dataset="sst2" \
--num_seeds=5 \
--all_shots="8" \
--api_num_log_prob 100 \
--subsample_test_set 500 \
--recompute_probs

To test on the whole test set, remove --subsample_test_set 500.

Overview of Codebase

Data

The data folder contains the raw data for some of the supported tasks (the other tasks are handled by the HuggingFace datasets library). If you'd like to add your own task, you can either add the data to that folder or use a dataset from HuggingFace. The code for loading a dataset (from the data folder or from HuggingFace), as well as for defining the prompt format of a task, is in data_utils.py. You can use the loader functions of existing tasks as templates when writing one for a new task.
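To illustrate what "defining the prompt format" means for a classification task, here is a minimal sketch of a few-shot prompt builder. The function name build_prompt, the "Review: " and "Sentiment: " prefixes, and the example sentences are all hypothetical; the actual format for each task is defined in data_utils.py and may differ.

```python
def build_prompt(train_examples, test_sentence,
                 q_prefix="Review: ", a_prefix="Sentiment: "):
    """Concatenate labeled demonstrations, then the unlabeled test input."""
    blocks = [f"{q_prefix}{text}\n{a_prefix}{label}" for text, label in train_examples]
    # The final block ends right after the answer prefix, so the model's
    # next tokens are read as the predicted label.
    blocks.append(f"{q_prefix}{test_sentence}\n{a_prefix}".rstrip())
    return "\n\n".join(blocks)

# Hypothetical demonstrations and test input.
demos = [("A moving, well-acted drama.", "Positive"),
         ("Dull and far too long.", "Negative")]
prompt = build_prompt(demos, "An uneven but charming comedy.")
print(prompt)
```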

Run Scripts

The run scripts, e.g., run_classification.py, contain the code for randomly sampling the examples to use in the prompt, calling the models, the necessary evaluation metrics, and more. Inside the run script, you can set the parameters for the experiments using the command line arguments.
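As a rough illustration of seeded example sampling, the sketch below draws one set of in-context examples per seed, so that each seed yields a different but reproducible prompt. The function name sample_demonstrations and the toy training set are hypothetical; the run script's own sampling code may be organized differently.

```python
import random

def sample_demonstrations(train_set, num_shots, seed):
    """Draw num_shots distinct training examples, reproducibly for a given seed."""
    rng = random.Random(seed)  # local generator, leaves the global random state untouched
    return rng.sample(train_set, num_shots)

# Hypothetical toy training set of (text, label) pairs.
train_set = [(f"sentence {i}", "Positive" if i % 2 == 0 else "Negative")
             for i in range(20)]

# One demonstration set per seed, e.g. 5 seeds with 8 shots each.
demo_sets = {seed: sample_demonstrations(train_set, num_shots=8, seed=seed)
             for seed in range(5)}
print(len(demo_sets), len(demo_sets[0]))
```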

For all experiments, we pickle and save the outputs of the model (including the model's raw predictions, the evaluation metrics such as accuracy, the estimated priors, etc.). This makes post-hoc analysis very fast, because the model does not have to be queried again.
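The pattern can be sketched as follows: write one result dictionary to disk with pickle, then load it later for analysis. The dictionary keys, the values, and the file name are hypothetical and do not reflect the layout of the files the run scripts actually write.

```python
import os
import pickle
import tempfile

# Hypothetical result record for one experiment.
results = {
    "dataset": "sst2",
    "num_shots": 8,
    "seed": 0,
    "raw_probs": [[0.60, 0.40], [0.20, 0.80]],  # per-example label probabilities
    "estimated_prior": [0.75, 0.25],
    "accuracy": 0.5,
}

# Save the record to a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "sst2_8shot_seed0.pkl")
with open(path, "wb") as f:
    pickle.dump(results, f)

# Post-hoc analysis: reload the saved outputs instead of re-running the model.
with open(path, "rb") as f:
    loaded = pickle.load(f)
print(loaded["accuracy"])
```

Only unpickle files you created yourself or otherwise trust, since loading a pickle can execute arbitrary code.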

References

Please consider citing our work if you find this code or our paper useful for your research.

@article{Fei2023MitigatingLB,
  title={Mitigating Label Biases for In-context Learning},
  author={Yu Fei and Yifan Hou and Zeming Chen and Antoine Bosselut},
  journal={ArXiv},
  year={2023},
  volume={abs/2305.19148},
  url={https://api.semanticscholar.org/CorpusID:258967265}
}
