Unraveling Downstream Gender Bias from Large Language Models: A Study on AI Educational Writing Assistance
This repository is the official implementation of the EMNLP 2023 Findings paper "Unraveling Downstream Gender Bias from Large Language Models: A Study on AI Educational Writing Assistance" by Thiemo Wambsganss*, Xiaotian Su*, Vinitra Swamy, Seyed Parsa Neshaei, Roman Rietsche, and Tanja Käser.
Large Language Models (LLMs) are increasingly utilized in educational tasks such as providing writing suggestions to students. Despite their potential, LLMs are known to harbor inherent biases which may negatively impact learners. Previous studies have investigated bias in models and data representations separately, neglecting the potential impact of LLM bias on human writing.
In this paper, we investigate how bias transfers through an AI writing support pipeline. We conduct a large-scale user study with 231 students writing business case peer reviews in German. Students are divided into five groups with different levels of writing support: one in-classroom group with recommender system feature-based suggestions and four groups recruited from Prolific -- a control group with no assistance, two groups with suggestions from fine-tuned GPT-2 and GPT-3 models, and one group with suggestions from pre-trained GPT-3.5. Using GenBit gender bias analysis, Word Embedding Association Tests (WEAT), and Sentence Embedding Association Tests (SEAT), we evaluate the gender bias at various stages of the pipeline: in reviews written by students, in suggestions generated by the models, and in model embeddings directly. Our results demonstrate that there is no significant difference in gender bias between the resulting peer reviews of groups with and without LLM suggestions. Our research is therefore optimistic about the use of AI writing support in the classroom, showcasing a context where bias in LLMs does not transfer to students' responses.
- Classroom study (Recommender System Writing Support):
  - G0: real-world data from a Swiss-German university classroom setting, where students received feedback from a traditional feature-based recommender system.
- (Online) Prolific study (LLM Writing Support):
  - G1: a control group that received no writing support.
  - G2: received suggestions from fine-tuned GPT-2.
  - G3: received suggestions from fine-tuned GPT-3.
  - G4: received suggestions from pre-trained GPT-3.5.
- `GenBit`: GenBit analysis algorithms and gender pairs in German. Scripts are used directly from the following repo. A minimal usage sketch is shown after this list.
- `SEAT`: SEAT test files, consisting of:
  - `english`: original SEAT test files in English (`sent-weat1.jsonl` to `sent-weat8b.jsonl` files from FairPy).
  - `german`: translated SEAT test files in German (translated with DeepL and manually examined and corrected by two German native speakers).
- `WEAT`: WEAT test results, consisting of:
  - `finetuned_GPT_results`: results for each of the 9 WEAT tests computed on the fine-tuned GPT-2 embeddings.
  - `GLOVE_embeddings_analysis`: results of training a GloVe model for each group.
  - `weat_cooccurence_analysis`: results of a WEAT co-occurrence analysis on the raw files for each group. The code used for this analysis is detailed in the COLING 2022 paper "Bias at a Second Glance: A Deep Dive into Bias for German Educational Peer-Review Data Modeling" and the corresponding repository. The WEAT effect size itself is sketched after this list.
- `dataset`: anonymized reviews of Group 0 (classroom study), plus reviews, suggestions, and demographics data for Group 1 to Group 4 (online Prolific study); a loading sketch is shown after this list.
  - `G0_G1_G2_G3_G4_reviews.csv`: reviews from G0 to G4.
  - `G0_reviews_raw.csv`: both the original reviews scraped from HTML and the processed reviews of G0.
  - `G1_G2_G3_G4_by_user_reviews_suggestions.csv`: reviews, all received suggestions, and the suggestions accepted by participants of G1-G4.
  - `G1_G2_G3_G4_demographics.csv`: anonymized demographics data of G1-G4.
  - `G2_G3_G4_Suggestions.csv`: GPT version and suggestions of G2-G4.
  - Business model description in German.
- `notebooks`: code for data cleaning, GenBit bias analysis, and visualization.
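As a quick reference for the GenBit analysis in the notebooks, below is a minimal sketch of scoring a small set of reviews with the `genbit` package. The parameters shown are the package defaults, the two sentences are placeholders, and the use of the German language code and the `genbit_score` key are assumptions worth verifying against the package documentation; the exact settings used for the paper's results live in `notebooks`.

```python
# Minimal GenBit sketch (pip install genbit); parameters and example sentences
# are illustrative, not necessarily those used in the paper's notebooks.
from genbit.genbit_metrics import GenBitMetrics

reviews = [
    "Die Argumentation des Geschäftsmodells ist klar strukturiert.",  # placeholder review
    "Der Autor sollte die Zielgruppe genauer beschreiben.",           # placeholder review
]

# German language code assumed to be supported by the installed genbit version
genbit = GenBitMetrics(language_code="de", context_window=5,
                       distance_weight=0.95, percentile_cutoff=80)
genbit.add_data(reviews, tokenized=False)

metrics = genbit.get_metrics(output_statistics=True, output_word_list=False)
print(metrics["genbit_score"])  # aggregate gender-bias score for the corpus
```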
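For readers who want to recompute the association tests rather than rely on the released result files, here is an illustrative NumPy implementation of the WEAT effect size from Caliskan et al. (2017). It is not the exact script behind the results in `WEAT`, and the random vectors stand in for real word embeddings.

```python
# Illustrative WEAT effect size (Caliskan et al., 2017) with placeholder embeddings.
import numpy as np

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B):
    # s(w, A, B): mean similarity to attribute set A minus mean similarity to B
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    # d = (mean_x s(x,A,B) - mean_y s(y,A,B)) / std_{w in X ∪ Y} s(w,A,B)
    s_X = [association(x, A, B) for x in X]
    s_Y = [association(y, A, B) for y in Y]
    return (np.mean(s_X) - np.mean(s_Y)) / np.std(s_X + s_Y, ddof=1)

# Toy usage with random 50-dimensional "embeddings" for the two target
# and two attribute sets; real runs would use word vectors from the models.
rng = np.random.default_rng(0)
X, Y, A, B = (rng.normal(size=(8, 50)) for _ in range(4))
print(weat_effect_size(X, Y, A, B))
```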
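A minimal sketch for inspecting the released CSVs with pandas follows; the file names come from the list above, but the relative paths assume the repository root as working directory, and the column layouts should be checked against the actual files rather than assumed.

```python
# Sketch: load the released CSVs with pandas; inspect headers before relying on them.
import pandas as pd

reviews = pd.read_csv("dataset/G0_G1_G2_G3_G4_reviews.csv")
suggestions = pd.read_csv("dataset/G2_G3_G4_Suggestions.csv")
demographics = pd.read_csv("dataset/G1_G2_G3_G4_demographics.csv")

for name, df in [("reviews", reviews), ("suggestions", suggestions),
                 ("demographics", demographics)]:
    print(name, df.shape, list(df.columns))  # sizes and actual column names
```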
The fine-tuned model is available here on HuggingFace.
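If you want to generate suggestions with the released model, a sketch using the `transformers` library is below. `MODEL_ID` is a placeholder for the HuggingFace model linked above (not its real identifier), and the prompt and generation settings are illustrative only.

```python
# Sketch: generate a German writing suggestion with the fine-tuned model from HuggingFace.
# MODEL_ID is a placeholder -- substitute the model linked above.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "your-org/finetuned-gpt2-peer-reviews"  # placeholder, not the real ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

prompt = "Das Geschäftsmodell überzeugt durch"  # illustrative start of a peer review
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```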
This code is provided for educational purposes and aims to facilitate reproduction of our results and further research in this direction. We have done our best to document, refactor, and test the code before publication.
If you find any bugs or would like to contribute new models, analyses, etc., please let us know. Feel free to file issues and pull requests on the repo, and we will address them as soon as we can.
If you find this code useful in your work, please cite our paper:
Wambsganss T., Su X., Swamy V., Neshaei S. P., Rietsche R., Käser T. (2023).
Unraveling Downstream Gender Bias from Large Language Models: A Study on AI Educational Writing Assistance.
In: Findings of the Association for Computational Linguistics (EMNLP 2023).
This code is free software: you can redistribute it and/or modify it under the terms of the MIT License.
This software is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. See the MIT License for details.