intersectional-ai-safety

This repository contains the R code used in the paper Intersectionality in Conversational AI Safety: How Bayesian Multilevel Models Help Understand Diverse Perceptions of Safety. The underlying data are available as part of Google Research's DICES Dataset.

Instructions

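Download, pre-process, and save the DICES data by running preprocess_and_cache_data.R.
Once the pre-processed data are cached, run the remaining R scripts. Scripts beginning with ad (ad_0_null.R, ad_1_effects.R, ad_2_intersectional.R) fit models with all data. Scripts beginning with qs (qs_0_null.R, qs_1_effects.R, qs_2_effects_ge.R, qs_3_intersectional.R, qs_4_intersectional_ge.R) fit models with only the data that have qualitative risk severity ratings.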
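The run order described above can be sketched in R as follows. This is only an illustration: the helper name run_order is hypothetical and not part of the repository, and the sketch assumes your working directory is the repository root. The loop that actually runs the scripts is left commented out because each model may take hours to fit.

```r
# Hypothetical helper (not part of the repository): put the preprocessing
# script first, followed by the ad_* and qs_* model scripts in name order.
run_order <- function(files) {
  preprocess <- "preprocess_and_cache_data.R"
  models <- grep("^(ad|qs)_.*\\.R$", files, value = TRUE)
  models <- sort(models, method = "radix")  # locale-independent ordering
  c(intersect(preprocess, files), models)
}

# Assumes the working directory is the repository root.
scripts <- run_order(list.files(pattern = "\\.R$"))
print(scripts)

# Uncomment to run everything in sequence (this can take many hours):
# for (s in scripts) source(s, echo = TRUE)
```

Running the scripts one at a time instead lets you inspect each fitted model before starting the next.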

Requirements / Performance Tips

Depending on your device, each model may take several hours to fit.
The R scripts are configured for a device with at least 8 CPU threads. We recommend having at least 9 available.
You can adjust the number of threads used by editing the cores and threads arguments in each call to brm().
Calculating model performance metrics is memory intensive. We recommend using a machine with at least 64 GB of RAM. If you have less memory, comment out the performance code.
You can run the scripts as background jobs either locally or remotely on an HPC cluster.
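As a rough guide to choosing values for the cores and threads arguments, the sketch below splits the available CPU threads across chains. It assumes 4 chains, which is an assumption made for illustration and is not taken from the repository's scripts. The helper thread_plan, the formula, and the data object in the commented call are all hypothetical.

```r
# Hypothetical helper: split the available CPU threads across MCMC chains.
# chains = 4 is an assumption for illustration; check the value in the scripts.
thread_plan <- function(available, chains = 4L) {
  if (is.na(available) || available < 1L) available <- 1L
  cores <- min(chains, available)                    # chains run in parallel
  threads_per_chain <- max(1L, available %/% cores)  # within-chain threads
  list(cores = cores, threads_per_chain = threads_per_chain)
}

plan <- thread_plan(parallel::detectCores())
print(plan)

# Illustrative call only; the formula and data names are placeholders.
# Within-chain threading in brms generally requires the cmdstanr backend.
# fit <- brms::brm(
#   rating ~ 1 + (1 | rater_id),
#   data = dices_data,
#   chains = 4,
#   cores = plan$cores,
#   threads = brms::threading(plan$threads_per_chain),
#   backend = "cmdstanr"
# )
```

With 8 available threads this plan yields 4 cores and 2 threads per chain. Leaving at least one thread unused, in line with the recommendation above, means passing a smaller value for available.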

Papers

Intersectionality in Conversational AI Safety: How Bayesian Multilevel Models Help Understand Diverse Perceptions of Safety (2023). Christopher M. Homan*, Gregory Serapio-García*, Lora Aroyo, Mark Díaz, Alicia Parrish, Vinodkumar Prabhakaran, Alex S. Taylor, and Ding Wang.

DICES Dataset: Diversity in Conversational AI Evaluation for Safety (2023). Lora Aroyo*, Alex S. Taylor*, Mark Díaz, Christopher M. Homan, Alicia Parrish, Gregory Serapio-García, Vinodkumar Prabhakaran, and Ding Wang.

*Contributed equally. Subsequent coauthors listed alphabetically.

Citation

If you use this code, please cite our paper:

@misc{homan2023intersectionality,
      title={Intersectionality in Conversational AI Safety: How Bayesian Multilevel Models Help Understand Diverse Perceptions of Safety}, 
      author={Christopher M. Homan and Gregory Serapio-García and Lora Aroyo and Mark Díaz and Alicia Parrish and Vinodkumar Prabhakaran and Alex S. Taylor and Ding Wang},
      year={2023},
      eprint={2306.11530},
      archivePrefix={arXiv},
      primaryClass={cs.HC}
}

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
