Using Counterfactual Tasks to Evaluate the Generality of Analogical Reasoning in Large Language Models
Code for the paper *Using Counterfactual Tasks to Evaluate the Generality of Analogical Reasoning in Large Language Models*.

Dependencies:
- Python 3
- OpenAI Python Library
- NumPy
- SciPy
- statsmodels
- Matplotlib
- pandas
- R
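The Python dependencies can be installed with pip, for example (a sketch; R must be installed separately):

pip install openai numpy scipy statsmodels matplotlib pandas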
This work extends experiments done by Taylor Webb in https://github.com/taylorwwebb/emergent_analogies_LLM/. All modifications were carried out by Martha Lewis.
To generate problems with e.g. 5 letters permuted, run
python gen_problems_by_alph.py --num_permuted 5
The value 5 can be replaced with other numbers of letters to permute.
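A minimal bash sketch for generating problem sets at several permutation counts (the counts shown are illustrative, not a prescribed set):

for n in 1 2 5 10 20; do python gen_problems_by_alph.py --num_permuted $n; done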
To evaluate GPT models on the counterfactual comprehension check (CCC), run the script eval_GPT_CCC.py, specifying which GPT model to run, how many letters in the alphabet should be permuted, and which CCC to run (successor or predecessor). You should already have created a set of problems with that number of letters permuted. For example, to evaluate GPT 3.5 on the CCC for an alphabet with 5 letters permuted, run:
python eval_GPT_CCC.py --gpt 35 --num_permuted 5 --problem succ
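A minimal bash sketch for sweeping several permutation counts with the same model and check (the counts are illustrative, and each problem set must already have been generated as above):

for n in 1 2 5 10 20; do python eval_GPT_CCC.py --gpt 35 --num_permuted $n --problem succ; done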
To get the performance of models on the CCCs, run get_CCC_accuracies.py, specifying which CCC to analyse. For example:
python get_CCC_accuracies.py --problem succ
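To analyse both checks in one go, a sketch like the following may work; note that succ is confirmed above, but pred is only an assumed flag value for the predecessor check:

for p in succ pred; do python get_CCC_accuracies.py --problem $p; done  # 'pred' is an assumed flag value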
To evaluate GPT models on the generated counterfactual analogy problems, run eval_GPT_letterstring.py, specifying which GPT model to run, which prompt style to use, and how many letters are permuted. For example, to evaluate GPT 3.5 with a human-like prompt on problems with 10 letters permuted, run:
python eval_GPT_letterstring.py --gpt 35 --num_permuted 10 --promptstyle human
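A minimal bash sketch for running the same model and prompt style over several permutation counts (the counts are illustrative; the corresponding problem sets must already exist):

for n in 1 2 5 10 20; do python eval_GPT_letterstring.py --gpt 35 --num_permuted $n --promptstyle human; done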
Performance is calculated and plotted in plotting.ipynb.
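Assuming Jupyter is installed, the notebook can be opened with:

jupyter notebook plotting.ipynb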
Predefined problems are available in the directory 'problems', precomputed model predictions are available in the 'GPTn_predictions' directories, and human data is available in 'data'.