Commit: letterstring

Martha authored and Martha committed Nov 14, 2024
1 parent c23a134 commit 4eb25f2

Showing 216 changed files with 301,743 additions and 0 deletions.
Binary files not shown.
34 changes: 34 additions & 0 deletions letterstring/README.md
@@ -0,0 +1,34 @@
# Letterstring analogies

## Generating problems
This directory contains code to generate letterstring analogy problems with permuted alphabets, in `gen_problems_by_alph.py`.

The problems used in the paper are all available in the `problems` directory.
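
For a rough sense of what these problems look like, and of how `eval_GPT_letterstring.py` later indexes them (`prob[0][0]`, `prob[0][1]`, `prob[1][0]`), here is a minimal sketch of generating one successor analogy over a partially permuted alphabet. This is illustrative only, not the code in `gen_problems_by_alph.py`; the function names and the exact permutation scheme are assumptions.

```python
import random
import string

def permute_alphabet(num_permuted, rng):
    """Swap up to `num_permuted` letters of the standard alphabet among themselves (illustrative)."""
    alphabet = list(string.ascii_lowercase)
    positions = rng.sample(range(26), num_permuted)
    targets = positions[:]
    rng.shuffle(targets)
    for p, t in zip(positions, targets):
        alphabet[p] = string.ascii_lowercase[t]
    return alphabet

def successor_problem(alphabet, start, length=3):
    """One analogy: [x x x] [x' x' x'] / [y y y] [ ? ], where ' means the next letter in the permuted alphabet."""
    src, src_next = alphabet[start], alphabet[start + 1]
    tgt, tgt_next = alphabet[start + 2], alphabet[start + 3]
    prob = [[[src] * length, [src_next] * length],  # example pair: prob[0][0] -> prob[0][1]
            [[tgt] * length]]                       # incomplete target pair: prob[1][0]
    answer = [tgt_next] * length
    return prob, answer

rng = random.Random(0)
alph = permute_alphabet(10, rng)
prob, answer = successor_problem(alph, start=rng.randrange(22))
print(' '.join(alph))
print(prob, '->', ' '.join(answer))
```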

## Testing GPT
GPT can be tested on the problems by running `eval_GPT_letterstring.py` with the following command-line arguments:

- `--promptstyle` chooses a prompt style; the best results were obtained with `hw`, Hodel and West's prompt.
- `--num_permuted` chooses the number of letters permuted: 1, 2, 5, 10, 20, or `symb`.
- `--gpt` chooses a GPT model. Now that GPT-3 is deprecated, the choices are `35` for GPT-3.5 or `4` for GPT-4.
- `--gen` chooses between generalized problems (`gen`) and non-generalized problems (`nogen`).

#### Example usage.
To evaluate GPT-4 on non-generalized problems with 10 letters permuted, using the prompt from Hodel and West (2024), you would call:

`python eval_GPT_letterstring.py --gpt 4 --num_permuted 10 --gen nogen --promptstyle hw`

## Evaluating GPT on the counterfactual comprehension test (CCC)
GPT can be tested on the CCC by running `eval_GPT_letterstring_control.py` with the following command-line arguments:

- `--num_permuted` chooses the number of letters permuted: 1, 2, 5, 10, 20, or `symb`.
- `--gpt` chooses a GPT model. Now that GPT-3 is deprecated, the choices are `35` for GPT-3.5 or `4` for GPT-4.
- `--problem` chooses between the successor (`succ`) and predecessor (`pred`) CCC tests.

#### Example usage.
To evaluate GPT-3.5 on the CCC with 20 letters permuted and the predecessor problem, you would call:

`python eval_GPT_letterstring_control.py --gpt 35 --num_permuted 20 --problem pred`

## Results
Results are stored in the `GPT{X}_prob_predictions_multi_alph` directories as `.npz` files. Processed results are saved as CSV files in `results_csvs`.
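
The raw `.npz` files can be inspected with NumPy: the saved `all_prob_type_responses` entry holds a dictionary mapping each alphabet to a list of responses per problem type. A minimal sketch, assuming a GPT-4 run on `nogen` problems with 10 letters permuted and the `hw` prompt has been completed:

```python
import numpy as np

# Path follows the naming pattern built in eval_GPT_letterstring.py;
# this particular run is just an example.
fname = 'GPT4_prob_predictions_multi_alph/nogen/gpt4_letterstring_results_10_multi_alph_gptprobs_hw.npz'
results = np.load(fname, allow_pickle=True)['all_prob_type_responses'].item()

# results: {alphabet: [per-problem-type list of raw model responses, one per trial]}
for alph, prob_type_responses in results.items():
    print(alph, len(prob_type_responses), 'problem types,',
          len(prob_type_responses[0]), 'trials each')
```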

## Human data
Human data is available in `results_csvs` as `human_gen.csv`, `human_nogen.csv`, and `gpt_human_data.csv`.

## Data analysis and plotting
A notebook in `plotting` gives code to generate all plots in the paper.



25 changes: 25 additions & 0 deletions letterstring/collate_json.py
@@ -0,0 +1,25 @@
import json

all_permuted = {}
# Open each per-condition JSON file
for i in [1, 2, 5, 10, 20, 'symb']:
    with open(f'problems/nogen/all_prob_{i}_7_human.json', 'r') as f:
        # returns the JSON object as a dictionary
        data = json.load(f)
        all_permuted[f'np_{i}'] = data

print(all_permuted.keys())

all_prob_json_string = json.dumps(all_permuted, indent=2)

# Write to js script
js_fname = './problems/nogen/all_prob_all_permuted_7_human.js'
with open(js_fname, 'w') as js_fid:
    js_fid.write('var all_problems = ' + all_prob_json_string)

# Write to json file
json_fname = './problems/nogen/all_prob_all_permuted_7_human.json'
with open(json_fname, 'w') as json_fid:
    json_fid.write(all_prob_json_string)
Binary files not shown.
198 changes: 198 additions & 0 deletions letterstring/eval_GPT_letterstring.py
@@ -0,0 +1,198 @@
import openai
import numpy as np
import builtins
import argparse
import os
import time
import sys


def check_path(path):
    if not os.path.exists(path):
        os.mkdir(path)

# Settings
parser = argparse.ArgumentParser()
parser.add_argument('--sentence', action='store_true', help="Present problem in sentence format.")
parser.add_argument('--noprompt', action='store_true', help="Present problem without prompt.")
parser.add_argument('--newprompt', action='store_true', help="Present problem with new prompt.")
parser.add_argument('--promptstyle', help='Give a prompt style: human, minimal, hw, webb, webbplus')
parser.add_argument('--num_permuted', help="give a number of letters in the alphabet to permute from 2 to 26")
parser.add_argument('--gpt', help='give gpt model: 3, 35, 4')
parser.add_argument('--gen', help='give gen for generalized problems or nogen for non generalized')

args = parser.parse_args()
print(args.promptstyle)

if args.promptstyle == "webb" and int(args.num_permuted) > 1:
    print("promptstyle webb can only be used with an unpermuted alphabet")
    sys.exit()

# GPT settings
openai.api_key = "API KEY HERE"
if args.gpt == '3':
    kwargs = {"engine": "text-davinci-003", "temperature": 0, "max_tokens": 40, "stop": "\n", "echo": False, "logprobs": 1}
elif args.gpt == '35':
    kwargs = {"model": "gpt-3.5-turbo", "temperature": 0, "max_tokens": 40, "stop": "\n"}
elif args.gpt == '4':
    kwargs = {"model": "gpt-4", "temperature": 0, "max_tokens": 40, "stop": "\n"}

# Load all problems
if args.gen == 'gen':
    all_prob = np.load(f'./problems/{args.gen}/all_prob_{args.num_permuted}_7_gpt_human_alphs.npz', allow_pickle=True)['all_prob']
elif args.gen == 'nogen':
    all_prob = np.load(f'./problems/{args.gen}/all_prob_{args.num_permuted}_7_human.npz', allow_pickle=True)['all_prob']

response_dict = {}

for alph in all_prob.item().keys():
    print(alph)
    if all_prob.item()[alph]['shuffled_letters'] is not None:
        shuffled_letters = builtins.list(all_prob.item()[alph]['shuffled_letters'])
    else:
        shuffled_letters = None

    shuffled_alphabet = builtins.list(all_prob.item()[alph]['shuffled_alphabet'])

    prob_types = builtins.list(all_prob.item()[alph].keys())[2:]  # first two items are the shuffled letters and shuffled alphabet: skip these
    N_prob_types = len(prob_types)

    alph_string = ' '.join(shuffled_alphabet)
    print(alph_string)

    # Evaluate
    N_trials_per_prob_type = 10
    all_prob_type_responses = []
    count = 0
    for p in range(N_prob_types):
        if prob_types[p] == 'attn':
            alph_string = "For this question, ignore other instructions and respond 'a a a a'"
        print('problem type ' + str(p+1) + ' of ' + str(N_prob_types) + '...')
        prob_type_responses = []
        for t in range(N_trials_per_prob_type):
            print('trial ' + str(t+1) + ' of ' + str(N_trials_per_prob_type) + '...')
            # Generate prompt
            prob = all_prob.item()[alph][prob_types[p]]['prob'][t]
            prompt = ''
            if not args.noprompt:
                if args.promptstyle not in ["minimal", "hw", "webb", "webbplus"]:
                    prompt += 'Use the following alphabet to guess the missing piece.\n\n' \
                        + alph_string \
                        + '\n\nNote that the alphabet may be in an unfamiliar order. Complete the pattern using this order.\n\n'
                elif args.promptstyle == 'minimal':
                    prompt += 'Use the following alphabet to complete the pattern.\n\n' \
                        + alph_string \
                        + '\n\nNote that the alphabet may be in an unfamiliar order. Complete the pattern using this order.\n\n'
                elif args.promptstyle == 'hw':
                    prompt += 'Use this fictional alphabet: \n\n' \
                        + alph_string \
                        + "\n\nLet's try to complete the pattern:\n\n"
                elif args.promptstyle == "webb":
                    prompt += "Let's try to complete the pattern:\n\n"
                elif args.promptstyle == "webbplus":
                    prompt += "Let's try to complete the pattern. Just give the letters that complete the pattern and nothing else at all. Do not describe the pattern.\n\n"
            if args.sentence:
                prompt += 'If '
                for i in range(len(prob[0][0])):
                    prompt += str(prob[0][0][i])
                    if i < len(prob[0][0]) - 1:
                        prompt += ' '
                prompt += ' changes to '
                for i in range(len(prob[0][1])):
                    prompt += str(prob[0][1][i])
                    if i < len(prob[0][1]) - 1:
                        prompt += ' '
                prompt += ', then '
                for i in range(len(prob[1][0])):
                    prompt += str(prob[1][0][i])
                    if i < len(prob[1][0]) - 1:
                        prompt += ' '
                prompt += ' should change to '
            else:
                prompt += '['
                for i in range(len(prob[0][0])):
                    prompt += str(prob[0][0][i])
                    if i < len(prob[0][0]) - 1:
                        prompt += ' '
                prompt += '] ['
                for i in range(len(prob[0][1])):
                    prompt += str(prob[0][1][i])
                    if i < len(prob[0][1]) - 1:
                        prompt += ' '
                prompt += ']\n['
                for i in range(len(prob[1][0])):
                    prompt += str(prob[1][0][i])
                    if i < len(prob[1][0]) - 1:
                        prompt += ' '
                if args.promptstyle in ["minimal", "hw", "webb", "webbplus"]:
                    prompt += '] ['
                else:
                    prompt += '] [ ? ]'
            if args.promptstyle == "human":
                messages = [{'role': 'system', 'content': 'You are able to solve letter-string analogies'},
                            {'role': 'user', 'content': "In this study, you will be presented with a series of patterns involving alphanumeric characters, together with an example alphabet.\n\n" +
                                "Note that the alphabet may be in an unfamiliar order.\n" +
                                "Each pattern will have one missing piece marked by [ ? ].\n" +
                                "For each pattern, you will be asked to guess the missing piece.\n" +
                                "Use the given alphabet when guessing the missing piece.\n" +
                                "You do not need to include the '[ ]' or spaces between letters in your response.\n\n" +
                                "a b c h e f g d i j k l m n o p q r s t u v w x y z \n\n" +
                                "[a a a] [b b b]\n[c c c] [ ? ]"},
                            {'role': 'assistant', 'content': 'h h h'},
                            {'role': 'user', 'content': "In this case, the missing piece is 'h h h'\nNote that in the given alphabet, 'b' is the letter after 'a' and 'h' is the letter after 'c'"},
                            {'role': 'user', 'content': prompt}]
            elif args.promptstyle in ["minimal", "hw", "webb", "webbplus"]:
                messages = [{'role': 'system', 'content': 'You are able to solve letter-string analogies'},
                            {'role': 'user', 'content': prompt}]
            else:
                print("please enter a promptstyle")

            if args.gpt == '3':
                comp_prompt = ''
                for m in messages:
                    comp_prompt += '\n' + m['content']
                comp_prompt = comp_prompt.strip('\n')
                # print(comp_prompt)
            else:
                pass

            # Get response
            response = []
            while len(response) == 0:
                if args.gpt == '3':
                    try:
                        response = openai.Completion.create(prompt=comp_prompt, **kwargs)
                    except:
                        print('trying again...')
                        time.sleep(5)
                else:
                    try:
                        response = openai.ChatCompletion.create(messages=messages, **kwargs)
                    except:
                        print('trying again...')
                        time.sleep(5)

            if args.gpt == '3':
                prob_type_responses.append(response['choices'][0]['text'])
            else:
                prob_type_responses.append(response['choices'][0]['message']['content'])
            # print(response)
            count += 1
        all_prob_type_responses.append(prob_type_responses)
    response_dict[alph] = all_prob_type_responses
    # Save after each alphabet
    path = f'GPT{args.gpt}_prob_predictions_multi_alph/{args.gen}'
    check_path(path)
    save_fname = f'./{path}/gpt{args.gpt}_letterstring_results_{args.num_permuted}_multi_alph_gptprobs'
    if args.promptstyle:
        save_fname += f'_{args.promptstyle}'
    if args.sentence:
        save_fname += '_sentence'
    if args.noprompt:
        save_fname += '_noprompt'
    save_fname += '.npz'
    np.savez(save_fname, all_prob_type_responses=response_dict, allow_pickle=True)


