# The Pythia Trials

## Intro

A public repository for research on how Pythia and OLMo behave as model size and training checkpoint vary, with an eye toward mechanistic interpretability of reasoning. The hope is to reverse engineer some of the algorithms these models learn!

## 2-digit Addition

I have started by looking at 2-digit addition because it is a super simple task that is easy to understand, reason about, and implement. By 2-digit addition, I mean the task

```python
question = "19+21"
prompt = f"Question: What is {question}? Answer: {question}="
```

where the model's completion has to be the correct sum. I only iterate through questions where the result is still two digits, i.e. I stop before reaching 50+50=100.
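For concreteness, here is a minimal sketch of how these prompts could be generated. The iteration order and exact filtering used in this repo may differ, and `make_prompts` is an illustrative name rather than a function from the codebase.

```python
def make_prompts():
    """Yield (prompt, answer) pairs for all 2-digit sums whose result stays 2 digits."""
    for a in range(10, 100):
        for b in range(10, 100):
            if a + b > 99:  # skip sums that spill into three digits
                continue
            question = f"{a}+{b}"
            prompt = f"Question: What is {question}? Answer: {question}="
            yield prompt, str(a + b)
```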

The following table shows the 2-digit addition accuracy of Pythia and OLMo across various model sizes.

| Model | Size | 2-digit sum accuracy |
|---|---|---|
| EleutherAI/pythia-14m | 14m | 0.00% |
| EleutherAI/pythia-70m-deduped | 70m | 0.00% |
| EleutherAI/pythia-160m-deduped | 160m | 0.16% |
| EleutherAI/pythia-410m-deduped | 410m | 4.56% |
| EleutherAI/pythia-1B-deduped | 1B | 5.08% |
| EleutherAI/pythia-1.4B-deduped | 1.4B | 20.69% |
| EleutherAI/pythia-2.8B-deduped | 2.8B | 86.32% |
| EleutherAI/pythia-6.9B-deduped | 6.9B | 91.88% |
| EleutherAI/pythia-12B-deduped | 12B | 88.96% |
| allenai/OLMo-1B-0724-hf | 1B | 42.86% |
| hamishivi/OLMo-1B-0724-SFT-hf | 1B | 53.50% |
| hamishivi/OLMo-1B-0724-Instruct-hf | 1B | 52.02% |
| allenai/OLMo-7B-0724-hf | 7B | 99.88% |
| allenai/OLMo-7B-0724-SFT-hf | 7B | |
| allenai/OLMo-7B-0724-Instruct-hf | 7B | |
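As a rough illustration, here is how such an accuracy number could be computed with Hugging Face `transformers`, reusing `make_prompts` from the sketch above. This assumes greedy decoding and an exact-match check on the generated text; it is not necessarily the evaluation code that produced the table.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def eval_two_digit_addition(model_name: str) -> float:
    """Return exact-match accuracy on the 2-digit addition prompts."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype=torch.float16, device_map="auto"
    )
    correct, total = 0, 0
    for prompt, answer in make_prompts():
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        out = model.generate(**inputs, max_new_tokens=4, do_sample=False)
        # Decode only the newly generated tokens, then check the sum.
        completion = tokenizer.decode(
            out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True
        )
        correct += completion.strip().startswith(answer)
        total += 1
    return correct / total

print(f"{eval_two_digit_addition('EleutherAI/pythia-410m-deduped'):.2%}")
```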
