Spork on puzzles for exploring transformers because a spork is better than a fork any time of the day

hapatika/Transformer-Puzzles

 
 


Transformer Puzzles

This notebook is a collection of short coding puzzles based on the internals of the Transformer. The puzzles are written in Python and can be done in this notebook. After completing them you will have a much better intuitive sense of how a Transformer can compute certain logical operations.

These puzzles are based on Thinking Like Transformers by Gail Weiss, Yoav Goldberg, and Eran Yahav, and are derived from this blog post.

Goal

Can we produce a Transformer that does basic elementary school addition?

i.e. given a string "19492+23919" can we produce the correct output?
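
To make the target concrete, here is a minimal plain-Python sketch of the schoolbook algorithm the final puzzle has to reproduce. It is not written in the puzzle's sequence language and is not the notebook's solution; the function name `add_digit_strings` is hypothetical and exists only for this illustration.

```python
def add_digit_strings(expr: str) -> str:
    """Add two non-negative integers given as 'a+b', digit by digit."""
    left, right = expr.split("+")
    width = max(len(left), len(right))

    # Right-align both numbers by padding with leading zeros.
    a = [int(c) for c in left.zfill(width)]
    b = [int(c) for c in right.zfill(width)]

    # Step 1: position-wise sums, ignoring carries (a parallel operation).
    raw = [x + y for x, y in zip(a, b)]

    # Step 2: propagate carries from the least significant digit upward.
    digits = []
    carry = 0
    for s in reversed(raw):
        total = s + carry
        digits.append(total % 10)
        carry = total // 10
    if carry:
        digits.append(carry)

    return "".join(str(d) for d in reversed(digits))


result = add_digit_strings("19492+23919")
print(result)  # 43411
```

Step 1 maps naturally onto position-wise operations. Step 2 is written here as a sequential loop, which is exactly the part that has to be re-expressed with parallel operations and attention in the puzzles.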

Rules

Each exercise consists of a function with an argument seq and an output seq. Like a Transformer, we cannot change the sequence length. Operations need to act on the entire sequence in parallel. There is a global indices, which tells us the position in the sequence. If we want to do something different at certain positions, we can use where, as in NumPy or PyTorch. To run the seq, we need to give it an initial input.

def even_vals(seq=tokens):
    "Keep even positions, set odd positions to -1"
    x = indices % 2
    # Note that all operations broadcast so you can use scalars.
    return where(x == 0, seq, -1)
seq = even_vals()

# Give the initial input tokens
seq.input([0,1,2,3,4])
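
For comparison, the same computation can be sketched in plain NumPy, which the text uses as the analogy for where. This is only an illustration of the position-wise semantics under the assumption that tokens and indices behave like equal-length arrays; the name `even_vals_np` is hypothetical and is not part of the puzzle library.

```python
import numpy as np

# Stand-ins for the global `tokens` and `indices` sequences.
tokens = np.array([0, 1, 2, 3, 4])
indices = np.arange(len(tokens))


def even_vals_np(seq, indices):
    """Keep values at even positions, set odd positions to -1."""
    x = indices % 2
    # The scalar -1 is broadcast to every position, as in the puzzle version.
    return np.where(x == 0, seq, -1)


out = even_vals_np(tokens, indices).tolist()
print(out)  # [0, -1, 2, -1, 4]
```

The output has the same length as the input, and every position is computed independently of the others.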

The main operation you can use is "attention". You do this by defining a selector which forms a matrix based on key and query.

before = key(indices) < query(indices)
before

We can combine selectors with logical operations.

before_or_same = before | (key(indices) == query(indices))
before_or_same
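
A selector can be pictured as a boolean matrix with one row per query position and one column per key position. The NumPy sketch below builds the two selectors above under that layout; the row/column convention and the variable names `key_idx` and `query_idx` are assumptions made for this illustration, not the library's internals.

```python
import numpy as np

n = 5
idx = np.arange(n)

# Assumed layout: rows are query positions, columns are key positions.
key_idx = idx[None, :]    # shape (1, n), varies along columns
query_idx = idx[:, None]  # shape (n, 1), varies along rows

# True where the key position is strictly before the query position.
before = key_idx < query_idx

# Combine with a second comparison using a logical OR.
before_or_same = before | (key_idx == query_idx)

print(before_or_same.astype(int))
```

With this layout, before is a strictly lower-triangular matrix and before_or_same additionally includes the diagonal, so each query position selects itself and everything earlier.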

Once you have a selector, you can apply "attention" to sum over the selected (grey) positions. For example, to compute a cumulative sum we run the following function.

def cumsum(seq=tokens):
    return before_or_same.value(seq)
seq = cumsum()
seq.input([0, 1, 2, 3, 4])
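
As a rough sketch of what this attention step computes, summing over the selected positions is equivalent to multiplying a 0/1 selector matrix by the sequence. The NumPy version below assumes rows are queries and columns are keys and follows the summing behaviour described above; `cumsum_np` is a hypothetical name used only here.

```python
import numpy as np


def cumsum_np(seq):
    """Cumulative sum expressed as selector-matrix times sequence."""
    seq = np.asarray(seq)
    idx = np.arange(len(seq))

    # selector[q, k] is True when key position k is at or before query q.
    selector = idx[None, :] <= idx[:, None]

    # Each output position sums the values at its selected key positions.
    return selector.astype(seq.dtype) @ seq


out = cumsum_np([0, 1, 2, 3, 4]).tolist()
print(out)  # [0, 1, 3, 6, 10]
```

Every output position is computed from the whole input in one matrix product, so no sequential loop over positions is needed.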

Good luck!
