This project is just a self-test: trying to recreate the GPT-2 model architecture from memory, as learned from the makemore series


DivyanshK12/gpt2-recreation


Info

  • This project is just a self-test: trying to recreate the GPT-2 model architecture from memory
  • Will also try to add a PyTorch DataLoader class (later) instead of the custom data loading followed in the initial tutorial
  • Currently not intending to set up a training loop in this project; if I do, I will add the validation split too
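As a sketch of what that DataLoader replacement could look like: a `Dataset` that serves shifted (input, target) windows from one long token tensor. The names `TokenDataset`, `data`, and `block_size` are assumptions for illustration, not the repo's actual code.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class TokenDataset(Dataset):
    """Serves (input, target) windows from one long token tensor.
    Hypothetical sketch; names are not from the repo."""
    def __init__(self, data: torch.Tensor, block_size: int):
        self.data = data
        self.block_size = block_size

    def __len__(self):
        # Last valid start index leaves room for the shifted target window.
        return len(self.data) - self.block_size

    def __getitem__(self, idx):
        x = self.data[idx : idx + self.block_size]
        y = self.data[idx + 1 : idx + 1 + self.block_size]  # next-token targets
        return x, y

# Usage: fake stream of 1000 token ids, batches of 4 windows of 8 tokens.
tokens = torch.randint(0, 50257, (1000,))
loader = DataLoader(TokenDataset(tokens, block_size=8), batch_size=4, shuffle=True)
xb, yb = next(iter(loader))
print(xb.shape, yb.shape)  # torch.Size([4, 8]) torch.Size([4, 8])
```

With this, batching and shuffling come from `DataLoader` instead of hand-rolled index arithmetic.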

Results

Partial Success

  1. Could remember the overall architecture (in the initial tryout I missed the final layer norm and lm_head)
  2. Divided attention into MultiHead and Head modules in the initial try
    • This works, but having all heads operate as a single batched matrix operation instead of a list is more efficient
    • It also deviates from the structure of the original model
  3. Could not remember the code for the buffer holding the causal mask applied to the attention scores
  4. Missed adding the residual connections in Block on the first try
    • Not a very bad miss; I would have added them if I had kept a diagram near me
    • Also need the layer normalization because stacking residual connections grows the activation variance
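Points 2–4 above can be sketched together: all heads computed as one batched matmul, the causal mask registered as a buffer, and a pre-norm Block with both residual connections. Dimension names (`n_embd`, `n_head`, `block_size`) follow the GPT-2/nanoGPT convention; this is an illustrative sketch, not the repo's code.

```python
import math
import torch
import torch.nn as nn
from torch.nn import functional as F

class CausalSelfAttention(nn.Module):
    """All heads computed as one batched matmul instead of a list of Heads."""
    def __init__(self, n_embd, n_head, block_size):
        super().__init__()
        self.n_head = n_head
        self.c_attn = nn.Linear(n_embd, 3 * n_embd)   # fused q, k, v projection
        self.c_proj = nn.Linear(n_embd, n_embd)
        # The easy-to-forget part: the causal mask lives in a buffer, so it
        # moves with .to(device) but is not a trainable parameter.
        self.register_buffer(
            "bias",
            torch.tril(torch.ones(block_size, block_size)).view(1, 1, block_size, block_size),
        )

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.c_attn(x).split(C, dim=2)
        # (B, T, C) -> (B, n_head, T, head_dim): heads become a batch dimension.
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))
        # Mask out future positions in the attention scores before softmax.
        att = att.masked_fill(self.bias[:, :, :T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        y = (att @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.c_proj(y)

class Block(nn.Module):
    """Pre-norm transformer block: x + attn(ln(x)), then x + mlp(ln(x))."""
    def __init__(self, n_embd, n_head, block_size):
        super().__init__()
        self.ln_1 = nn.LayerNorm(n_embd)
        self.attn = CausalSelfAttention(n_embd, n_head, block_size)
        self.ln_2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd), nn.GELU(), nn.Linear(4 * n_embd, n_embd)
        )

    def forward(self, x):
        x = x + self.attn(self.ln_1(x))   # residual connection 1
        x = x + self.mlp(self.ln_2(x))    # residual connection 2
        return x

x = torch.randn(2, 5, 32)
block = Block(n_embd=32, n_head=4, block_size=8)
print(block(x).shape)  # torch.Size([2, 5, 32])
```

The pre-norm placement of the LayerNorms is what keeps the variance growth from the stacked residual additions in check.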

Status

Fixed all diffs for the basic model
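For reference, the two pieces missed in point 1 of the results, the final layer norm and lm_head, sit at the very end of the forward pass. A minimal sketch, with names following the GPT-2 convention rather than the repo's code:

```python
import torch
import torch.nn as nn

class GPTHead(nn.Module):
    """Tail end of GPT-2's forward pass: final LayerNorm, then the
    language-model head projecting to vocabulary logits.
    (Sketch only; a real model has embeddings and Blocks before this.)"""
    def __init__(self, n_embd, vocab_size):
        super().__init__()
        self.ln_f = nn.LayerNorm(n_embd)                     # the easy-to-miss final norm
        self.lm_head = nn.Linear(n_embd, vocab_size, bias=False)

    def forward(self, x):
        return self.lm_head(self.ln_f(x))                    # (B, T, C) -> (B, T, vocab)

h = torch.randn(2, 5, 32)            # hidden states from the last Block
logits = GPTHead(32, vocab_size=100)(h)
print(logits.shape)  # torch.Size([2, 5, 100])
```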
