Last week, we realized that the bias terms added to the residual stream by other layers are always passed through a layernorm (with its own bias) before reaching their destination within another layer. Given that, we might analyze a fused bias term between each pair of layers rather than splitting them apart.
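A minimal sketch of what "fusing" could mean here, under the assumption that we fold the layernorm gain into the downstream weights and combine the layernorm bias with the layer's own bias (all names and shapes below are hypothetical, not taken from the repo):

```python
import numpy as np

# Hypothetical dimensions: residual stream feeding a linear layer.
d_model, d_out = 8, 4
rng = np.random.default_rng(0)

# Layernorm parameters (gain gamma, bias beta) and the downstream linear layer.
gamma = rng.normal(size=d_model)
beta = rng.normal(size=d_model)
W = rng.normal(size=(d_out, d_model))
b = rng.normal(size=d_out)

def layernorm(x, gamma, beta, eps=1e-5):
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps) * gamma + beta

def normalize(x, eps=1e-5):
    # Layernorm without the affine part.
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

# Original computation: layernorm, then linear layer.
x = rng.normal(size=d_model)
out = W @ layernorm(x, gamma, beta) + b

# Folded version: absorb gamma into the weights (column-wise) and
# fuse the layernorm bias beta into the layer's bias term.
# The normalization itself is unchanged.
W_fold = W * gamma          # broadcasts gamma over columns of W
b_fused = W @ beta + b      # the single fused bias term to analyze

out_folded = W_fold @ normalize(x) + b_fused
assert np.allclose(out, out_folded)
```

The identity is just W @ (n(x) * gamma + beta) + b = (W * gamma) @ n(x) + (W @ beta + b), so `b_fused` carries the combined contribution of both biases.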
On initial analysis, the layernorm biases seem to account for a large share of the largest terms.
https://github.com/nicholasturner1/gpt-omics/blob/main/notebooks/220509_initial_gptj_run.ipynb
These terms also reach over surprisingly long distances (longer than the attention heads). There could be something interesting here.