
bert_t5_gpt

transformers

  • Key point: the Multi-head attention in the decoder block is actually Masked, as shown by the lower-triangular matrix on the far right of the figure.

    • This is also the approach used by the GPT (decoder-only) architecture.
  • post vs. pre LayerNorm
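The masked (causal) attention above can be sketched in a few lines of PyTorch: the lower-triangular matrix from the figure zeroes out attention to future positions, so each token only attends to itself and earlier tokens. Shapes and names here are illustrative, not taken from any particular implementation.

```python
import torch

T = 4
scores = torch.randn(T, T)           # raw attention scores (q @ k^T / sqrt(d))
mask = torch.tril(torch.ones(T, T))  # the lower-triangular matrix from the figure
scores = scores.masked_fill(mask == 0, float("-inf"))
weights = torch.softmax(scores, dim=-1)
# each row sums to 1, and row i puts zero weight on positions j > i
```

Since exp(-inf) is exactly 0, the softmax output is strictly zero above the diagonal, which is what makes decoder-only (GPT-style) training autoregressive.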
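The post vs. pre LayerNorm distinction can be sketched as follows: post-LN (the original Transformer) normalizes after the residual addition, while pre-LN (used by GPT-2 and many later models) normalizes the sublayer input and leaves the residual path untouched. `sublayer` here is a hypothetical stand-in for either self-attention or the feed-forward block.

```python
import torch
import torch.nn as nn

d = 8
ln = nn.LayerNorm(d)
sublayer = nn.Linear(d, d)  # placeholder for attention / FFN
x = torch.randn(2, d)

# post-LN: normalize after the residual addition
post = ln(x + sublayer(x))

# pre-LN: normalize the input to the sublayer; the residual stays unnormalized
pre = x + sublayer(ln(x))
```

In the pre-LN form the residual stream is never normalized, which tends to make deep Transformers easier to train without learning-rate warmup.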
