Weight sharing compatibility #13

sidnb13 · 2023-06-20T14:25:35Z

In the original Transformer, the input embedding and the output (pre-softmax) projection layer share the same weight matrix, which reduces the number of parameters. Is there a reason this isn't implemented here, and how could it be added?
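Here is a minimal sketch of weight tying. It assumes a vocabulary of size V and a model width d_model. A single V x d_model matrix E serves two roles: on the input side it is the embedding table (row lookup), and, transposed, it is the output projection (logits = h E^T). The class `TiedEmbedding` and its methods are hypothetical and not part of this repository. Plain Python lists stand in for tensors so the example runs without dependencies.

```python
from typing import List

Matrix = List[List[float]]


class TiedEmbedding:
    """Hypothetical sketch: one (vocab_size x d_model) matrix shared by the
    input embedding lookup and the output projection."""

    def __init__(self, weight: Matrix) -> None:
        # Shared parameter E, shape: vocab_size x d_model
        self.weight = weight

    def embed(self, token_ids: List[int]) -> Matrix:
        # Input side: look up row E[t] for each token id (copied, so callers
        # cannot mutate the shared weights through the returned vectors)
        return [list(self.weight[t]) for t in token_ids]

    def project(self, hidden: Matrix) -> Matrix:
        # Output side: logits = hidden @ E^T, i.e. the dot product of each
        # hidden vector with every row of E, giving one score per token
        return [
            [sum(h_i * e_i for h_i, e_i in zip(h, row)) for row in self.weight]
            for h in hidden
        ]


if __name__ == "__main__":
    tied = TiedEmbedding([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # V=3, d_model=2
    x = tied.embed([2, 0])
    print(x)                # [[1.0, 1.0], [1.0, 0.0]]
    print(tied.project(x))  # [[1.0, 1.0, 2.0], [1.0, 0.0, 1.0]]
```

Both methods read the same `self.weight` object, so any update to E shows up in both the embeddings and the logits. This saves V x d_model parameters compared with keeping two separate matrices. In PyTorch, the common way to get the same effect is to build the output head as `nn.Linear(d_model, vocab_size, bias=False)` and assign the embedding's parameter to it (for example `lm_head.weight = embedding.weight`). Both parameters have shape `(vocab_size, d_model)`, so no explicit transpose is needed. Whether this drops cleanly into this project depends on its model code, which is not shown here.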
