Why LayerNorm layers are frozen? #1

Hi @hosein-m,
I read your code and the paper (https://arxiv.org/pdf/1902.00751.pdf). According to the paper, the LayerNorm layers should be trainable. Am I missing something?
https://github.com/hosein-m/TF-Adapter-BERT/blob/8ddad140dc8c61b5db4db50d47fc258b0e9868cb/run_tf_glue_adapter_bert.py#L110
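For context, the paper's recipe is to update only the adapter parameters, the LayerNorm parameters, and the task head. Below is a minimal TF 2.x sketch of that recipe; the variable-name patterns and the custom training step are assumptions for illustration, not the repository's actual code.

```python
# Hypothetical sketch (TF 2.x): select only the adapter, LayerNorm, and
# classification-head variables for training, as the paper prescribes.
# `model` is assumed to be a tf.keras.Model; the name patterns are assumptions
# about how its variables are named.
import re

def select_tuned_variables(model,
                           patterns=("adapter", "LayerNorm", "classifier")):
    """Return the model's trainable variables whose names match any pattern."""
    regex = re.compile("|".join(map(re.escape, patterns)))
    return [v for v in model.trainable_variables if regex.search(v.name)]

# Inside a custom training step, gradients would then be computed and applied
# only for this subset:
#   tuned_vars = select_tuned_variables(model)
#   grads = tape.gradient(loss, tuned_vars)
#   optimizer.apply_gradients(zip(grads, tuned_vars))
```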
Comments
Hi @iliaschalkidis, thank you for the correction. I agree with you. As I remember, keeping the LayerNorm layers' parameters frozen yielded a better GLUE score in my experiments.
Great, thanks! I would also recommend you to let the […]. I was also wondering whether the adapters' projection layers are initialized to near-zero here, as the paper describes: "With the skip-connection, if the parameters of the projection layers are initialized to near-zero, the module is initialized to an approximate identity function." In the official implementation the […]. Thanks for your implementation!
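As an illustration of that quoted sentence, here is a rough Keras sketch of a bottleneck adapter whose projection weights start near zero; the bottleneck size, the GELU nonlinearity, and the 1e-3 stddev are assumptions rather than details taken from either implementation.

```python
# Hypothetical sketch of a Houlsby-style bottleneck adapter in Keras.
# The bottleneck size, nonlinearity, and 1e-3 stddev are assumptions.
import tensorflow as tf

class BottleneckAdapter(tf.keras.layers.Layer):
    def __init__(self, hidden_size, bottleneck_size=64, init_stddev=1e-3, **kwargs):
        super().__init__(**kwargs)
        init = tf.keras.initializers.TruncatedNormal(stddev=init_stddev)
        # Down-projection with a nonlinearity, then up-projection back to the
        # hidden size; both start with near-zero weights.
        self.down_project = tf.keras.layers.Dense(
            bottleneck_size, activation=tf.nn.gelu,
            kernel_initializer=init, bias_initializer="zeros")
        self.up_project = tf.keras.layers.Dense(
            hidden_size, kernel_initializer=init, bias_initializer="zeros")

    def call(self, hidden_states):
        x = self.up_project(self.down_project(hidden_states))
        # Skip-connection: with near-zero projections, the module starts out
        # as an approximate identity function.
        return hidden_states + x
```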
Thanks for the detailed comments, @iliaschalkidis. It is worth mentioning that after calling the […].
Sorry @hosein-m, one last question: does this mean that you use the very same Adapter weights in both adapter positions of each Transformer layer?
Sorry @iliaschalkidis for closing the issue! According to Houlsby’s architecture, there must be two Adapter modules in each Transformer layer: one in the attention block (after the multi-head attention projection) and one after the feed-forward block. […] Please let me know if I'm missing something :)
Yeah, I saw this line in the article and suspected it was your motivation. They should have phrased it better and more clearly, like "The two adapter layers are tied." or something similar. I cannot validate this in the original implementation. Thanks again!
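To make the two-adapter arrangement described above concrete, here is a rough sketch of one Transformer layer with two separate (untied) adapters; all the callables and argument names are assumed for illustration.

```python
# Hypothetical sketch of one Transformer layer with two untied adapters,
# following Houlsby et al., Fig. 2. Every callable passed in is assumed.
def transformer_layer_with_adapters(x, attention, attn_adapter, attn_layernorm,
                                    feed_forward, ffn_adapter, ffn_layernorm):
    # First adapter: after the multi-head attention output projection,
    # followed by the residual connection and (trainable) LayerNorm.
    attn_out = attn_adapter(attention(x))
    x = attn_layernorm(x + attn_out)
    # Second, separate adapter: after the feed-forward block.
    ffn_out = ffn_adapter(feed_forward(x))
    return ffn_layernorm(x + ffn_out)
```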