
[Question] Would it be possible to adopt TransformerLens on models with a different layernorm implementation? #773

Open
Steven-Yiran opened this issue Nov 8, 2024 · 2 comments

@Steven-Yiran

Question

I am looking to use TransformerLens with a custom model that is not currently supported by the library. The custom model has the same GPT-2-like architecture except for its LayerNorm implementation: specifically, each layer applies a LayerNorm (with weight and bias) to the MLP output. I looked into the Othello GPT example but am still not sure how to avoid the architecture mismatch.

Would it still be possible to run analysis on the custom model with TransformerLens? Thanks!
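For context, here is a minimal sketch of how a GPT-2-style model is normally described in TransformerLens. The hyperparameters below are illustrative placeholders, not the custom model's real values; the point is only that the config selects which normalization is used, while the placement of the norms is fixed by the block implementation.

```python
from transformer_lens import HookedTransformer, HookedTransformerConfig

# Placeholder GPT-2-small-like hyperparameters; substitute the custom model's values.
cfg = HookedTransformerConfig(
    n_layers=12,
    d_model=768,
    n_ctx=1024,
    d_head=64,
    n_heads=12,
    d_mlp=3072,
    d_vocab=50257,
    act_fn="gelu",
    normalization_type="LN",  # selects which norm is used, not where it sits
)
model = HookedTransformer(cfg)

# TransformerLens's TransformerBlock applies ln1 before attention and ln2 before
# the MLP (GPT-2-style pre-norm), so a model whose only per-layer norm comes
# after the MLP output does not map onto the config alone.
print(model.blocks[0])
```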

@bryce13950
Collaborator

Today this is not possible without modifying the code itself. Making it possible is tentatively planned for what will become 4.0. For the time being, I can set up a small hook for you to override the layer norm, but it would live on an experimental branch, and we would probably have to work relatively closely together to make sure it works for you. The model you are trying to test is most similar to GPT-2, right?
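For readers who land here: the snippet below is only a rough illustration of the kind of substitution such an override might enable, using TransformerLens's existing hook points. It is not the experimental branch described above; the GPT-2 checkpoint and the freshly initialized LayerNorm are stand-ins, and a real port would copy the custom model's parameters.

```python
import torch.nn as nn
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")

for block in model.blocks:
    # Disable the standard pre-MLP norm for this block.
    block.ln2 = nn.Identity()

    # Stand-in post-MLP LayerNorm; a real port would load the custom model's
    # weight and bias here instead of fresh parameters.
    ln = nn.LayerNorm(model.cfg.d_model)

    def normalize_mlp_out(tensor, hook, ln=ln):
        # Returning a value from a hook replaces the activation, so the
        # residual stream receives the normalized MLP output.
        return ln(tensor)

    block.hook_mlp_out.add_hook(normalize_mlp_out)
```

Caveats: `from_pretrained` folds LayerNorm weights into adjacent layers by default, and cached activations would still follow the original hook layout, so any analysis built on this would need careful validation.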

@bryce13950 added the question (Further information is requested) and complexity-high (Very complicated changes for people to address who are quite familiar with the code) labels on Nov 12, 2024
@Steven-Yiran
Author

Thanks for your response! Specifically, I am trying to run experiments on BioGPT. In terms of architecture, the only per-layer LayerNorm occurs after the MLP modules (final_layer_norm in the screenshot below). The attention and MLP modules are implemented the same way as in GPT-2.
[Screenshot: BioGPT layer modules, showing final_layer_norm after the MLP]
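For reference, the layer structure can also be inspected directly from the Hugging Face checkpoint; this assumes the "microsoft/biogpt" model ID, and the attribute names follow the transformers BioGPT implementation and may differ across versions.

```python
from transformers import AutoModelForCausalLM

biogpt = AutoModelForCausalLM.from_pretrained("microsoft/biogpt")

# Print the first decoder layer to see its sub-modules (self-attention, fc1/fc2,
# and the layer norms). Note that printing a module shows declaration order, not
# call order; the layer's forward pass determines where final_layer_norm is
# applied relative to the attention and MLP sub-blocks.
print(biogpt.biogpt.layers[0])
```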

I would really love to work with you on this if you think it fits on the general roadmap!
