
[Question] Would it be possible to adopt TransformerLens on models with a different layernorm implementation? #773

Open
Steven-Yiran opened this issue Nov 8, 2024 · 2 comments

@Steven-Yiran

Question

I am looking to use TransformerLens with a custom model that is not currently supported by the library. The custom model has the same GPT-2-like architecture except for its LayerNorm implementation: specifically, each layer applies a LayerNorm (with weight and bias) to the MLP output. I looked into the Othello GPT example but am still not sure how to avoid the architecture mismatch.

Would it still be possible to run analysis on the custom model with TransformerLens? Thanks!
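For context, here is a minimal sketch of how a GPT-2-style model is normally described in TransformerLens. The hyperparameters below are illustrative placeholders, not the custom model's real values; the point is only that the config selects which normalization is used, while the placement of the norms is fixed by the block implementation.

```python
from transformer_lens import HookedTransformer, HookedTransformerConfig

# Placeholder GPT-2-small-like hyperparameters; substitute the custom model's values.
cfg = HookedTransformerConfig(
    n_layers=12,
    d_model=768,
    n_ctx=1024,
    d_head=64,
    n_heads=12,
    d_mlp=3072,
    d_vocab=50257,
    act_fn="gelu",
    normalization_type="LN",  # selects which norm is used, not where it sits
)
model = HookedTransformer(cfg)

# TransformerLens's TransformerBlock applies ln1 before attention and ln2 before
# the MLP (GPT-2-style pre-norm), so a model whose only per-layer norm comes
# after the MLP output does not map onto the config alone.
print(model.blocks[0])
```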

@bryce13950
Collaborator

Today this is not possible without modifying the code itself. Making it possible is tentatively planned for what will become 4.0. For the time being, I can set up a small hook for you to override the layer norm, but it would live on an experimental branch, and we would probably have to work relatively closely together to make sure it works for you. The model you are trying to test is most similar to GPT-2, right?
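For readers who land here: the snippet below is only a rough illustration of the kind of substitution such an override might enable, using TransformerLens's existing hook points. It is not the experimental branch described above; the GPT-2 checkpoint and the freshly initialized LayerNorm are stand-ins, and a real port would copy the custom model's parameters.

```python
import torch.nn as nn
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")

for block in model.blocks:
    # Disable the standard pre-MLP norm for this block.
    block.ln2 = nn.Identity()

    # Stand-in post-MLP LayerNorm; a real port would load the custom model's
    # weight and bias here instead of fresh parameters.
    ln = nn.LayerNorm(model.cfg.d_model)

    def normalize_mlp_out(tensor, hook, ln=ln):
        # Returning a value from a hook replaces the activation, so the
        # residual stream receives the normalized MLP output.
        return ln(tensor)

    block.hook_mlp_out.add_hook(normalize_mlp_out)
```

Caveats: `from_pretrained` folds LayerNorm weights into adjacent layers by default, and cached activations would still follow the original hook layout, so any analysis built on this would need careful validation.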

@bryce13950 added the question (Further information is requested) and complexity-high (Very complicated changes for people to address who are quite familiar with the code) labels on Nov 12, 2024
@Steven-Yiran
Author

Thanks for your response! Specifically, I am trying to run experiments on BioGPT. In terms of architecture, the only per-layer LayerNorm occurs after the MLP modules (final_layer_norm in the screenshot below). The attention and MLP modules are implemented the same way as in GPT-2.
[Screenshot: BioGPT layer modules, showing final_layer_norm after the MLP]
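For reference, the layer structure can also be inspected directly from the Hugging Face checkpoint; this assumes the "microsoft/biogpt" model ID, and the attribute names follow the transformers BioGPT implementation and may differ across versions.

```python
from transformers import AutoModelForCausalLM

biogpt = AutoModelForCausalLM.from_pretrained("microsoft/biogpt")

# Print the first decoder layer to see its sub-modules (self-attention, fc1/fc2,
# and the layer norms). Note that printing a module shows declaration order, not
# call order; the layer's forward pass determines where final_layer_norm is
# applied relative to the attention and MLP sub-blocks.
print(biogpt.biogpt.layers[0])
```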

I would really love to work with you on this if you think it fits on the general roadmap!
