LoRA tuning embedding layer uses nn.Parameter instead of nn.Linear #2040
-
I saw that when implementing the LoRA module for Linear layers, the code here uses:

```python
self.lora_dropout.update(nn.ModuleDict({adapter_name: lora_dropout_layer}))
# Actual trainable parameters
self.lora_A[adapter_name] = nn.Linear(self.in_features, r, bias=False)
self.lora_B[adapter_name] = nn.Linear(r, self.out_features, bias=False)
```

However, when implementing the LoRA module for Embedding layers, the code here uses:

```python
# Actual trainable parameters
weight_A = torch.randn((r, self.in_features))
weight_B = torch.randn((self.out_features, r))
self.lora_embedding_A[adapter_name] = nn.Parameter(weight_A)
self.lora_embedding_B[adapter_name] = nn.Parameter(weight_B)
```

I'd love to know: is there any reason we can't use `nn.Linear` for the Embedding layers as well?
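For reference, here is a small standalone comparison (not taken from the PEFT source; the variable names and sizes are made up for illustration) showing that both styles hold a tensor of the same shape, since `nn.Linear` stores its weight as an `nn.Parameter` internally:

```python
import torch
import torch.nn as nn

# Illustrative comparison: nn.Linear(in_features, r) keeps a weight of shape
# (r, in_features) as an nn.Parameter, which matches the shape of the bare
# parameter used in the Embedding path. The difference is module bookkeeping
# (registered submodule with a forward) vs. a plain tensor parameter.
r, in_features = 8, 128

linear_A = nn.Linear(in_features, r, bias=False)       # weight shape: (r, in_features)
param_A = nn.Parameter(torch.randn(r, in_features))    # same shape, held directly

print(linear_A.weight.shape)  # torch.Size([8, 128])
print(param_A.shape)          # torch.Size([8, 128])
```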
-
Very good question. Your observation is correct, and I'm pretty sure the only reason is that this is how the reference implementation by Microsoft did it:

https://github.com/microsoft/LoRA/blob/4c0333854cb905966f8cc4e9a74068c1e507c7b7/loralib/layers.py#L32

(I wasn't involved with PEFT back then, so I cannot know for sure.)

Personally, I think it would be better if the embedding implementation were the same as the linear implementation. I also don't like that we have both `self.lora_A/B` and `self.lora_embedding_A/B`, as only ever one of the two is used, meaning we could get rid of `self.lora_embedding_A/B`.

Making these changes now would require very careful consideration to avoid breaking existing code, but I think it should be possible to do in a backwards compatible way (though not forward compatible). In my mind, this just hasn't been very high priority. Is there a specific problem you encountered due to this implementation detail, or did you just ask out of curiosity?
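To make the suggestion concrete, here is a minimal, self-contained sketch (not the PEFT implementation; the class name, init choice, and scaling handling are assumptions for illustration) of an embedding adapter that stores its low-rank factors as modules, mirroring the Linear path:

```python
import torch
import torch.nn as nn


class LoRAEmbeddingSketch(nn.Module):
    """Hypothetical adapter: low-rank factors stored as modules rather than
    bare nn.Parameter tensors, mirroring how the Linear adapter is built."""

    def __init__(self, base: nn.Embedding, r: int, scaling: float = 1.0):
        super().__init__()
        self.base = base
        # A: token id -> low-rank space, B: low-rank space -> embedding_dim.
        self.lora_A = nn.Embedding(base.num_embeddings, r)
        self.lora_B = nn.Linear(r, base.embedding_dim, bias=False)
        self.scaling = scaling
        # Illustrative init: zero one factor so the adapter starts as a no-op.
        nn.init.zeros_(self.lora_B.weight)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        return self.base(input_ids) + self.lora_B(self.lora_A(input_ids)) * self.scaling


# Quick smoke test of the sketch
emb = nn.Embedding(1000, 64)
layer = LoRAEmbeddingSketch(emb, r=8)
out = layer(torch.randint(0, 1000, (2, 5)))
print(out.shape)  # torch.Size([2, 5, 64])
```

With a layout like this, the embedding and linear adapters would expose their factors the same way (e.g. via `lora_A.weight`), which is the kind of uniformity the reply above argues for.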