LoRA tuning embedding layer uses nn.Parameter instead of nn.Linear #2040
-
I saw that when implementing the LoRA module for Linear layers, the code here uses:

```python
self.lora_dropout.update(nn.ModuleDict({adapter_name: lora_dropout_layer}))
# Actual trainable parameters
self.lora_A[adapter_name] = nn.Linear(self.in_features, r, bias=False)
self.lora_B[adapter_name] = nn.Linear(r, self.out_features, bias=False)
```

However, when implementing the LoRA module for Embedding layers, the code here uses:

```python
# Actual trainable parameters
weight_A = torch.randn((r, self.in_features))
weight_B = torch.randn((self.out_features, r))
self.lora_embedding_A[adapter_name] = nn.Parameter(weight_A)
self.lora_embedding_B[adapter_name] = nn.Parameter(weight_B)
```

I'd love to know: is there any reason we can't use `nn.Linear` for the Embedding layers as well?
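For reference, here is a small standalone comparison (not taken from the PEFT source; the variable names and sizes are made up for illustration) showing that both styles hold a tensor of the same shape, since `nn.Linear` stores its weight as an `nn.Parameter` internally:

```python
import torch
import torch.nn as nn

# Illustrative comparison: nn.Linear(in_features, r) keeps a weight of shape
# (r, in_features) as an nn.Parameter, which matches the shape of the bare
# parameter used in the Embedding path. The difference is module bookkeeping
# (registered submodule with a forward) vs. a plain tensor parameter.
r, in_features = 8, 128

linear_A = nn.Linear(in_features, r, bias=False)       # weight shape: (r, in_features)
param_A = nn.Parameter(torch.randn(r, in_features))    # same shape, held directly

print(linear_A.weight.shape)  # torch.Size([8, 128])
print(param_A.shape)          # torch.Size([8, 128])
```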
-
Very good question. Your observation is correct, and I'm pretty sure the only reason is that this is how the reference implementation by Microsoft did it:

https://github.com/microsoft/LoRA/blob/4c0333854cb905966f8cc4e9a74068c1e507c7b7/loralib/layers.py#L32

(I wasn't involved with PEFT back then, so I cannot know for sure.)

Personally, I think it would be better if the embedding implementation were the same as the linear implementation. I also don't like that we have both `self.lora_A/B` and `self.lora_embedding_A/B`, as only ever one of the two is used, meaning we could get rid of `self.lora_embedding_A/B`.

Making these changes now would require very careful consideration to avoid breaking existing code, but I think it should be possible to do in a backwards compatible way (though not forward compatible). In my mind, this just hasn't been very high priority. Is there a specific problem you encountered due to this implementation detail, or did you just ask out of curiosity?
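To make the suggestion concrete, here is a minimal, self-contained sketch (not the PEFT implementation; the class name, init choice, and scaling handling are assumptions for illustration) of an embedding adapter that stores its low-rank factors as modules, mirroring the Linear path:

```python
import torch
import torch.nn as nn


class LoRAEmbeddingSketch(nn.Module):
    """Hypothetical adapter: low-rank factors stored as modules rather than
    bare nn.Parameter tensors, mirroring how the Linear adapter is built."""

    def __init__(self, base: nn.Embedding, r: int, scaling: float = 1.0):
        super().__init__()
        self.base = base
        # A: token id -> low-rank space, B: low-rank space -> embedding_dim.
        self.lora_A = nn.Embedding(base.num_embeddings, r)
        self.lora_B = nn.Linear(r, base.embedding_dim, bias=False)
        self.scaling = scaling
        # Illustrative init: zero one factor so the adapter starts as a no-op.
        nn.init.zeros_(self.lora_B.weight)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        return self.base(input_ids) + self.lora_B(self.lora_A(input_ids)) * self.scaling


# Quick smoke test of the sketch
emb = nn.Embedding(1000, 64)
layer = LoRAEmbeddingSketch(emb, r=8)
out = layer(torch.randint(0, 1000, (2, 5)))
print(out.shape)  # torch.Size([2, 5, 64])
```

With a layout like this, the embedding and linear adapters would expose their factors the same way (e.g. via `lora_A.weight`), which is the kind of uniformity the reply above argues for.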