Merge multiple LoRA adapters per request (linear, TIES, DARE) #212
Conversation
Great addition
Hi @tgaddair, I am Prateek, the author of TIES-Merging. It's great that people are finding TIES and its follow-ups like DARE useful, and thanks for integrating them into the library. I see that you have already added the merging methods to the docs; it would also be great to mention them in the GitHub README so that people know this feature exists and can read more about it. Let me know what you think and if you need help with this. Thanks for your amazing library and work!
Hey @prateeky2806, thanks for your work on TIES-Merging and for your suggestion here! I agree, mentioning this in the README is a good idea. It's something I was hoping to do as a follow-up, as I wasn't sure what the best place to put it would be, but I should be able to put something together in the next day or two.
Added #218 to track README updates. |
```python
def merge(self, task_tensors: List[torch.Tensor], weights: torch.Tensor) -> torch.Tensor:
    # sparsify
    task_tensors = [prune(tensor, self.density, method="magnitude") for tensor in task_tensors]
    weighted_task_tensors = _apply_weights(task_tensors, weights)
```
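For context on the ordering being discussed: TIES first trims (sparsifies) each task tensor, then applies the per-adapter weights, then elects a per-parameter sign by majority, and finally averages only the entries that agree with the elected sign. A minimal self-contained sketch of that pipeline follows; the `prune` and `_apply_weights` helpers mirror the names in the snippet above, but the bodies here are illustrative, not the library's implementation:

```python
import torch
from typing import List

def prune(tensor: torch.Tensor, density: float, method: str = "magnitude") -> torch.Tensor:
    # TIES "trim" step: keep the top-`density` fraction of entries by magnitude.
    k = int(density * tensor.numel())
    mask = torch.zeros(tensor.numel(), dtype=torch.bool, device=tensor.device)
    mask[tensor.abs().flatten().topk(k).indices] = True
    return tensor * mask.view_as(tensor)

def _apply_weights(task_tensors: List[torch.Tensor], weights: torch.Tensor) -> torch.Tensor:
    # Stack the task tensors and scale each by its per-adapter weight.
    stacked = torch.stack(task_tensors, dim=0)
    return stacked * weights.view(-1, *([1] * (stacked.dim() - 1)))

def ties_merge(task_tensors: List[torch.Tensor], weights: torch.Tensor, density: float) -> torch.Tensor:
    # 1. Trim each task tensor by magnitude *before* weighting.
    task_tensors = [prune(t, density, method="magnitude") for t in task_tensors]
    # 2. Scale by per-adapter weights.
    weighted = _apply_weights(task_tensors, weights)
    # 3. Elect a per-parameter sign from the total (weighted) mass.
    sign = torch.sign(weighted.sum(dim=0))
    sign[sign == 0] = 1.0
    # 4. Disjoint merge: average only the entries that agree with the elected sign.
    agree = torch.sign(weighted) == sign
    return (weighted * agree).sum(dim=0) / agree.sum(dim=0).clamp(min=1)
```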
Thanks @prateeky2806! Let me put together a quick PR.
FYI you would also need to change the order of the steps in dare_ties.
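To illustrate the ordering point for dare_ties: DARE replaces the magnitude-based trim with a random drop, rescaling the surviving entries by 1/density so the expected tensor is unchanged, and that drop likewise has to happen before the weights are applied. A hedged sketch reusing the helpers from the snippet above (again illustrative, not the library's code):

```python
def dare_prune(tensor: torch.Tensor, density: float) -> torch.Tensor:
    # DARE "drop" step: keep each entry independently with probability `density`,
    # then rescale survivors by 1/density to preserve the expected value.
    mask = torch.bernoulli(torch.full_like(tensor, density))
    return tensor * mask / density

def dare_ties_merge(task_tensors: List[torch.Tensor], weights: torch.Tensor, density: float) -> torch.Tensor:
    # Corrected ordering: random drop first, then weighting, then TIES sign election.
    task_tensors = [dare_prune(t, density) for t in task_tensors]
    weighted = _apply_weights(task_tensors, weights)
    sign = torch.sign(weighted.sum(dim=0))
    sign[sign == 0] = 1.0
    agree = torch.sign(weighted) == sign
    return (weighted * agree).sum(dim=0) / agree.sum(dim=0).clamp(min=1)
```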
@prateeky2806 can you take a look at #239 and verify the fix looks correct? Thanks again for pointing this out!
Hi @tgaddair, these changes look good; there is just one last thing. It would be ideal to set the default weights for TIES to all ones, since that is the setting we experimented with in our paper, and it works reasonably well if people do not want to try out different values.
I have also left a similar comment on the HF PEFT pull request, which I expect will be addressed in the coming days.
Thanks @prateeky2806, I believe setting default weights to 1s is handled here: https://github.com/predibase/lorax/blob/main/server/lorax_server/utils/merges/strategies.py#L107
But let me know if I am misunderstanding something. Happy to change if needed.
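For readers following along, the linked default reduces to something like the following (a paraphrase of the behavior, not the linked code verbatim):

```python
# If the request omits per-adapter weights, default each to 1.0,
# the setting evaluated in the TIES paper.
if weights is None:
    weights = torch.ones(len(task_tensors))
```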
Thanks @tgaddair, this looks correct; I missed it when I was going over the code. Everything looks good to me now.
Thanks for verifying @prateeky2806!
Closes #159.
Takes large parts of the implementation from huggingface/peft#1364.
Notably, this approach merges adapters statically at load time, rather than during the forward pass. The benefit is that repeated requests using the same merge params are much faster; the tradeoff is that it can be slightly slower for users who want to change the merge params on every request.
Usage:
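The usage block is not shown here; below is a hedged sketch of what a per-request merge could look like through the Python client, based on the merge parameters discussed in this thread (the `MergedAdapters` type and exact field names are assumptions and may differ from the final API):

```python
from lorax import Client, MergedAdapters  # assumed client API; names may differ

client = Client("http://127.0.0.1:8080")

# Merge two adapters for this request using TIES with density 0.2.
merged = MergedAdapters(
    ids=["path/to/adapter-a", "path/to/adapter-b"],
    weights=[1.0, 1.0],  # the default-to-ones setting discussed above
    merge_strategy="ties",
    density=0.2,
)

response = client.generate("What is deep learning?", merged_adapters=merged)
print(response.generated_text)
```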