diff --git a/README.md b/README.md
index 1e4008f..4481a52 100644
--- a/README.md
+++ b/README.md
@@ -10,36 +10,50 @@
-### Features
+## Features
 
-**Distinguishing the "loralib" and "loratorch" Approaches for Implementation**
-
-The implementations of "loralib" and "loratorch" exhibit distinct methodologies, particularly when using the example of `nn.Linear`. The underlying mathematical representations are as follows:
-
-1. **LoRa** Approaches
-
-   The computation is defined as:
-
-   $h = x W_0^\top + \frac{\alpha}{r} x(BA)^\top,$
-
-   $where:
-   - `x` is an input matrix of dimensions \(k \times n\),
-   - `W_0` is a pre-trained weight matrix of dimensions \(m \times n\),
-   - `r` is a predefined LoRA rank,
-   - `B` and `A` are LoRA matrices of dimensions \(m \times r\) and \(r \times n\) respectively,
-   - `\alpha` is a hyper-parameter.$
-
-1. For ``loralib``,
-   $h = x W_0^\top + \frac{\alpha}{r} x(BA)^\top,$
-
-where $x\in\mathbb{R}^{k\times n}$ is the input matrix, $W_0\in\mathbb{R}^{m\times n}$ is the pre-trained weight matrix, $r$ is the predefined LoRA rank, $B\in\mathbb{R}^{m\times r}$ and $A\in \mathbb{R}^{r\times n}$ are the LoRA matrixes, and $\alpha$ is a hyper-parameter.
-
-2. For ``loratorch``,
-   $h = x (W_0 + \frac{\alpha}{r} BA)^\top.$
-
-``loralib`` computes $xW_0^\top$ and $x(BA)^\top$ respectively and then merges the results. While ``loratorch`` merges pre-trained weight $W_0$ and its LoRA weight $BA$ and then computes the results by simply using ``nn.Linear.forward()``. There is no difference between ``loralib`` and ``loratorch`` in the linear layers. But in some no-linear or complex layers, we are no sure whether this layer satisfies $L(x, W_0)+L(x, BA) = L(x, W_0+BA)$. Hence, it is difficult to extend LoRA to some complex layers by using ``loralib``. On the contrary, the idea of merging weights first in ``loratorch`` is more general and extensible. You just call ``merge_lora_param()`` in ``loratorch`` to merge weights and then call ``forward()`` in the original layer to compute the results.
+- **LoRALib Approach**: Computes `xW_0^T` and `x(BA)^T` separately and then sums the two results. This keeps each term explicit and is particularly suitable for linear layers.
+
+- **LoRATorch Approach**: Merges the pre-trained weight `W_0` with its LoRA weight `BA` into the combined matrix `W_0 + (alpha/r) BA` before calling the layer's original `forward()`. Because the original forward pass is reused unchanged, this approach extends naturally to more complex and non-linear layers in the PyTorch ecosystem.
+
 With the help of ``loratorch``, you can easily implement LoRA to any type of layer of ``torch.nn``.
+
+## Mathematical Formulation
+
+1. **LoRALib Approach**:
+
+   The computation is defined as
+
+   $h = x W_0^\top + \frac{\alpha}{r} x(BA)^\top,$
+
+   where:
+   - $x \in \mathbb{R}^{k \times n}$ is the input matrix,
+   - $W_0 \in \mathbb{R}^{m \times n}$ is the pre-trained weight matrix,
+   - $r$ is the predefined LoRA rank,
+   - $B \in \mathbb{R}^{m \times r}$ and $A \in \mathbb{R}^{r \times n}$ are the LoRA matrices,
+   - $\alpha$ is a hyper-parameter.
+
+2. **LoRATorch Approach**:
+
+   The computation is defined as
+
+   $h = x (W_0 + \frac{\alpha}{r} BA)^\top,$
+
+   with the same notation as above. For linear layers the two formulations give identical results, but for non-linear or composite layers the identity $L(x, W_0) + L(x, BA) = L(x, W_0 + BA)$ does not hold in general, which is why the merge-first LoRATorch formulation is the one that extends to arbitrary ``torch.nn`` layers.
+
+## Usage
+
+1. **AdapterLoRa Class**: The `AdapterLoRa` class provides a single interface for applying LoRA adaptation to a neural network. It supports both the `loralib` and `loratorch` approaches and handles reconstructing the model with LoRA-adapted layers.
+
+2. **Adapting Layers**: The `add_layer_and_Instance_Layer` method registers the layers to adapt through its `layertyep` and `layer` parameters, so the LoRA replacement targets only the layers you name.
+
+3. **Freezing Weights**: The `freeze_weights` method freezes the pre-trained weights so that training updates only the LoRA parameters.
+
+4. **Reconstructing and Implementing LoRA**: The `reconstruct_model` method swaps the registered layers for LoRA-adapted ones, and the `implement_lora` method marks only the LoRA parameters as trainable and returns the adapted model. A minimal usage sketch follows this list.
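+The snippet below is an illustrative sketch of this workflow rather than a tested recipe: the base model, the adapted module name (`linear1`), and the import path are placeholders that you should adjust to your own model and installation.
+
+```python
+import torch.nn as nn
+from core.Quantized import AdapterLoRa  # adjust to your package layout
+
+model = nn.TransformerDecoderLayer(d_model=512, nhead=8)  # placeholder base model
+
+adapter = AdapterLoRa(model, method="LoRa", Rank=4)
+
+# register the layer type and the submodule name to adapt
+adapter.add_layer_and_Instance_Layer("nn.Linear", "linear1")
+
+# optionally freeze the pre-trained weights (expects an encoder/decoder-style model)
+# adapter.freeze_weights(False)
+
+adapter.reconstruct_model(verbose=True)            # swap the registered layers for LoRA layers
+lora_model = adapter.implement_lora(verbose=True)  # mark only the LoRA parameters as trainable
+```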
 ## Supported Layers
diff --git a/core/LayersAdaptes.py b/core/LayersAdaptes.py
index deb9b7e..76a16d9 100644
--- a/core/LayersAdaptes.py
+++ b/core/LayersAdaptes.py
@@ -1,12 +1,23 @@
-import loralib as LoRa
-import loratorch as LoRaT
 import torch.nn as nn
+import bitsandbytes as bnb
+import loralib as LoRa
+import loratorch as LoRaT
 from typing import Optional
-import bitsandbytes as nn
-
+
 def Layer(model, new_layer):
+    """
+    Copy weights and biases from the original layer to the new layer.
+
+    Args:
+        model (nn.Module): The original layer.
+        new_layer (nn.Module): The new layer.
+
+    Returns:
+        nn.Module: The new layer with copied weights and biases.
+    """
     new_layer.weight = nn.Parameter(model.weight.detach().clone())
 
     if model.bias is not None:
@@ -14,77 +25,111 @@ def Layer(model, new_layer):
     return new_layer
 
-def LoRaLinear(method:str, model:nn.Module, Rank:Optional[int],threshold:Optional[int]):
-    Adapters = ["LoRa","SandBytes","LoRaTorch"]
-    if Adapters.__contains__(Adapters) == True:
+def LoRaLinear(method: str, model: nn.Module, Rank: Optional[int], threshold: Optional[int]):
+    """
+    Replace a linear layer with a LoRA-adapted or quantized layer using the specified method.
+
+    Args:
+        method (str): The adaptation method ("LoRa", "SandBytes", "LoRaTorch").
+        model (nn.Module): The linear layer to replace.
+        Rank (Optional[int]): The rank parameter for LoRA adaptation.
+        threshold (Optional[int]): The threshold parameter for SandBytes adaptation.
+
+    Returns:
+        nn.Module: The new layer with the original weights and biases copied in.
+    """
+    Adapters = ["LoRa", "SandBytes", "LoRaTorch"]
+
+    if method in Adapters:
         if method == "LoRa":
             new_layer = LoRa.Linear(
-                 in_features=model.in_features,
-                 out_features=model.out_features,
-                 bias=model.bias is not None,
-                 r=Rank
+                in_features=model.in_features,
+                out_features=model.out_features,
+                bias=model.bias is not None,
+                r=Rank
             )
-            return Layer(model . new_layer)
+            return Layer(model, new_layer)
 
         if method == "SandBytes":
             new_layer = bnb.nn.Linear8bitLt(
                 model.in_features,
-                model.out_featuresm2,
-                bias=model.bias is not None,
-                has_fp16_weights=False,
-                threshold=6.0
-            )
-            return Layer(model . new_layer)
-
+                model.out_features,
+                bias=model.bias is not None,
+                has_fp16_weights=False,
+                threshold=threshold
+            )
+            return Layer(model, new_layer)
 
-        if method == "LoRaTorch":
+        if method == "LoRaTorch":
             new_layer = LoRaT.Linear(
-                in_features=model.in_features,
-                out_features=model.out_features,
-                bias=model.bias is not None,
-                r=Rank
-            )
-            return Layer(model . new_layer)
+                in_features=model.in_features,
+                out_features=model.out_features,
+                bias=model.bias is not None,
+                r=Rank
+            )
+            return Layer(model, new_layer)
 
     else:
-        raise ValueError(f"there's no method support yet or may you inster invalide name method {method}")
+        raise ValueError(f"Unsupported method or invalid method name: {method}")
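+
+# Illustrative sketch (assumption, not part of the module's API): ``LoRaLinear``
+# can also be called directly to swap a single ``nn.Linear`` for its LoRA
+# counterpart. The layer sizes below are arbitrary placeholders:
+#
+#     base = nn.Linear(768, 768)
+#     lora_layer = LoRaLinear("LoRa", base, Rank=4, threshold=None)
+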
+ """ + Adapters = ["LoRa", "SandBytes", "LoRaTorch"] + + if method in Adapters: if method == "LoRa": - new_layer = LoRa.Embedding(model.num_embeddings, - model.embedding_dim, - r=Rank, - lora_alpha=lora_alpha, - max_norm=model.max_norm is not None, - scale_grad_by_freq=model.scale_grad_by_freq is not None, - padding_idx=model.padding_idx is not None - ) + new_layer = LoRa.Embedding( + model.num_embeddings, + model.embedding_dim, + r=Rank, + lora_alpha=lora_alpha, + max_norm=model.max_norm is not None, + scale_grad_by_freq=model.scale_grad_by_freq is not None, + padding_idx=model.padding_idx is not None + ) return new_layer if method == "SandBytes": - new_layer= bnb.nn.StableEmbedding(model.num_embeddings, - model.embedding_dim ) + new_layer = bnb.nn.StableEmbedding( + model.num_embeddings, + model.embedding_dim + ) return new_layer if method == "LoRaTorch": - new_layer = LoRaT.Embedding(model.num_embeddings, - model.embedding_dim, - r=Rank, - max_norm=model.max_norm is not None, - scale_grad_by_freq=model.scale_grad_by_freq is not None, - padding_idx=model.padding_idx is not None - ) + new_layer = LoRaT.Embedding( + model.num_embeddings, + model.embedding_dim, + r=Rank, + max_norm=model.max_norm is not None, + scale_grad_by_freq=model.scale_grad_by_freq is not None, + padding_idx=model.padding_idx is not None + ) return new_layer - else: - raise ValueError(f"there's no method support yet or may you inster invalide name method {method}") + else: + raise ValueError(f"Unsupported method or invalid method name: {method}") diff --git a/core/Quantized.py b/core/Quantized.py index dbcfd25..40dc72d 100644 --- a/core/Quantized.py +++ b/core/Quantized.py @@ -1,62 +1,40 @@ import torch.nn as nn from .LayersAdaptes import * -from .Adapter import Adapters from .utils import make_lora_replace +import loralib as lora +import loratorch as loraT class CastOutputToFloat(nn.Module): def forward(self, x): return x.to(torch.float32) class AdapterLoRa(nn.Module): - def __init__(self, model: nn.Module,LoRa=None,BitSand=None, method: str, Rank: int): - """ - AdapterLoRa constructor. - - Args: - model (nn.Module): The input model to which LoRA adaptation will be applied. - method (str): The method to use for LoRA adaptation ("LoRa" or "LoRaTorch"). - Rank (int): The rank parameter for LoRA adaptation. - """ + def __init__(self, model: nn.Module, method: str, Rank: int, *args, **kwargs): super(AdapterLoRa, self).__init__() - - self.Adapters = ["LoRa","SandBytes","LoRaTorch"] + self.Adapters = ["LoRa", "SandBytes", "LoRaTorch"] self.Rank = Rank - self.LORA = LoRa - self.BITSAND = BitSand self.model = model self.Instance_Layer = [] self.layertyep = [] + self.QMODEL = None + self.LORA = None + self.BITSAND = None if method in self.Adapters: - self.method = self.Adapters[method] + self.method = method else: raise ValueError("Invalid method provided") - def add_layer_and_Instance_Layer(self,layertyep:str ,layer: str): - """ - Add a layer to the list of layers to be adapted. + self.extra_args = args + self.extra_kwargs = kwargs - Args: - layer (str): The name of the layer to add. - layerTyep(str): The layer nn.Linear or nn.Embedding to Adjust - Returns: - list: The updated list of layers. - """ + def add_layer_and_Instance_Layer(self, layertyep: str, layer: str): self.Instance_Layer.append(layer) self.layertyep.append(layertyep) - return self.layertyep , self.Instance_Layer + return self.layertyep, self.Instance_Layer def freeze_weights(self, weight_freeze=False): - """ - Freeze model weights. 
- - Args: - weight_freeze (bool): Flag to freeze model weights. - - Returns: - None - """ for param in self.model.parameters(): param.requires_grad = weight_freeze if param.ndim == 1: @@ -64,40 +42,39 @@ def freeze_weights(self, weight_freeze=False): self.model.gradient_checkpointing_enable() self.model.encoder, self.model.decoder = CastOutputToFloat(), CastOutputToFloat() - - def reconstruct_model(self,verbose=False): - """ - Reconstruct the model using LoRA-adapted layers. - - Returns: - str: A message indicating the success of the reconstruction or an error message. - """ + + def reconstruct_model(self, verbose=False): if not isinstance(self.model, nn.Module): return "Please make sure the model is based on Torch nn.Module" - if self.LORA is not None: - make_lora_replace(self.model, self.lora_layer, self.Rank, self.layer) - return "Model successfully reconstructed with LoRA-adapted layers" - if self.BITSAND is not None: - make_lora_replace(self.model, self.lora_layer, self.Rank, self.layer) - return "Model successfully reconstructed with LoRA-adapted layers" - - - def implement_lora(self,verbose=False): - """ - Implement LoRA adaptation on the model. - - Returns: - nn.Module: The model with LoRA adaptation applied. - """ + self.QMODEL = make_lora_replace( + model=self.model, + method=self.method, + LayerType=self.layertyep, + quantize_fn=LoRaLinear if self.method == "LoRa" else None, + quantize_fn_=LoRaEmbedding if self.method == "LoRa" else None, + Rank=self.Rank, + layers=self.Instance_Layer, + *self.extra_args, + **self.extra_kwargs + ) + return "Model successfully reconstructed with LoRA-adapted layers" + + def implement_lora(self, verbose=False): total_trainable_params_before = sum(p.numel() for p in self.model.parameters() if p.requires_grad) - if verbose == True: + if verbose: print(f"Total trainable parameters before LoRA: {total_trainable_params_before}") - self.LoRa.mark_only_lora_as_trainable(self.model) + if self.method == "LoRa": + self.LORA.mark_only_lora_as_trainable(self.QMODEL) + elif self.method == "LoRaTorch": + loraT.mark_only_lora_as_trainable(self.QMODEL) + elif self.method == "SandBytes": + return self.QMODEL - total_trainable_params_after = sum(p.numel() for p in self.model.parameters() if p.requires_grad) - if verbose == True: - print(f"Total trainable parameters after LoRA: {total_trainable_params_after}") + total_trainable_params_after = sum(p.numel() for p in self.QMODEL.parameters() if p.requires_grad) + + if verbose: + print(f"Total trainable parameters after AdapterLoRA: {total_trainable_params_after}") - return self.model + return self.QMODEL diff --git a/core/utils.py b/core/utils.py index 8e2fa19..82cfdd7 100644 --- a/core/utils.py +++ b/core/utils.py @@ -1,45 +1,108 @@ import torch.nn as nn +from typing import Optional, Callable -def make_lora_replace(model, quantized_fn, Rank, layers, depth=1, path="", verbose=True): +def quantize_layer(method,layer, quantize_fn, quantize_fn_, Rank): """ - Replace specified linear layers in the model with quantized layers using LoRA. + Apply the appropriate quantization function to the given layer. Args: - model (nn.Module): The input model to be modified. - quantized_fn (Callable): The function to quantize a linear layer. + layer (nn.Module): The layer to be quantized. + quantize_fn (Callable): The function to quantize a linear layer. + quantize_fn_ (Callable): The function to quantize an embedding layer. Rank (int): The rank parameter for LoRA adaptation. - layers (list): List of layer names to be adapted. 
diff --git a/core/utils.py b/core/utils.py
index 8e2fa19..82cfdd7 100644
--- a/core/utils.py
+++ b/core/utils.py
@@ -1,45 +1,108 @@
 import torch.nn as nn
+from typing import Optional, Callable
 
-def make_lora_replace(model, quantized_fn, Rank, layers, depth=1, path="", verbose=True):
+def quantize_layer(method, layer, quantize_fn, quantize_fn_, Rank,
+                   threshold=6.0, lora_alpha=1, scale_grad_by_freq=None,
+                   padding_idx=None, max_norm=None):
     """
-    Replace specified linear layers in the model with quantized layers using LoRA.
+    Apply the appropriate quantization function to the given layer.
 
     Args:
-        model (nn.Module): The input model to be modified.
-        quantized_fn (Callable): The function to quantize a linear layer.
+        method (str): The adaptation method ("LoRa", "SandBytes", "LoRaTorch").
+        layer (nn.Module): The layer to be quantized.
+        quantize_fn (Callable): The function to quantize a linear layer.
+        quantize_fn_ (Callable): The function to quantize an embedding layer.
         Rank (int): The rank parameter for LoRA adaptation.
-        layers (list): List of layer names to be adapted.
-        depth (int): Current depth in recursion (default is 1).
-        path (str): Current path in model hierarchy (default is empty string).
-        verbose (bool): Flag to print verbose messages (default is True).
+        threshold (float, optional): Outlier threshold forwarded to the SandBytes linear layer.
+        lora_alpha (int, optional): Alpha parameter forwarded to the LoRA embedding layer.
+        scale_grad_by_freq, padding_idx, max_norm: Options forwarded to the embedding adapter.
+
+    Returns:
+        nn.Module: The quantized layer, or the original layer if no function applies.
+    """
+    if isinstance(layer, nn.Linear) and quantize_fn is not None:
+        return quantize_fn(
+            method,
+            layer,
+            Rank,
+            threshold
+        )
+    elif isinstance(layer, nn.Embedding) and quantize_fn_ is not None:
+        return quantize_fn_(
+            method,
+            layer,
+            Rank,
+            lora_alpha,
+            scale_grad_by_freq,
+            padding_idx,
+            max_norm
+        )
+    else:
+        return layer
+
+def make_lora_replace(
+    model, method: str, LayerType, quantize_fn=None, quantize_fn_=None, Rank=0, layers=None,
+    depth=1, path="", verbose=True
+):
+    """
+    Replace specified linear and embedding layers in the model with quantized layers using LoRA.
+
+    Args:
+        model (nn.Module): The input model to be modified.
+        method (str): The adaptation method ("LoRa", "SandBytes", "LoRaTorch").
+        LayerType (list): Layer types to quantize: "nn.Linear" for linear layers, "nn.Embedding" for embedding layers.
+        quantize_fn (Callable, optional): The function to quantize a linear layer.
+        quantize_fn_ (Callable, optional): The function to quantize an embedding layer.
+        Rank (int, optional): The rank parameter for LoRA adaptation.
+        layers (list, optional): List of layer names to be adapted.
+        depth (int, optional): Current depth in recursion (default is 1).
+        path (str, optional): Current path in model hierarchy (default is empty string).
+        verbose (bool, optional): Flag to print verbose messages (default is True).
 
     Returns:
         nn.Module: The modified model with specified layers quantized using LoRA.
     """
     if depth > 10:
         return model
 
-    if isinstance(model, nn.Linear) and any(item in path for item in layers):
+    if "nn.Linear" in LayerType and isinstance(model, nn.Linear) and any(item in path for item in layers):
         if verbose:
             print(f"Found linear layer to quantize: {path}", type(model))
-        return quantized_fn(model, Rank)
+        if quantize_fn is not None:
+            return quantize_layer(method, model, quantize_fn, quantize_fn_, Rank)
+
+    if "nn.Embedding" in LayerType and isinstance(model, nn.Embedding) and any(item in path for item in layers):
+        if verbose:
+            print(f"Found embedding layer to quantize: {path}", type(model))
+        if quantize_fn_ is not None:
+            return quantize_layer(method, model, quantize_fn, quantize_fn_, Rank)
 
     for key, module in model.named_children():
-        if isinstance(module, nn.Linear) and any(item in path for item in layers):
-            layer = quantized_fn(module, Rank)
-            setattr(model, key, layer)
+        if isinstance(module, (nn.Linear, nn.Embedding)) and any(item in path for item in layers):
+            quantized_layer = quantize_layer(method, module, quantize_fn, quantize_fn_, Rank)
+            setattr(model, key, quantized_layer)
             if verbose:
-                print(f"Found linear layer to quantize: {path}:{key}", type(module))
+                print(f"Found linear or embedding layer to quantize: {path}:{key}", type(module))
         elif isinstance(module, (nn.ModuleList, nn.ModuleDict)):
             for i, elem in enumerate(module):
                 layer = make_lora_replace(
-                    elem, quantized_fn, Rank, layers, depth + 1, f"{path}:{key}[{i}]", verbose=verbose
+                    elem, method, LayerType, quantize_fn, quantize_fn_, Rank, layers,
+                    depth + 1, f"{path}:{key}[{i}]", verbose=verbose
                 )
                 if layer is not None:
                     module[i] = layer
         else:
             layer = make_lora_replace(
-                module, quantized_fn, Rank, layers, depth + 1, f"{path}:{key}", verbose=verbose
+                module, method, LayerType, quantize_fn, quantize_fn_, Rank, layers,
+                depth + 1, f"{path}:{key}", verbose=verbose
             )
             if layer is not None:
                 setattr(model, key, layer)