In the complex landscape of multi-task learning, AdaMerging has emerged as a potent method for adaptively merging model parameters to optimize performance across tasks. Unlike traditional fixed-coefficient methods, AdaMerging autonomously learns the merging coefficients, offering a more refined and responsive approach1.
+The cornerstone of AdaMerging lies in its adaptive nature, where it learns the coefficients for merging either on a task-wise or layer-wise basis. This adaptability is driven by an entropy minimization strategy applied to unlabeled test samples as a surrogate objective function, which serves to refine the merging coefficients for optimal performance.
Task-wise AdaMerging is formulated as:

\[ \theta = \theta_0 + \sum_{i=1}^{n} \lambda_i \tau_i \]
+where \(\lambda_i\) represents the merging coefficient for the \(i\)-th task, and \(\tau_i\) denotes the task vector for the \(i\)-th task.
On the other hand, Layer-wise AdaMerging is articulated as:

\[ \theta^l = \theta^l_0 + \sum_{i=1}^{n} \lambda^l_i \tau^l_i \]
+where the merging coefficient \(\lambda^{l}_{i}\) and task vector \(\tau^{l}_{i}\) are specific to each layer \(l\) of the model.
+By leveraging this adaptive learning approach, AdaMerging significantly enhances the model's ability to generalize across tasks and layers, resulting in a more robust and finely-tuned performance profile. The method’s reliance on entropy minimization ensures that the merging process continually seeks the most informative and stable configuration, adapting to the specific needs of the dataset and tasks at hand.
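To make this concrete, here is a minimal sketch of task-wise merging with an entropy surrogate loss (illustrative only, not the fusion_bench implementation; all names here are hypothetical):

import torch

def merge_weights(pretrained_sd, task_vectors, lambdas):
    # theta = theta_0 + sum_i lambda_i * tau_i, applied parameter by parameter
    return {
        name: w0 + sum(lam * tv[name] for lam, tv in zip(lambdas, task_vectors))
        for name, w0 in pretrained_sd.items()
    }

def entropy_loss(logits):
    # Shannon entropy of the predictions on unlabeled test samples; minimizing
    # this with respect to the merging coefficients is the surrogate objective
    probs = torch.softmax(logits, dim=-1)
    return -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1).mean()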
+(ICLR 2024) AdaMerging: Adaptive Model Merging for Multi-Task Learning. https://openreview.net/pdf?id=nZP6NgD3QY ↩
+The DepthUpscalingAlgorithm
is used to upscale the depth of PyTorch models. Here's a basic guide on how to use it:
First, import the necessary modules:
+from omegaconf import DictConfig
+from torch import nn
+from fusion_bench.method import DepthUpscalingAlgorithm
+from fusion_bench.modelpool import to_modelpool
+
Create an instance of DepthUpscalingAlgorithm
by passing a configuration dictionary.
+This dictionary should contain the name of the method ("depth_upscaling") and a list of layer indices that determine the upscaling pattern.
method_config = {"name": "depth_upscaling", "layer_indices": [0, 1, 1, 0]}
+algorithm = DepthUpscalingAlgorithm(DictConfig(method_config))
+
Assume we have a list of PyTorch models (nn.ModuleList
instances) that we want to upscale. Here, we're creating a list of linear models as an example:
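For instance, a minimal sketch (layer sizes are arbitrary):

model = nn.ModuleList([nn.Linear(64, 64) for _ in range(2)])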
Then, we can pass the model to the run
method of our algorithm:
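For example, with the model list defined above:

upscaled_model = algorithm.run(model)
print(len(upscaled_model))  # 4, following the [0, 1, 1, 0] pattern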
The run
method will return an upscaled model. The type of the returned model will be the same as the input models (in this case, nn.ModuleList
), and its length will be determined by the layer indices specified in the method configuration.
Here we provide an example of how to use the DepthUpscalingAlgorithm
to upscale the depth of a Mistral model 1.
from omegaconf import DictConfig
+from torch import nn
+from transformers import AutoModelForCausalLM, MistralConfig, MistralForCausalLM
+from fusion_bench.method import DepthUpscalingAlgorithm
+
+# create a Mistral model
+# here we randomly initialize the model for demonstration purposes
+# in practice, you would load a pretrained model
+model_config = MistralConfig(
+ # https://huggingface.co/mistralai/Mistral-7B-v0.1/resolve/main/config.json
+ **{
+ "architectures": ["MistralForCausalLM"],
+ "bos_token_id": 1,
+ "eos_token_id": 2,
+ "hidden_act": "silu",
+ "hidden_size": 4096,
+ "initializer_range": 0.02,
+ "intermediate_size": 14336,
+ "max_position_embeddings": 32768,
+ "model_type": "mistral",
+ "num_attention_heads": 32,
+ "num_hidden_layers": 32,
+ "num_key_value_heads": 8,
+ "rms_norm_eps": 1e-05,
+ "rope_theta": 10000.0,
+ "sliding_window": 4096,
+ "tie_word_embeddings": False,
+ "torch_dtype": "bfloat16",
+ "transformers_version": "4.34.0.dev0",
+ "use_cache": True,
+ "vocab_size": 32000,
+ }
+)
+print('creating model')
+model: MistralForCausalLM = AutoModelForCausalLM.from_config(model_config)
+
+method_config = {
+ "name": "depth_upscaling",
+ "layer_indices": ["range(0,24)", "range(8,32)"],
+}
+algorithm = DepthUpscalingAlgorithm(DictConfig(method_config))
+print('upscaling model')
+upscaled_model = algorithm.run(model.model.layers)
+
+# substitute the model with the upscaled model
+model.model.layers = upscaled_model
+
The DepthUpscalingAlgorithm
is integrated into the fusion_bench
package. You can use it by specifying "depth_upscaling"
as the method name in the command line or configuration file.
name: depth_upscaling
# this should be a list of integers or strings indicating the sequence of layers.
# If an entry is an integer, the n-th layer of the model is used; if it is a string,
# it must be a valid Python expression that evaluates to a list of integers.
+# for example, ["range(0,12)", "range(6,12)"] will use the first 12 layers and the last 6 layers of the model to construct the new model
+# [0, 2, 4, "range(6,12)"] will use the 1st, 3rd, 5th, and the 7th to 12th layers of the model to construct the new model
+layer_indices: null
+
You can then run the fusion_bench
command with the specified configuration file:
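For example (the modelpool and taskpool entries are placeholders for your own configurations):

fusion_bench \
    method=depth_upscaling \
    modelpool=... \
    taskpool=...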
DepthUpscalingAlgorithm

Bases: ModelFusionAlgorithm

Source code in fusion_bench/method/depth_upscaling.py
run(modelpool)

Executes the depth upscaling algorithm on a given model pool. This method checks the type of the model pool, ensures that it contains only one model, and verifies that the model is an instance of nn.ModuleList.

Parameters:

- modelpool (ModuleList | ModelPool) – The pool of models to upscale. Must contain only one model.

Returns:

- nn.ModuleList – The upscaled model.

Raises:

- AssertionError – If the model pool contains more than one model, or if the model is not an instance of nn.ModuleList.
- ValueError – If an invalid layer specification is provided in the configuration.

Source code in fusion_bench/method/depth_upscaling.py
The Dummy Algorithm is a simple algorithm that does not perform any fusion operation. Instead, it returns a pretrained model if one is available in the model pool. If no pretrained model is available, it returns the first model in the model pool. +This algorithm is useful for testing and debugging purposes, as it allows you to quickly check if the model pool is set up correctly and the fusion process is working as expected.
+To use the Dummy Algorithm, you need to specify "dummy"
as the algorithm name.
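For example, via the command line (pool configurations are placeholders):

fusion_bench method=dummy modelpool=... taskpool=...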
The implementation of the Dummy Algorithm is straightforward. Here is the main method of the DummyAlgorithm
class:
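A sketch of the logic (the exact helper names in the source may differ):

def run(self, modelpool):
    modelpool = to_modelpool(modelpool)
    # return the pretrained model if the pool has one,
    # otherwise fall back to the first model in the pool
    if "_pretrained_" in modelpool.model_names:
        return modelpool.load_model("_pretrained_")
    return modelpool.load_model(modelpool.model_names[0])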
DummyAlgorithm

Bases: ModelFusionAlgorithm

Source code in fusion_bench/method/dummy.py
The Fusion Algorithm
module is a core component of the FusionBench project, dedicated to the implementation and execution of various model fusion techniques.
+This module provides the mechanisms necessary to combine multiple models from the Model Pool, enabling nuanced and optimized model merging operations.
Fusion Algorithm Module

The module is typically invoked through a configuration-driven approach in CLI scripts, enabling users to specify fusion algorithms and parameters via YAML configuration files. This approach ensures reproducibility and ease of use. For more information, see the documentation of the fusion_bench CLI.
+ModelFusionAlgorithm
is the base class for all fusion algorithms in the Fusion Algorithm module.
+It provides a common interface for different fusion techniques, allowing for seamless integration and execution of various algorithms.
ModelFusionAlgorithm

Bases: ABC

Source code in fusion_bench/method/base_algorithm.py
run(modelpool) (abstractmethod)

Fuse the models in the given model pool.

Examples:

>>> algorithm = SimpleAverageAlgorithm()
>>> modelpool = ModelPool()
>>> merged_model = algorithm.run(modelpool)

Parameters:

- modelpool (ModelPool) – The pool of models to fuse.

Source code in fusion_bench/method/base_algorithm.py
from omegaconf import DictConfig
+
+from ..method import load_algorithm_from_config
+from ..modelpool import load_modelpool_from_config
+from ..taskpool import load_taskpool_from_config  # assumed import path for the taskpool loader
+
+def run_model_fusion(cfg: DictConfig):
+ modelpool = load_modelpool_from_config(cfg.modelpool)
+ algorithm = load_algorithm_from_config(cfg.method)
+ merged_model = algorithm.run(modelpool)
+
+ if hasattr(cfg, "taskpool") and cfg.taskpool is not None:
+ taskpool = load_taskpool_from_config(cfg.taskpool)
+ taskpool.evaluate(merged_model)
+ else:
+ print("No task pool specified. Skipping evaluation.")
+
In summary, the Fusion Algorithm module is vital for the model merging operations within FusionBench, leveraging sophisticated techniques to ensure optimal fusion and performance evaluation of deep learning models. This capability makes it an indispensable tool for researchers and practitioners focusing on model fusion strategies.
The max-model predictor algorithm is a type of ensemble method. Formally, a max-model predictor is defined as follows:
Definition (Max-Model Predictor)1 Given a set of predictors \(H = \{h_1, h_2, \ldots, h_n\}\), with \(h_i: \mathcal{X} \times \mathcal{Y}_i \mapsto \mathbb{R}\), the max-model predictor \(h_H\) is defined as:

\[ h_H(x, y) = \max_{i : y \in \mathcal{Y}_i} h_i(x, y) \]
Take the flu detection problem as an example1. Doctors want to build a learning model to detect which type of virus a patient is infected with, based on her symptoms, for appropriate treatment. However, the types of influenza vary geographically (Rejmanek et al., 2015), which means the distribution of patient records collected by a hospital in California may differ from that in Florida. In an extreme case, some types are unknown to the other hospital. Assume there are 4 types of influenza in the United States. In California, 2 of the 4 are commonly detected, while in Florida 3 of the 4 types are often detected. We assume that in the two states, doctors separately trained two models \(h_{CA}\) and \(h_{FL}\) which work well locally in California and Florida respectively. However, a direct ensemble of the two local models may not work well on all the patients. Let \(h_{US}\) denote the ideal global model trained on the combination of local datasets. When we input a patient record \(x\), each model outputs its prediction as shown in the following table:
+Table: Example of flu detection on a patient \(x\) affected with type 2 flu. “−” means this model is not able to predict the corresponding class. Taking the maximal score as prediction, \(h_{FL}\) is consistent with \(h_{US}\), but the combination of two local models \(h_{CA,FL}\) is not since \(3/4 > 4/7\).
| Type | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| \(h_{US}(x)\) | 2/10 | 4/10 | 1/10 | 3/10 |
| \(h_{CA}(x)\) | − | − | 1/4 | 3/4 |
| \(h_{FL}(x)\) | 2/7 | 4/7 | 1/7 | − |
| \(h_{\{CA,FL\}}(x)\) | 2/7 | 4/7 | 1/4 | 3/4 |
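A quick check of the table in plain Python (scores transcribed from above):

h_ca = {3: 1/4, 4: 3/4}          # h_CA can only score types 3 and 4
h_fl = {1: 2/7, 2: 4/7, 3: 1/7}  # h_FL can only score types 1, 2, and 3

# max-model predictor: maximize over the models able to predict each class
h_combined = {
    y: max(h[y] for h in (h_ca, h_fl) if y in h)
    for y in sorted(set(h_ca) | set(h_fl))
}
print(max(h_combined, key=h_combined.get))  # 4, disagreeing with h_US (type 2), since 3/4 > 4/7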
Here is an example of how to use the Max-Model Predictor Algorithm:
from torch import nn

from fusion_bench.method import MaxModelPredictorAlgorithm
from fusion_bench.modelpool import ModelPool

# Instantiate the MaxModelPredictorAlgorithm
algorithm = MaxModelPredictorAlgorithm()

# Assume we have a ModelPool instance that contains the models we want to ensemble.
modelpool = ModelPool(...)  # or a list of nn.Module

# Run the algorithm on the model pool.
max_model_predictor: nn.Module = algorithm.run(modelpool)
+
Configuration template for the Max Predictor Algorithm:
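A minimal sketch of what this template contains (the method name here is an assumption):

name: max_model_predictor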
To create a max predictor ensemble of models for a specific task, you can use the following command:
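For example (the method name and pool configurations are placeholders):

fusion_bench \
    method=max_model_predictor \
    modelpool=... \
    taskpool=...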
ModelRecombinationAlgorithm
is a class used to recombine models in a model pool. Here's how to use it:
First, import the necessary modules:
+from fusion_bench.method import ModelRecombinationAlgorithm
+from fusion_bench.modelpool import ModelPool, to_modelpool
+from torch import nn
+
Create an instance of ModelRecombinationAlgorithm
:
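A sketch (depending on the version, the constructor may also accept a DictConfig, as with the other algorithms):

algorithm = ModelRecombinationAlgorithm()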
Create a model pool using the to_modelpool
function. This function takes a list of models or a dict of models and converts it into a ModelPool
:
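For example, with a few toy models (sizes are arbitrary):

models = [nn.Linear(10, 10) for _ in range(3)]
modelpool = to_modelpool(models)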
Use the run
method of the ModelRecombinationAlgorithm
instance to recombine the models in the model pool:
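For example:

new_modelpool = algorithm.run(modelpool, return_modelpool=True)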
The run
method takes two arguments:
- modelpool: The model pool to recombine.
- return_modelpool (optional): A boolean indicating whether to return the entire model pool or just the first model. Defaults to True.

If return_modelpool is True, the run method returns a new ModelPool with the recombined models. If False, it returns the first model from the new model pool.
You can check the type of the returned value to ensure that the run
method worked correctly:
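For example:

print(isinstance(new_modelpool, ModelPool))  # True when return_modelpool=True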
Configuration template for the model recombination algorithm:
+name: model_recombination
+# if `return_modelpool` is not null, the argument `return_modelpool` passed to the `run` method will be ignored.
+return_modelpool: null
+
Construct a model recombination using our CLI tool fusion_bench
:
fusion_bench \
+ method=model_recombination \
+ method.return_modelpool=false \
+ modelpool=... \
+ taskpool=...
+
ModelRecombinationAlgorithm

Bases: ModelFusionAlgorithm

Model recombination recombines the layers of the given models to create a new set of models.

Source code in fusion_bench/method/model_recombination.py
run(modelpool, return_modelpool=True)

Executes the model recombination algorithm on a given model pool. This method loads models from the model pool, determines their type, and applies the appropriate recombination method. It then creates a new model pool with the recombined models. Depending on the return_modelpool flag, it either returns the entire new model pool or just the first model from it.

- If the models are of type nn.ModuleList, the recombination method recombine_modellist is used, where each module in the list is shuffled across the models.
- If the models are of type nn.ModuleDict, the recombination method recombine_modeldict is used, where each module in the dictionary is shuffled across the models.
- If the models are of type nn.Module, the recombination method recombine_state_dict is used, where the state dictionaries of the models are shuffled across the models.

Parameters:

- modelpool (ModelPool) – The pool of models to recombine.
- return_modelpool (bool, default: True) – Flag indicating whether to return the entire model pool or just the first model. Defaults to True. If this algorithm is initialized with a config, the value of return_modelpool in the config will be used and this argument will be ignored.

Returns:

- Union[nn.Module, ModelPool] – The recombined model pool, or the first model from the recombined pool, depending on the return_modelpool flag.

Raises:

- ValueError – If the models in the model pool are of an unsupported type.

Source code in fusion_bench/method/model_recombination.py
recombine_modellist(models)

Source code in fusion_bench/method/model_recombination.py

recombine_modeldict(models)

Source code in fusion_bench/method/model_recombination.py

recombine_state_dict(models)

Source code in fusion_bench/method/model_recombination.py
Here we provide instructions on how to use the fusion_bench
command-line interface to merge models using a Mixture of Experts (MoE) approach.
The first code block is a YAML configuration file for the merging method. The name field specifies the name of the merging method. The experts_per_token field specifies the number of experts activated per token; the number of experts in the merged model is determined by the expert models listed in the model pool. The save_checkpoint field specifies the path where the merged model will be saved, if provided.
name: mixtral_for_causal_lm_moe_merging
+
+experts_per_token: 2
+# path to save the merged model, if provided
+save_checkpoint: null
+
The second code block is another YAML configuration file, this time for the model pool. The type
field specifies the type of model pool to use. The models
field is a list of models to include in the pool. Each model should have a name
and a path
, and the model is loaded from the path.
type: AutoModelForCausalLMPool
+# each model should have a name and a path, and the model is loaded from the path
+# this is equivalent to `AutoModelForCausalLM.from_pretrained(path)`
+models:
+ - name: _pretrained_
+ path: path_to_your_pretrained_model
+ - name: expert_1
+ path: path_to_your_expert_model_1
+ - name: expert_2
+ path: path_to_your_expert_model_2
+ - name: expert_3
+ path: path_to_your_expert_model_3
+ - name: expert_4
+ path: path_to_your_expert_model_4
+
Finally, the third code block is a bash command that runs the fusion_bench
command-line interface with the specified method, model pool, and task pool. The method
argument specifies the merging method to use. The modelpool
argument specifies the model pool to use. The modelpool.models.0.path
argument specifies the path to the pretrained model to use. The taskpool
argument specifies the task pool to use. In this case, a dummy task pool is used that does nothing but print the parameter counts of the merged model.
fusion_bench \
+ method=mixtral_moe_merging \
+ modelpool=mixtral_moe_merging \
+ taskpool=dummy # this is a dummy taskpool that does nothing but print the parameter counts of the merged model
+
This guide provides a step-by-step process for merging models using the fusion_bench
command-line interface. By following these instructions, you can merge your own models and save them for future use.
mixtral_merging

MixtralForCausalLMMergingAlgorithm

Bases: MixtralForCausalLMUpscalingAlgorithm

Source code in fusion_bench/method/mixture_of_experts/mixtral_merging.py

run(modelpool)

Runs the merging process. It first upscales the models to MixtralForCausalLM, then substitutes the experts of the MixtralForCausalLM with the models from the modelpool.

Parameters:

- modelpool (ModelPool) – The pool of models to be merged. Each model in the pool will be treated as an expert, and should be a MistralForCausalLM or LlamaForCausalLM.

Returns:

- MixtralForCausalLM – The merged model.

Source code in fusion_bench/method/mixture_of_experts/mixtral_merging.py
MixtralMoEMergingAlgorithm

Bases: MixtralUpscalingAlgorithm

This class is responsible for merging models into a MixtralModel.

Source code in fusion_bench/method/mixture_of_experts/mixtral_merging.py

run(modelpool)

Runs the merging process.

Parameters:

- modelpool (ModelPool) – The pool of models to be merged. Each model in the pool will be treated as an expert, and should be a MistralModel or LlamaModel.

Returns:

- MixtralModel – The merged model.

Source code in fusion_bench/method/mixture_of_experts/mixtral_merging.py
Sparse upcycling is a technique used to initialize a sparsely activated Mixture-of-Experts (MoE) model from a dense checkpoint. This approach leverages previously incurred training costs to improve the performance of large models while reducing the computational expense. In the process, dense Transformer blocks are partially replaced with MoE blocks, where the MLPs in a Transformer block are replaced by multiple experts. The experts are chosen based on routing probabilities determined by a router. The initialized MoE model is then further trained to recover the performance. This method results in improved performance for both language and vision models while using only a fraction of the original dense pretraining cost 1.
+Here’s an example demonstrating how to upscale a pre-trained Mistral model to a Mixtral model:
+import os
+
+from omegaconf import DictConfig
+from transformers import MistralForCausalLM
+
+from fusion_bench.method.mixture_of_experts.mixtral_upcycling import (
+ MixtralForCausalLMUpscalingAlgorithm,
+)
+from fusion_bench.utils import print_parameters
+
+# Load a pre-trained Mistral model
+pretrained_model = MistralForCausalLM.from_pretrained(
+ os.path.expanduser("path_to_mistral_model")
+)
+print("Pretrained model:")
+print_parameters(pretrained_model)
+# Output:
+# Pretrained model:
+# trainable params: 7.24B || all params: 7.24B || trainable%: 100.0000
+
+# Define the configuration for Mixtral
+config = {
+ "num_experts": 4, # Number of expert channels
+ "experts_per_token": 2, # Experts to choose per token
+}
+
+# Initialize the upscaling algorithm
+upscaling_for_causal_lm_algorithm = MixtralForCausalLMUpscalingAlgorithm(
+ DictConfig(config)
+)
+
+# Run the upscaling process to get a Mixtral model
+mixtral_for_causal_lm_model = upscaling_for_causal_lm_algorithm.run(pretrained_model)
+
+print("Mixtral model:")
+print_parameters(mixtral_for_causal_lm_model)
+# Outputs:
+# Mixtral model:
+# trainable params: 24.15B || all params: 24.15B || trainable%: 100.0000
+
+# Save the upscaled Mixtral model
+mixtral_for_causal_lm_model.save_pretrained("path_to_save_mixtral_model")
+
This is a guide on how to use the fusion_bench
command-line interface to upscale a Mistral model to a Mixtral model.
The first code block is a YAML configuration file for the upscaling method. The name field specifies the name of the upscaling method. The num_experts
field specifies the number of experts to use in the upscaling process. The experts_per_token
field specifies the number of experts to use per token. The save_checkpoint
field specifies the path where the upscaled model will be saved, if provided.
name: mixtral_for_causal_lm_moe_upscaling # or "mixtral_moe_upscaling"
+
+num_experts: 4
+experts_per_token: 2
+# path to save the upscaled model
+save_checkpoint: null
+
The second code block is another YAML configuration file, this time for the model pool. The type
field specifies the type of model pool to use. The models
field is a list of models to include in the pool. Each model should have a name
and a path
, and the model is loaded from the path
.
type: AutoModelForCausalLMPool
+# each model should have a name and a path, and the model is loaded from the path
+# this is equivalent to `AutoModelForCausalLM.from_pretrained(path)`
+models:
+ - name: _pretrained_
+ path: path_to_your_pretrained_model
+
Finally, the third code block is a bash command that runs the fusion_bench command-line interface with the specified method, model pool, and task pool. The method argument specifies the upscaling method to use. The modelpool argument specifies the model pool to use. The modelpool.models.0.path argument specifies the path to the pretrained model to use. The taskpool argument specifies the task pool to use. In this case, a dummy task pool is used that does nothing but print the parameter counts of the merged model.
+fusion_bench \
+ method=mixtral_moe_upscaling \
+ modelpool=mixtral_moe_upscaling \
+ modelpool.models.0.path=path_to_your_pretrained_model \
+ taskpool=dummy # this is a dummy taskpool that does nothing but print the parameter counts of the merged model
+
mixtral_upcycling

MixtralForCausalLMUpscalingAlgorithm

Bases: ModelFusionAlgorithm

This class is responsible for upscaling a model to a MixtralForCausalLM. It inherits from the ModelFusionAlgorithm class.

Source code in fusion_bench/method/mixture_of_experts/mixtral_upcycling.py

run(modelpool)

Runs the upscaling process.

Parameters:

- modelpool (ModelPool | LlamaForCausalLM | MistralForCausalLM) – The model to be upscaled.

Returns:

- MixtralForCausalLM – The upscaled model.

Source code in fusion_bench/method/mixture_of_experts/mixtral_upcycling.py
MixtralUpscalingAlgorithm

Bases: ModelFusionAlgorithm

This class is responsible for upscaling a model to a MixtralModel. It inherits from the ModelFusionAlgorithm class.

Source code in fusion_bench/method/mixture_of_experts/mixtral_upcycling.py

run(modelpool)

Runs the upscaling process.

Parameters:

- modelpool (ModelPool | LlamaModel | MistralModel) – The model to be upscaled.

Returns:

- MixtralModel – The upscaled model.

Source code in fusion_bench/method/mixture_of_experts/mixtral_upcycling.py
upscale_to_mixtral_for_causal_lm(input_model, output_model)

A helper function that upscales a LlamaForCausalLM or MistralForCausalLM to a MixtralForCausalLM.

Parameters:

- input_model (LlamaForCausalLM | MistralForCausalLM) – The input model to be upscaled.
- output_model (MixtralForCausalLM) – The output model where the upscaled weights will be loaded.

Returns: None

Source code in fusion_bench/method/mixture_of_experts/mixtral_upcycling.py
upscale_to_mixtral_model(input_model, output_model)

A helper function that upscales a LlamaModel or MistralModel to a MixtralModel.

Parameters:

- input_model (LlamaModel | MistralModel) – The input model to be upscaled.
- output_model (MixtralModel) – The output model where the upscaled weights will be loaded.

Returns: None

Source code in fusion_bench/method/mixture_of_experts/mixtral_upcycling.py
Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints. http://arxiv.org/abs/2212.05055 ↩
Simple averaging, known in the literature as ModelSoups, aims to yield a more robust and generalizable model.
In the context of full fine-tuned models, the weights are averaged directly. Concretely, this means that if we have \(n\) models with their respective weights \(\theta_i\), the weights of the final model \(\theta\) are computed as:

\[ \theta = \frac{1}{n} \sum_{i=1}^{n} \theta_i \]
+This equation simply states that each weight of the final model is the average of the corresponding weights in the individual models. For example, if we have three models and the weight of the first neuron in the first layer is 0.1, 0.2, and 0.3 in each model respectively, the weight of that neuron in the final model will be (0.1 + 0.2 + 0.3) / 3 = 0.2.
This method assumes that all models are equally good. If some models are significantly better than others, it might be beneficial to assign more weight to the better models when averaging. This can be done by using weighted averaging, where each model's contribution to the final model is weighted by its performance on a validation set or some other metric. See Weighted Averaging for more details.
+In this example, we will demonstrate how to use the SimpleAverageAlgorithm
class from the fusion_bench.method
module.
+This algorithm is used to merge multiple models by averaging their parameters.
from fusion_bench.method import SimpleAverageAlgorithm
+
+# Instantiate the SimpleAverageAlgorithm
+# This algorithm will be used to merge multiple models by averaging their parameters.
+algorithm = SimpleAverageAlgorithm()
+
+# Assume we have a list of PyTorch models (nn.Module instances) that we want to merge.
+# The models should all have the same architecture.
+models = [...]
+
+# Run the algorithm on the models.
+# This will return a new model that is the result of averaging the parameters of the input models.
+merged_model = algorithm.run(models)
+
The run
method of the SimpleAverageAlgorithm
class takes a list of models as input and returns a new model.
+The new model's parameters are the average of the parameters of the input models.
+This is useful in scenarios where you have trained multiple models and want to combine them into a single model that hopefully performs better than any individual model.
Configuration template for the Simple Averaging algorithm:
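A minimal sketch of the template (assuming the method is registered as simple_average):

name: simple_average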
Use the following command to run the Simple Averaging algorithm:
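For example (the method name and pool configurations are placeholders):

fusion_bench \
    method=simple_average \
    modelpool=... \
    taskpool=...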
+ +
SimpleAverageAlgorithm

Bases: ModelFusionAlgorithm

Source code in fusion_bench/method/simple_average.py

run(modelpool)

Fuse the models in the given model pool using simple averaging. This method iterates over the names of the models in the model pool, loads each model, and appends it to a list. It then returns the simple average of the models in the list.

Parameters:

- modelpool (ModelPool) – The pool of models to fuse.

Returns:

- The fused model obtained by simple averaging.

Source code in fusion_bench/method/simple_average.py
Ensemble methods are simple and effective ways to improve the performance of machine learning models. +They combine the outputs of multiple models to create a stronger model.
+from fusion_bench.method import EnsembleAlgorithm
+
+# Instantiate the EnsembleAlgorithm
+algorithm = EnsembleAlgorithm()
+
+# Assume we have a list of PyTorch models (nn.Module instances) that we want to ensemble.
+models = [...]
+
+# Run the algorithm on the models.
+merged_model = algorithm.run(models)
+
Configuration template for the ensemble algorithm:
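A minimal sketch of the template (assuming the method is registered as simple_ensemble):

name: simple_ensemble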
To create a simple ensemble of CLIP-ViT models for image classification, you can use a command like the following:
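A sketch (the method name is an assumption; the pool names reuse those appearing elsewhere in these docs):

fusion_bench \
    method=simple_ensemble \
    modelpool=clip-vit-base-patch32_TA8 \
    taskpool=clip-vit-classification_TA8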
In the rapidly advancing field of machine learning, multi-task learning has emerged as a powerful paradigm, allowing models to leverage information from multiple tasks to improve performance and generalization. One intriguing method in this domain is Task Arithmetic, which involves the combination of task-specific vectors derived from model parameters.
Task Vector. A task vector is used to encapsulate the adjustments needed by a model to specialize in a specific task. It is derived from the difference between a pre-trained model's parameters and those fine-tuned for a particular task. Formally, if \(\theta_i\) represents the model parameters fine-tuned for the \(i\)-th task and \(\theta_0\) denotes the parameters of the pre-trained model, the task vector for the \(i\)-th task is defined as:

\[ \tau_i = \theta_i - \theta_0 \]
+This representation is crucial for methods like Task Arithmetic, where multiple task vectors are aggregated and scaled to form a comprehensive multi-task model.
Task Arithmetic1 begins by computing a task vector \(\tau_i\) for each individual task, using the set of model parameters \(\theta_0 \cup \{\theta_i\}_i\), where \(\theta_0\) is the pre-trained model and \(\theta_i\) are the fine-tuned parameters for the \(i\)-th task. These task vectors are then aggregated to form a multi-task vector. Subsequently, the multi-task vector is combined with the pre-trained model parameters to obtain the final multi-task model. This process involves scaling the combined vector element-wise by a scaling coefficient (denoted as \(\lambda\)), before adding it to the initial pre-trained model parameters. The resulting formulation for obtaining a multi-task model is expressed as

\[ \theta = \theta_0 + \lambda \sum_{i} \tau_i \]
+The choice of the scaling coefficient \(\lambda\) plays a crucial role in the final model performance. Typically, \(\lambda\) is chosen based on validation set performance.
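As a concrete illustration, here is a minimal sketch of task arithmetic on state dicts (illustrative only, not the fusion_bench implementation):

def task_arithmetic(pretrained_sd, finetuned_sds, scaling_factor=0.5):
    merged = {}
    for name, w0 in pretrained_sd.items():
        # aggregate the task vectors tau_i = theta_i - theta_0 ...
        multi_task_vector = sum(sd[name] - w0 for sd in finetuned_sds)
        # ... then scale and add back onto the pre-trained weights
        merged[name] = w0 + scaling_factor * multi_task_vector
    return merged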
+Configuration template for the Task Arithmetic algorithm:
+name: task_arithmetic
+scaling_factor: 0.5 # Scaling factor for task vectors
+
Use the following command to run the Task Arithmetic algorithm:
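For example (pool configurations are placeholders):

fusion_bench \
    method=task_arithmetic \
    method.scaling_factor=0.5 \
    modelpool=... \
    taskpool=...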
TaskArithmeticAlgorithm

Bases: ModelFusionAlgorithm

Source code in fusion_bench/method/task_arithmetic.py

run(modelpool)

Source code in fusion_bench/method/task_arithmetic.py
Ties-Merging1 represents a novel and structured approach to consolidating multiple task-specific models into a single, efficient multi-task model. This method employs a sequence of deliberate steps to systematically merge task vectors, ensuring that the final model effectively integrates the strengths of each individual task-specific model and resolves potential conflicts between them.
The Ties-Merging algorithm operates through three primary steps:

1. Trim: redundant parameters are removed from each task vector by keeping only the entries with the largest magnitudes (controlled by a threshold) and zeroing out the rest.
2. Elect Sign: sign conflicts across task vectors are resolved by electing, for each parameter, the sign with the largest total magnitude across the task vectors.
3. Disjoint Merge: for each parameter, only the values whose sign agrees with the elected sign are merged (by sum, mean, or max) to produce the final task vector.

A minimal sketch of these steps follows.
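The sketch below illustrates the three steps on flattened task vectors (illustrative only; the actual implementation also handles state dicts and the merge_func option):

import torch

def ties_merge(task_vectors, k=0.2):
    # task_vectors: list of 1-D tensors of equal length
    tvs = torch.stack(task_vectors)
    # 1) trim: keep the top-k fraction of entries by magnitude in each vector
    thresh = tvs.abs().quantile(1 - k, dim=1, keepdim=True)
    tvs = torch.where(tvs.abs() >= thresh, tvs, torch.zeros_like(tvs))
    # 2) elect sign: per-parameter sign with the largest total magnitude
    elected = torch.sign(tvs.sum(dim=0))
    # 3) disjoint merge: average the entries that agree with the elected sign
    agree = (torch.sign(tvs) == elected) & (tvs != 0)
    return (tvs * agree).sum(dim=0) / agree.sum(dim=0).clamp(min=1)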
Given the final merged task vector \(\tau\), the ultimate model is determined similarly to the method used in task arithmetic. The formulation is expressed as:

\[ \theta = \theta_0 + \lambda \tau \]
+where \(\lambda\) is a hyperparameter chosen based on the validation set to ensure the best-performing model.
+By following these structured steps, Ties-Merging effectively integrates multiple task-specific models into a unified multi-task model, balancing the contributions of each task to enhance overall performance. The process ensures that the final model retains the benefits of the pre-trained model while optimally incorporating the diverse knowledge contained within the individual task-specific models.
+Configuration template for the Ties-Merging algorithm:
+name: ties_merging
+# Scaling factor $\lambda$
+scaling_factor: 0.5
+threshold: 0.5
+# List of keys to remove from the state dict, default is empty
+remove_keys: []
+# Function to merge the models, default is sum. Options are 'sum', 'mean', and 'max'
+merge_func: sum
+
Use the following command to run the Ties-Merging algorithm:
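For example (pool configurations are placeholders):

fusion_bench \
    method=ties_merging \
    method.scaling_factor=0.5 \
    modelpool=... \
    taskpool=...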
TiesMergingAlgorithm

Bases: ModelFusionAlgorithm

Source code in fusion_bench/method/ties_merging/ties_merging.py

run(modelpool)

Source code in fusion_bench/method/ties_merging/ties_merging.py
(NIPS 2023) Resolving Interference When Merging Models. http://arxiv.org/abs/2306.01708 ↩
+This method is designed to handle a wide range of tasks by segregating shared information and task-specific knowledge. +It dynamically combines these elements based on the input samples.
+The Weight-Ensembling MoE module consists of three main components: the router, the pre-trained MLP weights, and a collection of task vectors. +The router, which is an MLP, processes the input data and generates routing weights. These weights determine how the knowledge from different tasks is combined. +The pre-trained MLP weights are crucial as they have been trained to recognize a wide range of data patterns. +The task vectors represent the differences between the MLPs that have been fine-tuned for specific tasks and the pre-trained ones, capturing the unique adjustments made to optimize them for specific tasks. +The routing weights are averaged across the input tokens, and these weights are used to select task vectors from a dictionary matrix. +These task vectors are then added to the pre-trained MLP weights to create input-conditioned weights.
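A conceptual sketch of how the input-conditioned weights are formed (illustrative only; names and shapes are assumptions, not the fusion_bench implementation):

import torch
from torch import nn

class WeightEnsemblingMoESketch(nn.Module):
    def __init__(self, router: nn.Module, base_weight: torch.Tensor, task_vectors: torch.Tensor):
        super().__init__()
        self.router = router              # MLP producing one routing logit per task
        self.base_weight = base_weight    # pre-trained MLP weight, shape (out, in)
        self.task_vectors = task_vectors  # shape (num_tasks, out, in)

    def forward(self, x):
        # routing weights averaged across the input tokens
        r = self.router(x).mean(dim=1)    # (batch, num_tasks)
        # input-conditioned weights: W = W0 + sum_i r_i * tau_i
        w = self.base_weight + torch.einsum("bt,toi->boi", r, self.task_vectors)
        # apply the per-sample weight to the input tokens
        return torch.einsum("bsi,boi->bso", x, w)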
Below is a multi-task model fusion experiment on eight image classification tasks.
+# merge eight CLIP-ViT-B/32 models using WE MoE
+fusion_bench \
+ method=weight_ensembling_moe \
+ method.name=clip_weight_ensembling_moe \
+ method.use_grad_accumulate=false \
+ method.save_checkpoint=outputs/clip-vit-base-patch32_TA8_weight_ensembling_moe_checkpoint.ckpt \
+ modelpool=clip-vit-base-patch32_TA8 \
+ taskpool=clip-vit-classification_TA8
+
merge eight CLIP-ViT-L/14 models:
+# merge eight CLIP-ViT-L/14 models using WE MoE, fine-tune the routers
+fusion_bench print_config=false \
+ method=weight_ensembling_moe \
+ method.name=clip_weight_ensembling_moe \
+ method.use_grad_accumulate=true \
+ method.save_checkpoint=outputs/clip-vit-large-patch14_TA8_weight_ensembling_moe_checkpoint.ckpt \
+ method.batch_size=4 method.devices=4 \
+ modelpool=clip-vit-large-patch14_TA8 \
+ taskpool=dummy &&
+
+# load the checkpoint and evaluate the model
+fusion_bench \
+ method=weight_ensembling_moe \
+ method.name=clip_weight_ensembling_moe \
+ method.checkpoint=outputs/clip-vit-large-patch14_TA8_weight_ensembling_moe_checkpoint.ckpt \
+ modelpool=clip-vit-large-patch14_TA8 \
+ taskpool=clip-vit-classification_TA8 \
+ taskpool.clip_model=openai/clip-vit-large-patch14
+
Merging Multi-Task Models via Weight-Ensembling Mixture of Experts. http://arxiv.org/abs/2402.00433 ↩
Weighted averaging is also known as weight-ensembling. In the context of full fine-tuned models, the weights are averaged according to their respective performance weights. Concretely, this means that if we have \(n\) models with their respective weights \(\theta_i\) and model-wise weights \(w_i\), the weights of the final model \(\theta\) are computed as:

\[ \theta = \sum_{i=1}^{n} w_i \theta_i \]
+Configuration template for the Weighted Averaging algorithm:
+name: weighted_average
+normalize: true # if true, the weights will be normalized before merging
+weights: # List of weights for each model
+ - 0.5
+ - 0.5
+
Use the following command to run the Weighted Averaging algorithm:
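For example (pool configurations are placeholders):

fusion_bench \
    method=weighted_average \
    modelpool=... \
    taskpool=...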
WeightedAverageAlgorithm

Bases: ModelFusionAlgorithm

Source code in fusion_bench/method/weighted_average.py

run(modelpool)

Fuses the models in the model pool using a weighted average approach.

Parameters:

- modelpool (ModelPool) – The pool of models to be fused.

Raises:

- ValueError – If the number of weights does not match the number of models in the model pool.

Returns:

- forward_model (torch.nn.Module) – The resulting model after fusion.

Source code in fusion_bench/method/weighted_average.py
A weighted ensemble is a machine learning technique that combines the predictions of multiple models to produce a final prediction. The idea is to leverage the strengths of each individual model to improve overall performance and robustness.
+Formally, a weighted ensemble can be defined as follows:
Given a set of \(n\) models, each model \(f_i\) produces a prediction \(f_i(x)\) for an input \(x\). Each model \(i\) also has an associated weight \(w_i\). The final prediction \(F(x)\) of the weighted ensemble is a weighted sum of the individual model predictions:

\[ F(x) = \sum_{i=1}^{n} w_i f_i(x) \]
+The weights \(w_i\) are typically non-negative and sum to 1 (i.e., \(\sum_{i=1}^n w_i = 1\)), which ensures that the final prediction is a convex combination of the individual model predictions. +The weights can be determined in various ways. They could be set based on the performance of the models on a validation set, or they could be learned as part of the training process. In some cases, all models might be given equal weight. +The goal of a weighted ensemble is to produce a final prediction that is more accurate or robust than any individual model. This is particularly useful when the individual models have complementary strengths and weaknesses.
+The following Python code snippet demonstrates how to use the WeightedEnsembleAlgorithm
class from the fusion_bench.method
module to create a weighted ensemble of PyTorch models.
from omegaconf import DictConfig
+from fusion_bench.method import WeightedEnsembleAlgorithm
+
+# Instantiate the algorithm
+method_config = {'name': 'weighted_ensemble', 'weights': [0.3, 0.7]}
+algorithm = WeightedEnsembleAlgorithm(DictConfig(method_config))
+
+# Assume we have a list of PyTorch models (nn.Module instances) that we want to ensemble.
+models = [...]
+
+# Run the algorithm on the models.
+merged_model = algorithm.run(models)
+
Here's a step-by-step explanation:
+Instantiate the WeightedEnsembleAlgorithm
:
method_config
is created with two keys: 'name'
and 'weights'
. The 'name'
key is set to 'weighted_ensemble'
indicating the type of ensemble method to use. The 'weights'
key is set to a list of weights [0.3, 0.7]
indicating the weights assigned to each model in the ensemble.method_config
dictionary is converted to a DictConfig
object, which is a configuration object used by the omegaconf
library.WeightedEnsembleAlgorithm
is then instantiated with the DictConfig
object as an argument.Assume a list of PyTorch models that you want to ensemble. This list is assigned to the variable models
. The actual models are not shown in this code snippet.
Run the algorithm on the models: The run
method of the WeightedEnsembleAlgorithm
instance is called with the models
list as an argument. The result is a merged model that represents the weighted ensemble of the input models. This merged model is assigned to the variable merged_model
.
Here we list the options for the weighted ensemble algorithm:
| Option | Default | Description |
|---|---|---|
| weights | | A list of floats representing the weights for each model in the ensemble. |
| normalize | True | Whether to normalize the weights so that they sum to 1. Default is True. |
If normalize is set to True, the weights will be normalized so that they sum to 1. Mathematically, this means that each weight \(w_i\) is divided by the sum of all weights:

\[ w_i \leftarrow \frac{w_i}{\sum_{j=1}^{n} w_j} \]
Configuration template for the weighted ensemble algorithm:
+name: weighted_ensemble
+
+# this should be a list of floats, one for each model in the ensemble
+# If weights is null, the ensemble will use the default weights, which are equal weights for all models.
+weights: null
+normalize: true
+
Construct a weighted ensemble using our CLI tool fusion_bench
:
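A sketch (the weights override follows the Hydra-style syntax used elsewhere in these docs; pool configurations are placeholders):

fusion_bench \
    method=weighted_ensemble \
    'method.weights=[0.3,0.7]' \
    modelpool=... \
    taskpool=...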