I would like to inquire whether there are plans to support the Qwen2.5 and Qwen2 series, or other popular models from the open-source community such as Yi. Will the framework support merging large models, like the 72B versions, similar to MergeKit? Given that running a 72B model requires a significant amount of memory, will the training phase accommodate quantization and LoRA so that it can run on a single machine with 8 A800 GPUs? Additionally, will DeepSpeed be supported for distributed training? If the framework could support merging and training common model sizes such as 72B, 70B, 34B, 14B, and 7B, it would greatly enhance the applicability of the method.
I would like to inquire whether there are plans to support the Qwen2.5 and Qwen2 series, or other popular models from the open-source community such as Yi. Will the framework support merging large models, like the 72B versions, similar to MergeKit?
Definitely, we could do this, as I mentioned in #45.
Given that running a 72B model requires a significant amount of memory, will the training phase accommodate quantization and LoRA to enable it to run on a single machine with 8 A800 GPUs?
Actually, quantized models can only be trained with adapter methods such as LoRA. But in our method we only train merging coefficients, which is a very small number of parameters, so the real bottleneck is the VRAM consumed when loading the models.
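For intuition, here is a minimal, hypothetical PyTorch sketch (not the repository's actual implementation) of the idea that only merging coefficients are trained: the source models stay frozen, and a small per-layer, per-expert coefficient tensor is the only set of trainable parameters. Class and argument names are made up for illustration.

```python
import torch
import torch.nn as nn

class CoefficientMerge(nn.Module):
    """Sketch: learn per-layer merging coefficients over frozen expert models."""

    def __init__(self, experts: list[nn.Module], num_layers: int):
        super().__init__()
        self.experts = nn.ModuleList(experts)
        for p in self.experts.parameters():
            p.requires_grad_(False)  # expert weights are frozen, never updated

        # One learnable coefficient per (layer, expert) -- the only trainable parameters.
        self.coeffs = nn.Parameter(
            torch.full((num_layers, len(experts)), 1.0 / len(experts))
        )

    def merged_layer_weight(self, layer_idx: int,
                            weights_per_expert: list[torch.Tensor]) -> torch.Tensor:
        # Weighted sum of the corresponding layer weights from each expert.
        c = torch.softmax(self.coeffs[layer_idx], dim=0)
        return sum(ci * w for ci, w in zip(c, weights_per_expert))

# Trainable parameters are just num_layers * num_experts scalars
# (e.g. 32 layers * 3 experts = 96 coefficients), versus billions of frozen weights.
```

The point of the sketch is only to show why the trainable-parameter count stays tiny even when the loaded models are huge; the actual parametrization of the coefficients in this repository may differ.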
We already tried DeepSpeed, and you can check it out in the "legacy" folder. But in our experiments DeepSpeed gave us an OOM issue, since we merged three different 7B models and the merged model had around 22B parameters. As far as I remember, the number of trainable parameters was only around 3 million.
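A rough back-of-the-envelope estimate (weights only, fp16 at 2 bytes per parameter) shows why loading dominates memory even with so few trainable parameters; activations, optimizer state, and DeepSpeed buffers come on top and are not counted here:

```python
# Weight-only VRAM estimate in GiB for fp16 checkpoints (2 bytes per parameter).
def fp16_weight_gib(num_params: float) -> float:
    return num_params * 2 / 1024**3

print(f"~22B merged model:  {fp16_weight_gib(22e9):.0f} GiB")   # ~41 GiB
print(f"three 7B sources:   {3 * fp16_weight_gib(7e9):.0f} GiB")  # ~39 GiB
```

With the merged model and the source checkpoints resident at the same time, the weights alone approach the capacity of a single 80 GB GPU before any training overhead, which is consistent with the OOM we observed.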