
Inquiry on Support for Qwen2.5 Models and Large Model Training Capabilities #43

Open · ArcherShirou opened this issue Oct 21, 2024 · 3 comments

Comments

@ArcherShirou

ArcherShirou commented Oct 21, 2024

I would like to ask whether there are plans to support the Qwen2.5 and Qwen2 series, or other popular open-source models such as Yi. Will the framework support merging large models, like the 72B version, similar to MergeKit? Given that running a 72B model requires a significant amount of memory, will the training phase support quantization and LoRA so that it can run on a single machine with 8 A800 GPUs? Additionally, will DeepSpeed be supported for distributed training? Support for merging and training common model sizes such as 72B, 70B, 34B, 14B, and 7B would greatly broaden the applicability of the method.

@SolshineCode

I've opened a PR adding the Qwen2.5 and Qwen2 series, as a first step toward integrating them: #45

I was thinking along similar lines about quantization and LoRA. I don't think LoRA would work here, though, because the DAM method uses the logits directly.

@shamanez
Member

Thanks, @ArcherShirou, for exploring our codebase.

> I would like to ask whether there are plans to support the Qwen2.5 and Qwen2 series, or other popular open-source models such as Yi. Will the framework support merging large models, like the 72B version, similar to MergeKit?

Definitely, we could do this, as I mentioned in #45.

> Given that running a 72B model requires a significant amount of memory, will the training phase support quantization and LoRA so that it can run on a single machine with 8 A800 GPUs?

Actually, quantized models can only be trained with adapter methods like LoRA. But in our method we only train the merging coefficients, which is a very small number of parameters, so the real constraint is the VRAM consumed when loading the model.
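For illustration, here is a minimal sketch of that idea: load a quantized base model, freeze all of its weights, and train only a small tensor of merging coefficients. The model name, coefficient shape, and optimizer settings below are assumptions for the example, not the actual DAM implementation.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load a base model in 4-bit to cut VRAM during loading (requires bitsandbytes).
bnb_config = BitsAndBytesConfig(load_in_4bit=True,
                                bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B",              # hypothetical choice of base model
    quantization_config=bnb_config,
    device_map="auto",
)

# Freeze every weight of the quantized model.
for p in model.parameters():
    p.requires_grad = False

# The only trainable parameters: one merging coefficient per layer per source
# model (here 3 source models), i.e. a few hundred scalars rather than billions.
num_layers = model.config.num_hidden_layers
merging_coefficients = torch.nn.Parameter(torch.ones(num_layers, 3) / 3)

optimizer = torch.optim.AdamW([merging_coefficients], lr=1e-3)
```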

We have already tried DeepSpeed; you can check it out in the "legacy" folder. But in our experiments, DeepSpeed gave us an OOM issue when we tried to merge three different 7B models, where the merged model had around 22B parameters. As far as I remember, the number of trainable parameters was around 3 million.
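For anyone who wants to experiment further, below is a hedged sketch of how a coefficient-only setup could be wired into DeepSpeed ZeRO-3 so that the frozen weights are sharded across the 8 GPUs. The wrapper class, model name, script name, and config values are assumptions for the example (not the repo's legacy script), and it may still hit OOM as described above.

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM

class CoefficientOnlyWrapper(torch.nn.Module):
    """Hypothetical wrapper: a frozen base model plus per-layer merging coefficients."""
    def __init__(self, base, num_sources=3):
        super().__init__()
        self.base = base
        # One coefficient per layer per source model: a few hundred trainable scalars.
        self.coeffs = torch.nn.Parameter(
            torch.ones(base.config.num_hidden_layers, num_sources) / num_sources
        )

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B",                  # hypothetical base model
    torch_dtype=torch.bfloat16,
)
for p in base.parameters():             # freeze everything except the coefficients
    p.requires_grad = False

model = CoefficientOnlyWrapper(base)
optimizer = torch.optim.AdamW([model.coeffs], lr=1e-3)

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,                             # shard the frozen weights across GPUs
        "offload_param": {"device": "cpu"},     # optional: spill params to CPU RAM
    },
}

# Launch with something like: deepspeed --num_gpus 8 train_coeffs.py
engine, optimizer, _, _ = deepspeed.initialize(
    model=model, optimizer=optimizer, config=ds_config
)
```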

@ArcherShirou
Author

Thank you for your response. I'm looking forward to the updates to the framework; it's really fantastic work!
