v0.6.0: Paper Release, GaLore and FSDP+QLoRA #2969
hiyouga announced in Announcements
- hello @hiyouga, I just read the paper. Regarding the training details in D.1: what is the input length in the pre-training step? I couldn't find it anywhere.
We released our paper on arXiv! Thanks to all co-authors and to AK's recommendation.

New features
- `--infer_backend vllm`
- `apply_chat_template`: add a chat template to the tokenizer after fine-tuning

New models

New datasets

Bug fix
- Use `offload_dir` to dispatch the model according to the `device_map` #2802
- Add `pip install fire` to requirements.txt #2803
- Using the Web UI, running the preview command for eval & predict (or starting the run) raises `KeyError: 'dropdown'` #2817
- LoRA+ with DeepSpeed #2895
- When fine-tuning with LoRA while also training the parameters of some other layers, merging the adapter fails during verification #2928
- `AttributeError: 'torch.dtype' object has no attribute 'itemsize'` #2936
- LongLoRA issue: SFT with `--shift_attn` raises errors #2941

This discussion was created from the release v0.6.0: Paper Release, GaLore and FSDP+QLoRA.
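The `apply_chat_template` feature attaches a chat template to the tokenizer so that conversations are rendered into the same prompt format used during fine-tuning. As a minimal sketch of what such a template produces, here is a plain-Python rendering of a ChatML-style format (the `apply_chatml_template` helper and the ChatML markers are illustrative; in practice the template is stored on the tokenizer, e.g. via `tokenizer.chat_template`, and applied with `tokenizer.apply_chat_template` from Hugging Face Transformers):

```python
# Illustrative sketch: render {role, content} messages in a ChatML-style
# format, mimicking what a tokenizer's chat template would produce.

def apply_chatml_template(messages, add_generation_prompt=True):
    """Render a list of {role, content} dicts as a ChatML-style prompt."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
prompt = apply_chatml_template(messages)
print(prompt)
```

Keeping inference prompts identical to the training-time template is the point of storing the template on the tokenizer: downstream tools can then format conversations without knowing the model's prompt conventions.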