v0.6.0: Paper Release, GaLore and FSDP+QLoRA #2969
hiyouga announced in Announcements
- hello @hiyouga, I just read the paper. Regarding the training details in D.1: what is the input length in the pre-training step? I couldn't find it anywhere.
We released our paper on arXiv! Thanks to all co-authors and to AK's recommendation.

New features
- `--infer_backend vllm`
- `apply_chat_template`: add a chat template to the tokenizer after fine-tuning

New models

New datasets

Bug fix
- Use `offload_dir` to dispatch the model according to the `device_map` #2802
- Add `pip install fire` to requirements.txt #2803
- Using the Web UI, running the preview command for eval & predict (or starting the run) raises `KeyError: 'dropdown'` #2817
- LoRA+ with DeepSpeed #2895
- When fine-tuning with LoRA while also training the parameters of some other layers, merging the adapter fails during verification #2928
- `AttributeError: 'torch.dtype' object has no attribute 'itemsize'` #2936
- LongLoRA issue: SFT with `--shift_attn` raises errors #2941

This discussion was created from the release v0.6.0: Paper Release, GaLore and FSDP+QLoRA.
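The `apply_chat_template` feature attaches a chat template to the tokenizer so that conversations are rendered into the same prompt format used during fine-tuning. As a minimal sketch of what such a template produces, here is a plain-Python rendering of a ChatML-style format (the `apply_chatml_template` helper and the ChatML markers are illustrative; in practice the template is stored on the tokenizer, e.g. via `tokenizer.chat_template`, and applied with `tokenizer.apply_chat_template` from Hugging Face Transformers):

```python
# Illustrative sketch: render {role, content} messages in a ChatML-style
# format, mimicking what a tokenizer's chat template would produce.

def apply_chatml_template(messages, add_generation_prompt=True):
    """Render a list of {role, content} dicts as a ChatML-style prompt."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
prompt = apply_chatml_template(messages)
print(prompt)
```

Keeping inference prompts identical to the training-time template is the point of storing the template on the tokenizer: downstream tools can then format conversations without knowing the model's prompt conventions.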