
[Resolved] Running chat.py with cpu+fp32 raises RuntimeError: "addmm_impl_cpu_" not implemented for 'Half' #16

Open
ChinesePainting opened this issue May 16, 2023 · 1 comment


@ChinesePainting

When running inference with cpu+fp32, I hit the following error:

File "D:\gitpro\RWKV-LM-LoRA\RWKV-v4neo\src\model_run.py", line 67, in init
w[k] += w[lora_B] @ w[lora_A] * (args.lora_alpha / args.lora_r)
RuntimeError: "addmm_impl_cpu
" not implemented for 'Half'

According to ChatGPT, the cause is that some of PyTorch's CPU kernels do not implement half-precision (fp16) operations.
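
For reference, the failure can be reproduced in isolation; a minimal sketch (shapes are arbitrary, and whether it actually raises depends on your PyTorch version, since newer CPU builds have gained fp16 kernels):

    import torch

    # fp16 tensors on the CPU
    bias = torch.randn(2, 4, dtype=torch.half)
    a = torch.randn(2, 3, dtype=torch.half)
    b = torch.randn(3, 4, dtype=torch.half)

    # On older PyTorch builds this raises:
    # RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
    torch.addmm(bias, a, b)

    # Casting to fp32 first avoids the error
    torch.addmm(bias.float(), a.float(), b.float())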

Solution:

Add the following lines to /src/model_run.py:

At line 50, add:

                        if args.RUN_DEVICE == 'cpu':
                            # cast the loaded LoRA tensor to fp32 for CPU inference
                            w[k] = w_lora[k].float()

At line 69, add:

                        if args.RUN_DEVICE == 'cpu':
                            # cast the base weight and both LoRA factors to fp32
                            # so the CPU matmul in the merge below can run
                            w[k] = w[k].float()
                            w[lora_A] = w[lora_A].float()
                            w[lora_B] = w[lora_B].float()
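
Putting the two together, the merge the loader performs at that point is W' = W + (B @ A) * (alpha / r). A standalone sketch of the fix (the function name and shapes below are mine for illustration, not taken from the repo):

    import torch

    def merge_lora(w, lora_A, lora_B, lora_alpha, lora_r, run_device='cpu'):
        """Fold a LoRA delta into a base weight: W' = W + (B @ A) * (alpha / r)."""
        if run_device == 'cpu':
            # older PyTorch CPU builds have no fp16 matmul, so promote to fp32
            w, lora_A, lora_B = w.float(), lora_A.float(), lora_B.float()
        return w + lora_B @ lora_A * (lora_alpha / lora_r)

    # hypothetical shapes: a 512x512 base weight with LoRA rank 8
    w = torch.randn(512, 512, dtype=torch.half)
    A = torch.randn(8, 512, dtype=torch.half)    # lora_A
    B = torch.randn(512, 8, dtype=torch.half)    # lora_B
    merged = merge_lora(w, A, B, lora_alpha=32, lora_r=8)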

I'm still a beginner in deep learning, so please correct me if anything here is wrong.

@ChinesePainting
Author

Strangest of all: after training with

CUDA_LAUNCH_BLOCKING=1 python3 train.py --load_model RWKV-4-Novel-3B-v1-Chn-20230412-ctx4096.pth --lora_load rwkv-0 --proj_dir lora_checkpoints --data_file ssg_text_document --data_type binidx --vocab_size 50277 --ctx_len 2048 --accumulate_grad_batches 2 --epoch_steps 600 --epoch_count 12 --epoch_begin 0 --epoch_save 3 --micro_bsz 1 --n_layer 32 --n_embd 2560 --pre_ffn 0 --head_qk 0 --lr_init 1e-5 --lr_final 1e-5 --warmup_steps 0 --beta1 0.9 --beta2 0.999 --adam_eps 1e-8 --accelerator gpu --devices 1 --precision fp16 --strategy ddp_find_unused_parameters_false --grad_cp 1 --lora --lora_r 8 --lora_alpha 32 --lora_dropout 0.01 --lora_parts=att,ffn,time,ln

the resulting checkpoint runs on cpu+fp32 with no code changes. But after training with

CUDA_LAUNCH_BLOCKING=1 python3 train.py --load_model RWKV-4-Novel-3B-v1-Chn-20230412-ctx4096.pth --lora_load rwkv-0 --proj_dir lora_checkpoints --data_file ssg_text_document --data_type binidx --vocab_size 50277 --ctx_len 4096 --accumulate_grad_batches 2 --epoch_steps 600 --epoch_count 12 --epoch_begin 0 --epoch_save 2 --micro_bsz 1 --n_layer 32 --n_embd 2560 --pre_ffn 0 --head_qk 0 --lr_init 1e-5 --lr_final 1e-5 --warmup_steps 0 --beta1 0.9 --beta2 0.999 --adam_eps 1e-8 --accelerator gpu --devices 1 --precision fp16 --strategy deepspeed_stage_2 --grad_cp 1 --lora --lora_r 8 --lora_alpha 32 --lora_dropout 0.01 --lora_parts=att,ffn,time,ln

the code changes above are required. I don't know whether the difference comes from the ctx_len or from deepspeed_stage_2.
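
One thing worth checking (my guess, not confirmed): the two strategies may simply save the LoRA checkpoint in different dtypes, e.g. deepspeed_stage_2 writing fp16 tensors while the DDP run writes fp32, which would explain why only one of them trips the CPU 'Half' error. A quick inspection sketch, assuming the checkpoint is a plain state dict and the path follows the --proj_dir/--lora_load flags above:

    import torch

    # path assumed from --proj_dir lora_checkpoints and --lora_load rwkv-0
    sd = torch.load('lora_checkpoints/rwkv-0.pth', map_location='cpu')
    for name, tensor in sd.items():
        print(name, tensor.dtype, tuple(tensor.shape))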
