In finetune_moss.py, I changed the following line:
accelerator = Accelerator(mixed_precision='fp8')
The environment is NVIDIA's container nvcr.io/nvidia/pytorch:23.06-py3:
https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch
Because the GPU does not have enough memory, I use DeepSpeed to offload to the CPU.
I modified sft.yaml as follows:
command_file: null
commands: null
compute_environment: LOCAL_MACHINE
deepspeed_config:
  gradient_accumulation_steps: 1
  gradient_clipping: 1.0
  offload_optimizer_device: cpu
  offload_param_device: cpu
  zero3_init_flag: true
  zero3_save_16bit_model: true
  zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
gpu_ids: null
machine_rank: 0
main_process_ip: null
main_process_port: null
main_training_function: main
megatron_lm_config: {}
mixed_precision: fp8
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
tpu_name: null
tpu_zone: null
use_cpu: false
After switching fine-tuning to FP8, training became slower. What could be causing this?
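To make the "slower" comparison concrete, it helps to time the same number of training steps under each setting (for example `mixed_precision: bf16` vs `mixed_precision: fp8`) with everything else in the config unchanged. Below is a minimal, framework-agnostic sketch; `mean_step_seconds` and `step_fn` are hypothetical names, not part of Accelerate or DeepSpeed. On a GPU, passing `torch.cuda.synchronize` as `sync` is an assumption about how you would make sure queued kernels are included in the measurement.

```python
import time
from typing import Callable, Optional


def mean_step_seconds(
    step_fn: Callable[[], None],
    warmup: int = 3,
    steps: int = 10,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Average wall-clock seconds per call of step_fn, excluding warmup calls."""
    if steps <= 0:
        raise ValueError("steps must be positive")
    # Warmup: early iterations include one-off costs (memory allocation,
    # kernel selection) that would otherwise skew the average.
    for _ in range(warmup):
        step_fn()
    if sync is not None:
        sync()  # drain queued GPU work before starting the clock
    start = time.perf_counter()
    for _ in range(steps):
        step_fn()
    if sync is not None:
        sync()  # make sure the last timed step has actually finished
    return (time.perf_counter() - start) / steps
```

Usage: wrap one forward, backward and optimizer step of the training loop in `step_fn`, run once per precision setting with the same batch, and compare the two averages.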
The DeepSpeed v0.9.5 release notes list: "FP8 unittest for H100 by @jomayeri in microsoft/DeepSpeed#3731".
Could it be that, with DeepSpeed offloading to the CPU, the CPU does not support FP8? My CPU is an Intel® Xeon® w9-3495X Processor.
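FP8 matrix multiplications, where they are used, run on the GPU, so one thing worth ruling out first is whether the GPU itself has FP8 tensor cores. Hardware FP8 is available on compute capability 8.9 (Ada) and 9.0 (Hopper, e.g. the H100 targeted by the DeepSpeed unit test above); older parts such as the A100 (8.0) have none. A minimal sketch, assuming PyTorch is installed for the device query; the helper `gpu_has_fp8_tensor_cores` is hypothetical.

```python
# Assumption: FP8 tensor cores exist on compute capability 8.9 (Ada) and
# 9.0 or later (Hopper and newer); anything below 8.9 lacks them.
FP8_MIN_CAPABILITY = (8, 9)


def gpu_has_fp8_tensor_cores(capability) -> bool:
    """Return True if a (major, minor) compute capability has FP8 tensor cores."""
    return tuple(capability) >= FP8_MIN_CAPABILITY


if __name__ == "__main__":
    import torch  # only needed for the live device query

    if torch.cuda.is_available():
        cap = torch.cuda.get_device_capability()
        print(torch.cuda.get_device_name(), cap, gpu_has_fp8_tensor_cores(cap))
    else:
        print("No CUDA device visible")
```

If this prints `False` for your card, FP8 cannot be accelerated in hardware regardless of the CPU, and the extra casting work may explain part of the slowdown.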