
Full-parameter fine-tuning of Qwen2.5-14B-Instruct across two machines hangs #6143

Open
1 task done
zhaoxjmail opened this issue Nov 26, 2024 · 1 comment

Labels
pending This problem is yet to be addressed

Comments

zhaoxjmail commented Nov 26, 2024

Reminder

  • I have read the README and searched the existing issues.

System Info

  • llamafactory version: 0.9.1.dev0
  • Platform: Linux-5.15.0-125-generic-x86_64-with-glibc2.31
  • Python version: 3.11.9
  • PyTorch version: 2.3.0+cu121 (GPU)
  • Transformers version: 4.42.3
  • Datasets version: 2.20.0
  • Accelerate version: 0.31.0
  • PEFT version: 0.11.1
  • TRL version: 0.9.4
  • GPU type: NVIDIA A800 80GB PCIe
  • DeepSpeed version: 0.15.1
  • Bitsandbytes version: 0.43.1
  • vLLM version: 0.5.0

Reproduction

#manager
CUDA_VISIBLE_DEVICES=0,1,2 FORCE_TORCHRUN=1 NNODES=2 RANK=0 MASTER_ADDR=192.168.12.2 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/qwen2_lora_dpo.yaml

#worker
FORCE_TORCHRUN=1 NNODES=2 RANK=1 MASTER_ADDR=192.168.12.2 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/qwen2_lora_dpo.yaml
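
If a multi-node run stalls right after the weights start loading, the cause is often NCCL transport or network-interface selection rather than the training code itself. A hedged diagnostic variant of the same launch is sketched below, assuming the two nodes reach each other over an interface named eth0 (replace it with your actual NIC): NCCL_DEBUG=INFO prints the rendezvous and ring setup so you can see where initialization stops, and NCCL_IB_DISABLE=1 temporarily rules out a misconfigured InfiniBand path.

#manager (diagnostic run; eth0 is an assumed interface name)
NCCL_DEBUG=INFO NCCL_SOCKET_IFNAME=eth0 GLOO_SOCKET_IFNAME=eth0 NCCL_IB_DISABLE=1 CUDA_VISIBLE_DEVICES=0,1,2 FORCE_TORCHRUN=1 NNODES=2 RANK=0 MASTER_ADDR=192.168.12.2 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/qwen2_lora_dpo.yaml

#worker (same variables, RANK=1)
NCCL_DEBUG=INFO NCCL_SOCKET_IFNAME=eth0 GLOO_SOCKET_IFNAME=eth0 NCCL_IB_DISABLE=1 FORCE_TORCHRUN=1 NNODES=2 RANK=1 MASTER_ADDR=192.168.12.2 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/qwen2_lora_dpo.yaml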

Expected behavior

No response

Others

qwen2_lora_dpo.yaml

### model
model_name_or_path: /data/models/Qwen2.5-14B-Instruct
#quantization_bit: 8

### method
stage: dpo
do_train: true
finetuning_type: full
lora_target: all
pref_beta: 0.1
pref_loss: orpo  # choices: [sigmoid (dpo), orpo, simpo]
deepspeed: examples/deepspeed/ds_z3_config.json
lora_rank: 256 
lora_dropout: 0.1


### dataset
dataset: qwen_dpo_augmentation
template: qwen
cutoff_len: 2048
max_samples: 1000000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: /data/models/qwen2.5/dpo-14b_augmentation_fsdp
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 5.0e-6
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
flash_attn: fa2
#enable_liger_kernel: True

### ddp_backend
ddp_backend: nccl
ddp_find_unused_parameters: false  
### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
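
A quick way to separate a configuration problem from a cross-node networking problem is to run the same YAML on one node first. A minimal sketch, assuming the manager node alone with its three visible GPUs (no NNODES/RANK needed for a single node):

#single-node sanity check (assumption: manager node only)
CUDA_VISIBLE_DEVICES=0,1,2 FORCE_TORCHRUN=1 llamafactory-cli train examples/train_lora/qwen2_lora_dpo.yaml

If this run trains normally, the hang points to the cross-node NCCL setup rather than to the YAML above.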

manager log:

[INFO|modeling_utils.py:3553] 2024-11-26 11:15:33,752 >> loading weights file /data/models/Qwen2.5-14B-Instruct/model.safetensors.index.json
[INFO|modeling_utils.py:3698] 2024-11-26 11:15:33,753 >> Detected DeepSpeed ZeRO-3: activating zero.init() for this model
[2024-11-26 11:15:33,753] [INFO] [config.py:733:__init__] Config mesh_device None world_size = 7
[WARNING|logging.py:328] 2024-11-26 11:15:33,755 >> You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour
[WARNING|logging.py:328] 2024-11-26 11:15:33,755 >> You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
[WARNING|logging.py:328] 2024-11-26 11:15:33,762 >> Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen2ForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
[INFO|configuration_utils.py:1000] 2024-11-26 11:15:33,762 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "use_cache": false
}

[WARNING|logging.py:328] 2024-11-26 11:15:33,763 >> Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen2Model is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
[2024-11-26 11:15:33,906] [INFO] [config.py:733:__init__] Config mesh_device None world_size = 7
[2024-11-26 11:15:33,906] [INFO] [config.py:733:__init__] Config mesh_device None world_size = 7
You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour
You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen2ForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen2ForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen2Model is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen2Model is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`

worker log:

[INFO|modeling_utils.py:3553] 2024-11-26 11:15:33,573 >> loading weights file /data/models/Qwen2.5-14B-Instruct/model.safetensors.index.json
[INFO|modeling_utils.py:3698] 2024-11-26 11:15:33,573 >> Detected DeepSpeed ZeRO-3: activating zero.init() for this model
[2024-11-26 11:15:33,574] [INFO] [config.py:733:__init__] Config mesh_device None world_size = 7
[WARNING|logging.py:328] 2024-11-26 11:15:33,576 >> You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour
[WARNING|logging.py:328] 2024-11-26 11:15:33,576 >> You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
[WARNING|logging.py:328] 2024-11-26 11:15:33,582 >> Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen2ForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
[INFO|configuration_utils.py:1000] 2024-11-26 11:15:33,582 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "use_cache": false
}

[WARNING|logging.py:328] 2024-11-26 11:15:33,583 >> Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen2Model is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`

github-actions bot added the pending label (This problem is yet to be addressed) on Nov 26, 2024
hiyouga (Owner) commented Nov 26, 2024

Please check the latest instructions; some environment variables may need to be changed: https://github.com/hiyouga/LLaMA-Factory/tree/main/examples#supervised-fine-tuning-on-multiple-nodes
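
For reference, a hedged reading of the multi-node example on that page (assumption: the newer examples use NODE_RANK in place of RANK; verify against the linked README before copying):

#manager (node 0)
CUDA_VISIBLE_DEVICES=0,1,2 FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=0 MASTER_ADDR=192.168.12.2 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/qwen2_lora_dpo.yaml

#worker (node 1)
FORCE_TORCHRUN=1 NNODES=2 NODE_RANK=1 MASTER_ADDR=192.168.12.2 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/qwen2_lora_dpo.yaml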
