manager log:
[INFO|modeling_utils.py:3553] 2024-11-26 11:15:33,752 >> loading weights file /data/models/Qwen2.5-14B-Instruct/model.safetensors.index.json
[INFO|modeling_utils.py:3698] 2024-11-26 11:15:33,753 >> Detected DeepSpeed ZeRO-3: activating zero.init() for this model
[2024-11-26 11:15:33,753] [INFO] [config.py:733:__init__] Config mesh_device None world_size = 7
[WARNING|logging.py:328] 2024-11-26 11:15:33,755 >> You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour
[WARNING|logging.py:328] 2024-11-26 11:15:33,755 >> You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
[WARNING|logging.py:328] 2024-11-26 11:15:33,762 >> Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen2ForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
[INFO|configuration_utils.py:1000] 2024-11-26 11:15:33,762 >> Generate config GenerationConfig {
"bos_token_id": 151643,
"eos_token_id": 151645,
"use_cache": false
}
[WARNING|logging.py:328] 2024-11-26 11:15:33,763 >> Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen2Model is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
[2024-11-26 11:15:33,906] [INFO] [config.py:733:__init__] Config mesh_device None world_size = 7
worker log:
[INFO|modeling_utils.py:3553] 2024-11-26 11:15:33,573 >> loading weights file /data/models/Qwen2.5-14B-Instruct/model.safetensors.index.json
[INFO|modeling_utils.py:3698] 2024-11-26 11:15:33,573 >> Detected DeepSpeed ZeRO-3: activating zero.init() for this model
[2024-11-26 11:15:33,574] [INFO] [config.py:733:__init__] Config mesh_device None world_size = 7
[WARNING|logging.py:328] 2024-11-26 11:15:33,576 >> You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour
[WARNING|logging.py:328] 2024-11-26 11:15:33,576 >> You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
[WARNING|logging.py:328] 2024-11-26 11:15:33,582 >> Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen2ForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
[INFO|configuration_utils.py:1000] 2024-11-26 11:15:33,582 >> Generate config GenerationConfig {
"bos_token_id": 151643,
"eos_token_id": 151645,
"use_cache": false
}
[WARNING|logging.py:328] 2024-11-26 11:15:33,583 >> Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen2Model is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
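Note on the repeated dtype warning: both logs show the model being materialized in torch.float32 while flash_attention_2 is requested, and under ZeRO-3 zero.init() the parameters start off-GPU, which is why the "not initialized on GPU" warning also appears. In a LLaMA-Factory run the compute dtype is normally driven by the training yaml (e.g. a bf16: true entry in qwen2_lora_dpo.yaml) rather than by calling transformers directly, so the snippet below is only a minimal sketch of what the warning itself is asking for: an explicit half-precision torch_dtype at load time. It assumes bf16 is the intended dtype and reuses the local model path from the log above.

```python
# Sketch only: load Qwen2.5-14B-Instruct with an explicit bf16 dtype so
# Flash Attention 2.0 sees a supported dtype instead of torch.float32.
# Path taken from the log above; adjust to your environment. Under
# DeepSpeed ZeRO-3, device placement is handled by zero.init()/the engine.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/data/models/Qwen2.5-14B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,              # avoids the float32 warning
    attn_implementation="flash_attention_2",
)
```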
Reminder
System Info
llamafactory version: 0.9.1.dev0
Reproduction
#manager
CUDA_VISIBLE_DEVICES=0,1,2 FORCE_TORCHRUN=1 NNODES=2 RANK=0 MASTER_ADDR=192.168.12.2 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/qwen2_lora_dpo.yaml
#worker
FORCE_TORCHRUN=1 NNODES=2 RANK=1 MASTER_ADDR=192.168.12.2 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/qwen2_lora_dpo.yaml
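For reference, the world_size = 7 seen in both logs is consistent with 3 visible GPUs on the manager (CUDA_VISIBLE_DEVICES=0,1,2) plus, presumably, 4 GPUs on the worker; the worker count is an assumption, since its command does not restrict devices. A small stand-alone sanity check (a sketch, not part of LLaMA-Factory; the filename check_dist.py is hypothetical) can be launched with the same rendezvous settings to confirm what rank and world size each process actually gets:

```python
# Sketch: print the distributed rank/world size each process ends up with.
# Run under torchrun on both nodes with the same master address/port as the
# llamafactory-cli commands above.
import os
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    torch.cuda.set_device(local_rank)
    print(
        f"global rank {dist.get_rank()} / world size {dist.get_world_size()} "
        f"(local rank {local_rank}, device {torch.cuda.current_device()})"
    )
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

For example, on the manager this could be launched as `torchrun --nnodes=2 --node_rank=0 --nproc_per_node=3 --master_addr=192.168.12.2 --master_port=29500 check_dist.py`, with the matching --node_rank=1 and per-node --nproc_per_node on the worker.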
Expected behavior
No response
Others
qwen2_lora_dpo.yaml