
Impossible to train a model using both bf16 mixed precision training and torch compile, RuntimeError: expected mat1 and mat2 to have the same dtype #34470

Closed
RonanFR opened this issue Oct 28, 2024 · 7 comments

RonanFR commented Oct 28, 2024

System Info

  • transformers version: 4.45.2
  • datasets version: 3.0.1
  • Platform: Linux-5.15.0-1070-aws-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Huggingface_hub version: 0.26.1
  • Safetensors version: 0.4.5
  • Accelerate version: 1.0.1
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.5.0+cu118 (True)
  • Tensorflow version (GPU?): 2.14.1 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: no
  • Using GPU in script?: yes
  • GPU type: NVIDIA A10G

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

import torch
from transformers import pipeline
from transformers import TrainingArguments, Trainer
from datasets import load_dataset

# Load classification pipeline from pretrained model
pipe = pipeline(
    "text-classification",
    model="Qwen/Qwen2.5-0.5B" ,
    model_kwargs={
        "num_labels": 5,
    },
    device_map="cuda"
)
print({p.data.dtype for p in pipe.model.parameters()})

# Load + format dataset
dataset = load_dataset("yelp_review_full")["train"].select(range(100))
def tokenize_function(examples):
    return pipe.tokenizer(
        examples["text"], 
        max_length=124, 
        padding="max_length", 
        truncation=True
    )
tokenized_datasets = dataset.map(tokenize_function, batched=True)

# Train 
training_args = TrainingArguments(
    per_device_train_batch_size=8,
    num_train_epochs=1,
    torch_compile=True, 
    bf16=True,  # use bfloat16 mixed precision training
    output_dir="/tmp/tests/test_1",
)
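
(Note: the snippet above stops at the TrainingArguments; presumably a Trainer was then built and trainer.train() called, which is what triggers the error. A hypothetical completion, mirroring the Trainer setup shown later in this thread:)

# Hypothetical completion of the reproduction above; the original snippet ends
# at TrainingArguments, so this mirrors the Trainer setup from the later comment.
trainer = Trainer(
    model=pipe.model,
    train_dataset=tokenized_datasets,
    args=training_args,
    tokenizer=pipe.tokenizer,
)
trainer.train()  # fails with the RuntimeError described below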

Expected behavior

  • The attached code raises RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != c10::BFloat16. When torch compilation is disabled or float32 is used (or both), everything works fine.

  • The problem does not seem to occur when PyTorch is downgraded to version 2.4.1. I am not fully sure though, because in that case another error occurs: RuntimeError: invalid dtype for bias when use compile + autocast (pytorch/pytorch#124901). At the end of that issue they mention the problem is fixed in PyTorch 2.5.0, but then the error above appears, so I am stuck in a circular loop 😅

  • The same problem seems to occur with float16 instead of bfloat16 (but not for tensorfloat32 apparently).

  • The same code works perfectly well with "facebook/bart-large" instead of "Qwen/Qwen2.5-0.5B". But other models like "TinyLlama/TinyLlama_v1.1" suffer from the same issue as "Qwen/Qwen2.5-0.5B".
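
For context, the error itself is the usual dtype consistency check in torch.nn.functional.linear. A minimal, purely illustrative example of what the message means (this is not the actual failing call inside the compiled model):

import torch
import torch.nn.functional as F

# Illustrative only: a bf16 activation (as produced under autocast) meeting an
# fp32 weight triggers the same kind of dtype-mismatch error.
x = torch.randn(2, 4, dtype=torch.bfloat16)
w = torch.randn(3, 4, dtype=torch.float32)
F.linear(x, w)  # raises a dtype-mismatch RuntimeError (exact message varies by device/op)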

@RonanFR RonanFR added the bug label Oct 28, 2024

Rocketknight1 commented Oct 29, 2024

Hi @RonanFR, in general pipelines are inference-only, so loading the model with a pipeline and then training it is a bit odd! Can you see if you still get the issue when you initialize the model with AutoModelForSequenceClassification and AutoTokenizer instead? If you can give us some clean code without pipeline that reproduces the issue, we can investigate further.


RonanFR commented Oct 29, 2024

Thanks for your reply @Rocketknight1 !
Indeed, I tested your suggestion and it works perfectly fine when training the last layer only (see message below).

@RonanFR RonanFR closed this as completed Oct 29, 2024
@RonanFR RonanFR reopened this Oct 29, 2024

RonanFR commented Oct 29, 2024

@Rocketknight1 actually I was a bit hasty with my last message. There is no issue when only the last score.weight layer is trainable (i.e., requires_grad set to True), but if other layers are trained as well, the same RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != c10::BFloat16 occurs.

Minimum reproducible example:

from transformers import AutoModelForSequenceClassification, AutoTokenizer
from transformers import TrainingArguments
from transformers import Trainer
from datasets import load_dataset

# Load classification model from pretrained checkpoint
model = AutoModelForSequenceClassification.from_pretrained(
    "TinyLlama/TinyLlama_v1.1",
    num_labels=5,
    device_map="cuda"
)
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama_v1.1")
tokenizer.add_special_tokens({"pad_token": tokenizer.eos_token})
model.resize_token_embeddings(len(tokenizer))
model.config.pad_token_id = tokenizer.pad_token_id

for n, p in model.named_parameters():
    if ("score" not in n) and ("q_proj" not in n):
        p.requires_grad = False

# Load + format dataset
dataset = load_dataset("yelp_review_full")["train"].select(range(100))
def tokenize_function(examples):
    return tokenizer(
        examples["text"],
        max_length=20,
        padding="max_length",
        truncation=True
    )
tokenized_datasets = dataset.map(tokenize_function, batched=True)

# Train 
training_args = TrainingArguments(
    per_device_train_batch_size=2**4,
    num_train_epochs=1,
    torch_compile=True, 
    bf16=True,
    logging_strategy="steps",
    logging_steps=1,
    output_dir="/tmp/test1",
    use_cpu=False
)
trainer = Trainer(
    model=model,
    train_dataset=tokenized_datasets,
    eval_dataset=tokenized_datasets,
    args=training_args,
    tokenizer=tokenizer,
)
trainer.train()

In the code above only the final score layer and the q_proj layers are trainable, but the same problem occurs when selecting the v_proj layers instead, for instance. The code only runs without errors when the score layer alone is trainable.

I also tried with PEFT (instead of manually setting requires_grad to True on entire layers and False on others), and the same problem occurs.
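
For reference, a minimal sketch of the kind of PEFT setup meant here (the exact configuration is not stated in this issue, so the values below are assumptions):

from peft import LoraConfig, get_peft_model

# Assumed LoRA configuration targeting q_proj, analogous to the manual
# requires_grad selection above; not necessarily the exact config used.
lora_config = LoraConfig(
    task_type="SEQ_CLS",
    r=8,
    lora_alpha=16,
    target_modules=["q_proj"],
)
model = get_peft_model(model, lora_config)
# Building the Trainer as above then reportedly hits the same RuntimeError.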

Rocketknight1 (Member) commented:

Yes, I can reproduce the issue, but only by going back to 4.45. It's unfortunately a little awkward on the latest version - there's another issue affecting Llama model training, so I can't fully reproduce the problem on main: #34442

rfruit17 commented:

Any news on this bug? I have just tried running the code with the latest transformers version (4.46.3) and the problem still seems to be there.


RonanFR commented Nov 22, 2024

Actually the problem is solved when also upgrading pytorch to version 2.5.1
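
For anyone hitting this, an illustrative guard (not part of the original report) is to only enable torch_compile together with bf16 when the installed torch is at least 2.5.1:

import torch
from packaging import version
from transformers import TrainingArguments

# Illustrative: in this thread the bf16 + torch_compile combination was reported
# to work again starting with PyTorch 2.5.1.
torch_compile_ok = version.parse(torch.__version__) >= version.parse("2.5.1")

training_args = TrainingArguments(
    output_dir="/tmp/test1",
    bf16=True,
    torch_compile=torch_compile_ok,
)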

@RonanFR RonanFR closed this as completed Nov 22, 2024
mobilejammer commented:

Actually the problem is solved when also upgrading pytorch to version 2.5.1
DeepSpeed seems to have the same problem as well. I am running the sd-scripts project from GitHub, and the same dtype mismatch occurs there.

$ python
Python 3.10.12 (main, Nov 6 2024, 20:22:13) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'2.5.1+cu124'

My run command is: accelerate launch --mixed_precision bf16 --num_cpu_threads_per_process 1 --main_process_port 8080 flux_train.py --deepspeed

error:
[rank7]: File "/home/ubuntu/SimpleTuner/.venv/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 125, in forward
[rank7]: return F.linear(input, self.weight, self.bias)
[rank7]: RuntimeError: mat1 and mat2 must have the same dtype, but got Float and BFloat16
When DeepSpeed is not enabled, it runs fine.
