
[Urgent] After reinstalling unsloth, Llama 3.2/3.1 fine tuning gets error with customized compute_metrics function #1327

Open
yuan-xia opened this issue Nov 22, 2024 · 7 comments
Labels
fixed - pending confirmation · URGENT BUG

Comments

@yuan-xia

yuan-xia commented Nov 22, 2024

Hi, I think I've found a bug in unsloth. For clarity, I'm sharing the code from unsloth's public Llama 3.1 training notebook with only one small change. Can anyone help me check why the trainer is not working? I just added a compute_metrics function to test, and "pred" in compute_metrics surprisingly receives nothing. (It worked before.)

https://drive.google.com/file/d/1UPMxPUifjLKgYOpIfLDvER1LHC4hop63/view?usp=sharing

    def compute_metrics(pred):
        predictions, labels = pred
        print(predictions)
        print(labels)
        labels = pred.label_ids
        preds = pred.predictions #.argmax(-1)
        print("predictions: ", str(preds))

    trainer = SFTTrainer(
        model = model,
        tokenizer = tokenizer,
        train_dataset = dataset,
        dataset_text_field = "text",
        eval_dataset = dataset.take(100),
        compute_metrics = compute_metrics,
        max_seq_length = max_seq_length,
        dataset_num_proc = 2,
        packing = False, # Can make training 5x faster for short sequences.
        args = TrainingArguments(
            per_device_train_batch_size = 2,
            gradient_accumulation_steps = 4,
            warmup_steps = 5,
            # num_train_epochs = 1, # Set this for 1 full training run.
            max_steps = 60,
            per_device_eval_batch_size = 2,
            eval_accumulation_steps = 1,
            eval_steps = 1,
            eval_strategy = "steps",
            save_strategy = "steps",
            learning_rate = 2e-4,
            fp16 = not is_bfloat16_supported(),
            bf16 = is_bfloat16_supported(),
            logging_steps = 1,
            optim = "adamw_8bit",
            weight_decay = 0.01,
            lr_scheduler_type = "linear",
            seed = 3407,
            output_dir = "outputs",
            report_to = "none", # Use this for WandB etc
        ),
    )

    trainer_stats = trainer.train()

error:
    ()
    [[128000 39314 374 ... -100 -100 -100]
     [128000 39314 374 ... -100 -100 -100]
     [128000 39314 374 ... -100 -100 -100]
     ...
     [128000 39314 374 ... -100 -100 -100]
     [128000 39314 374 ... -100 -100 -100]
     [128000 39314 374 ... -100 -100 -100]]
    predictions:  ()

    TypeError                                 Traceback (most recent call last)
    in <cell line: 1>()
    ----> 1 trainer_stats = trainer.train()

    5 frames
    /usr/local/lib/python3.10/dist-packages/transformers/trainer.py in evaluation_loop(self, dataloader, description, prediction_loss_only, ignore_keys, metric_key_prefix)
       4274 metrics[f"{metric_key_prefix}_loss"] = np.concatenate(all_losses).mean().item()
       4275 elif isinstance(all_losses, np.ndarray):
    -> 4276 metrics[f"{metric_key_prefix}_loss"] = all_losses.mean().item()
       4277 if hasattr(self, "jit_compilation_time"):
       4278 metrics[f"{metric_key_prefix}_jit_compilation_time"] = self.jit_compilation_time

    TypeError: 'NoneType' object does not support item assignment

@yuan-xia yuan-xia changed the title ! Llama 3.2/3.1 fine tuning error with customized compute_metrics function ! After reinstalling unsloth, Llama 3.2/3.1 fine tuning gets error with customized compute_metrics function Nov 22, 2024
@yuan-xia yuan-xia changed the title ! After reinstalling unsloth, Llama 3.2/3.1 fine tuning gets error with customized compute_metrics function [Urgent] After reinstalling unsloth, Llama 3.2/3.1 fine tuning gets error with customized compute_metrics function Nov 23, 2024
@Cirr0e

This comment was marked as spam.

@ineffablekenobi

Hey, I'm facing the same error. I've defined compute_metrics like this:

def compute_metrics(pred):
    labels_ids = pred.label_ids
    pred_ids = pred.predictions[0]

pred_ids comes back empty.
I've observed similar behavior for preprocess_logits_for_metrics:

def preprocess_logits_for_metrics(logits, labels):
    print(logits)
    pred_ids = np.argmax(logits, axis=-1)
    return pred_ids, labels

logits is passed as an empty tuple.

@yuan-xia
Author

yuan-xia commented Nov 24, 2024

Hi there! I've analyzed the issue with your compute_metrics function, and I can help you resolve the error you're encountering.

The main problem is in how the compute_metrics function is accessing the predictions. Let me show you the correct way to implement this:

Hi, thanks for your reply, but your suggestions do not work in training. I have changed pred to eval_pred, and besides, those two arguments are not accepted in the current version of Trainer. FYI, I'm training Llama 3.1 8B with SFTTrainer. You can refer to the Colab link I shared for more details; it is just an unsloth public notebook.

I have defined it as follows:

    def compute_metrics(eval_pred):
        predictions = eval_pred.predictions
        labels = eval_pred.label_ids
        print("predictions: ", str(predictions))

predictions is an empty tuple.

@yuan-xia
Author

Hey, I'm facing the same error. I've defined compute_metrics like this:

def compute_metrics(pred):
    labels_ids = pred.label_ids
    pred_ids = pred.predictions[0]

pred_ids comes back empty. I've observed similar behavior for preprocess_logits_for_metrics:

def preprocess_logits_for_metrics(logits, labels):
    print(logits)
    pred_ids = np.argmax(logits, axis=-1)
    return pred_ids, labels

logits is passed as an empty tuple.

Hi there, I have the same issue as you! I also tested preprocess_logits_for_metrics, and it also receives an empty tuple. The change suggestions above still do not work in my case. This never happened before! Have you solved this issue?

@danielhanchen danielhanchen added the currently fixing and URGENT BUG labels Nov 25, 2024
@danielhanchen
Contributor

Apologies for the horrid delay - I recently added Apple's reduced-memory cross entropy, which stopped the logits from being calculated (hence the issue). I'm planning to move this into a flag, so hopefully that will restore the previous behavior.

@yuan-xia
Author

Apologies for the horrid delay - I recently added Apple's reduced-memory cross entropy, which stopped the logits from being calculated (hence the issue). I'm planning to move this into a flag, so hopefully that will restore the previous behavior.

Thanks Daniel for sharing this detail. Yes, our fine-tuning is kind of stuck due to this bug in the evaluation step. I'm happy to know it will be fixed soon. Much appreciated!

@danielhanchen danielhanchen added the fixed - pending confirmation label and removed the currently fixing label Nov 26, 2024
@danielhanchen
Contributor

danielhanchen commented Nov 26, 2024

@yuan-xia @ineffablekenobi OK, I've added an optional flag now! In FastLanguageModel.from_pretrained(...), add an extra flag called return_logits = True, i.e.

model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/Llama-3.2-11B-Vision-Instruct",
    load_in_4bit = True, # Use 4bit to reduce memory use. False for 16bit LoRA.
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for long context
    return_logits = True, # <<<<
)
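
For the text-only Llama 3.1/3.2 fine-tuning this issue is about, the equivalent call would presumably look like the following. This is a minimal sketch: the model name and the other arguments are assumptions taken from the public notebook; only return_logits = True is the new flag described above.

    model, tokenizer = FastLanguageModel.from_pretrained(
        "unsloth/Meta-Llama-3.1-8B", # assumed model name from the notebook
        max_seq_length = max_seq_length,
        load_in_4bit = True,
        return_logits = True, # <<<< re-enable logits for compute_metrics
    )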

Another option is to add an environment variable before invoking trainer.train(), i.e.

import os
os.environ['UNSLOTH_RETURN_LOGITS'] = '1'
... trainer.train() ...

Also, please update Unsloth (or rerun Colab / Kaggle) via:

pip uninstall unsloth unsloth-zoo -y
pip install --upgrade --no-cache-dir --no-deps unsloth unsloth-zoo
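
Once logits are returned again, a minimal sketch of the two metric hooks discussed in this thread could look like the code below. Token-level accuracy is only an illustration, not part of the original notebook; the argmax is taken inside preprocess_logits_for_metrics, as in the snippets above, so the eval loop does not have to accumulate full-vocabulary logits.

    import numpy as np

    def preprocess_logits_for_metrics(logits, labels):
        # Reduce full-vocab logits to predicted token ids to save memory.
        if isinstance(logits, tuple):
            logits = logits[0]
        return logits.argmax(dim = -1)

    def compute_metrics(eval_pred):
        preds  = eval_pred.predictions
        labels = eval_pred.label_ids
        # Causal LM: logits at position i predict the token at position i + 1,
        # so shift by one and ignore the -100 padding labels.
        preds, labels = preds[:, :-1], labels[:, 1:]
        mask = labels != -100
        accuracy = (preds[mask] == labels[mask]).mean()
        return {"token_accuracy": float(accuracy)}

Both functions would then be passed to SFTTrainer via the compute_metrics and preprocess_logits_for_metrics arguments, alongside the settings in the original snippet.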
