[Urgent] After reinstalling unsloth, Llama 3.2/3.1 fine-tuning gets an error with a customized compute_metrics function #1327
Hey, I'm facing the same error. I've defined compute_metrics like the sketch below, and preds are empty: logits is passed in as an empty tuple.
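A minimal reconstruction of that kind of definition, assuming the standard transformers `EvalPrediction` input (the original snippet is not preserved in the thread, so this is only illustrative):

```python
def compute_metrics(eval_pred):
    preds = eval_pred.predictions   # under this bug, arrives as an empty tuple ()
    labels = eval_pred.label_ids
    print("predictions:", preds)    # prints: predictions: ()
    print("labels:", labels)
    return {}
```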
Hi, thanks for your reply, but your suggestions don't work in training. I have changed pred to eval_pred, and besides, those two arguments are not accepted by the current version of Trainer. FYI, I'm training Llama 3.1 8B with SFTTrainer. You can refer to the Colab link I shared for more details; it is just a public Unsloth notebook. I have defined it as in the code I posted below: predictions are an empty tuple.
Hi there, I have the same issue! I also tested preprocess_logits_for_metrics and it also receives an empty tuple, and the change suggestions above still do not work in my case. This never happened before! Have you solved this issue?
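For reference, `preprocess_logits_for_metrics` is normally wired up like this (a minimal sketch assuming a standard transformers Trainer setup; under the bug described here, the `logits` argument itself arrives as an empty tuple, so even this hook sees nothing):

```python
import torch

def preprocess_logits_for_metrics(logits, labels):
    # Called per eval step; reduces the (batch, seq, vocab) logits to predicted
    # token ids so the eval loop doesn't accumulate the full logit tensors.
    if isinstance(logits, tuple):  # some models return (logits, ...)
        logits = logits[0]
    return logits.argmax(dim=-1)

# Passed to the trainer via the keyword argument of the same name:
#   SFTTrainer(..., preprocess_logits_for_metrics=preprocess_logits_for_metrics)
```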
Apologies for the horrid delay - I recently added Apple's reduced-memory cross entropy, so it stopped calculating the logits (hence the issue). I'm planning to move this behind a flag, so hopefully that will restore the old behavior.
Thanks Daniel for sharing this detail. Yes, our fine-tuning is somewhat stuck due to this bug in the evaluation step. I'm happy to know it will be fixed soon. Much appreciated!
@yuan-xia @ineffablekenobi Ok added an optional flag now! During model loading, pass `return_logits = True`:

```python
model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/Llama-3.2-11B-Vision-Instruct",
    load_in_4bit = True, # Use 4bit to reduce memory use. False for 16bit LoRA.
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for long context
    return_logits = True, # <<<<
)
```

Another option is to add an environment variable before invoking `trainer.train()`:

```python
import os
os.environ['UNSLOTH_RETURN_LOGITS'] = '1'
...
trainer.train()
...
```

Also please update Unsloth (or rerun Colab / Kaggle).
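Note that the flag stays off by default: skipping logit materialization is where the memory savings of the reduced-memory cross entropy come from, so `return_logits = True` (or `UNSLOTH_RETURN_LOGITS=1`) is only worth setting when an evaluation step actually needs the logits.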
Hi, I think I found a bug in Unsloth. For clarity, I shared the code of Unsloth's Llama 3.1 training notebook, with just one small change: I added a compute_metrics to test. Can anyone help me check why the trainer is not working? The pred in compute_metrics surprisingly gets nothing (it worked before):
https://drive.google.com/file/d/1UPMxPUifjLKgYOpIfLDvER1LHC4hop63/view?usp=sharing
```python
def compute_metrics(pred):
    predictions, labels = pred
    print(predictions)
    print(labels)
    labels = pred.label_ids
    preds = pred.predictions  # .argmax(-1)
    print("predictions: ", str(preds))

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    eval_dataset = dataset.take(100),
    compute_metrics = compute_metrics,
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        # num_train_epochs = 1, # Set this for 1 full training run.
        max_steps = 60,
        per_device_eval_batch_size = 2,
        eval_accumulation_steps = 1,
        eval_steps = 1,
        eval_strategy = "steps",
        save_strategy = "steps",
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none", # Use this for WandB etc
    ),
)
trainer_stats = trainer.train()
```
Error:

```
()
[[128000 39314 374 ... -100 -100 -100]
 [128000 39314 374 ... -100 -100 -100]
 [128000 39314 374 ... -100 -100 -100]
 ...
 [128000 39314 374 ... -100 -100 -100]
 [128000 39314 374 ... -100 -100 -100]
 [128000 39314 374 ... -100 -100 -100]]
predictions:  ()

TypeError                                 Traceback (most recent call last)
in <cell line: 1>()
----> 1 trainer_stats = trainer.train()

5 frames
/usr/local/lib/python3.10/dist-packages/transformers/trainer.py in evaluation_loop(self, dataloader, description, prediction_loss_only, ignore_keys, metric_key_prefix)
   4274             metrics[f"{metric_key_prefix}_loss"] = np.concatenate(all_losses).mean().item()
   4275         elif isinstance(all_losses, np.ndarray):
-> 4276             metrics[f"{metric_key_prefix}_loss"] = all_losses.mean().item()
   4277         if hasattr(self, "jit_compilation_time"):
   4278             metrics[f"{metric_key_prefix}_jit_compilation_time"] = self.jit_compilation_time

TypeError: 'NoneType' object does not support item assignment
```
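Two things appear to be compounding here. The empty `()` is the missing-logits regression discussed in the comments above. The TypeError itself, though, is consistent with compute_metrics returning None: in transformers' evaluation_loop, the eval loss is assigned into whatever compute_metrics returned, so the function must return a dict. A minimal sketch of a compute_metrics that avoids the crash (the token-accuracy metric here is an illustrative assumption, not the original author's metric):

```python
import numpy as np

def compute_metrics(eval_pred):
    preds, labels = eval_pred.predictions, eval_pred.label_ids
    if isinstance(preds, tuple):
        if len(preds) == 0:
            # Logits were never materialized (return_logits / UNSLOTH_RETURN_LOGITS unset)
            return {}
        preds = preds[0]
    preds = np.argmax(preds, axis=-1)
    # Causal-LM alignment: the logit at position i predicts the label at i + 1
    preds, labels = preds[:, :-1], labels[:, 1:]
    mask = labels != -100  # ignore prompt / padding positions
    return {"token_accuracy": float((preds[mask] == labels[mask]).mean())}
```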