Possible memory leak after evaluation when using enable_liger_kernel
#6085
Labels
pending
This problem is yet to be addressed
Reminder
System Info
llamafactory==0.7.2.dev0
transformers==4.46.1
python==3.10.14
Reproduction
llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml # or gemma
Expected behavior
Thank you for sharing such an amazing project. @hiyouga
When I enabled
enable_liger_kernel: true
for training, the Gemma2 model's training memory usage dropped from around 60 GiB to 7 GiB. However, after evaluation runs, memory usage jumps back to 60 GiB, and even when training resumes it does not return to the previous level, staying at 60 GiB instead. It looks like there is a memory leak somewhere.
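For reference, the flag is a single line added to the training YAML. The sketch below is illustrative: the model path is hypothetical and the remaining keys follow the stock `examples/train_lora/llama3_lora_sft.yaml` layout; only `enable_liger_kernel` is the relevant change.

```yaml
### model
model_name_or_path: google/gemma-2-9b  # hypothetical checkpoint; any Gemma2 model reproduces this
enable_liger_kernel: true              # the flag in question

### method
stage: sft
do_train: true
finetuning_type: lora

### eval — memory jumps to ~60 GiB once this phase runs
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
```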
Others
No response