
High Initial Loss When Fine-Tuning Gemma Model #38

Open
zhangzc21 opened this issue Jan 24, 2025 · 1 comment

Comments

@zhangzc21

Dear authors,

Thanks for your great work!

I am currently trying to fine-tune the Gemma-7B model using PiSSA, but I am encountering an issue where the initial loss and grad norm are extremely high.

This doesn't seem to be caused by the PiSSA algorithm itself, since fine-tuning Gemma-7B with LoRA shows a similar problem.

Have you encountered this issue, or do you have any ideas on how to solve it? Thanks a lot!
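
For reference, a minimal sketch of the kind of setup being described (not the exact script used here), assuming the PiSSA initialization shipped in Hugging Face peft; the model name, rank, alpha, and target modules are illustrative assumptions:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load the base model (assumed checkpoint; any Gemma-7B variant showing
# the issue would do).
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-7b",
    torch_dtype=torch.bfloat16,
)

config = LoraConfig(
    r=16,
    lora_alpha=16,  # scaling = alpha / r = 1, so the adapter is not rescaled
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    # "pissa" builds A and B from the top-r singular vectors of each target
    # weight and stores the residual in the base layer; "pissa_niter_4" would
    # use a faster randomized SVD instead.
    init_lora_weights="pissa",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
```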

@Peng-YM

Peng-YM commented Feb 11, 2025

+1, I have also encountered the same issue when fine-tuning the Qwen-2.5-7B model. The initial loss is approximately an order of magnitude higher than that of the standard LoRA method. For instance, while the standard LoRA achieves an initial loss of around 0.6, PiSSA exhibits an initial loss of approximately 6 or higher.
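
One way to narrow this down (a sketch under assumptions, not the authors' procedure): compare the loss right after adapter initialization with the loss of the untouched base model on the same batch. Since PiSSA stores the residual W − BA in the base layer and adds BA back through the adapter, the wrapped model should in principle reproduce the base model exactly at step 0, so a large gap already at init would point at the initialization (e.g. an SVD computed in low precision) rather than at training. The model name and probe text below are assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "Qwen/Qwen2.5-7B"  # assumption: any causal LM showing the issue
tok = AutoTokenizer.from_pretrained(model_name)
batch = tok("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
batch["labels"] = batch["input_ids"].clone()

def initial_loss(init_lora_weights=None):
    """Loss on one batch for the base model (None) or a freshly wrapped adapter."""
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype=torch.bfloat16
    )
    if init_lora_weights is not None:
        cfg = LoraConfig(
            r=16,
            lora_alpha=16,
            target_modules=["q_proj", "v_proj"],
            init_lora_weights=init_lora_weights,
        )
        model = get_peft_model(model, cfg)
    model.eval()
    with torch.no_grad():
        loss = model(**batch).loss.item()
    del model
    return loss

print("base :", initial_loss())         # reference
print("lora :", initial_loss(True))     # default LoRA init (B = 0, exact match)
print("pissa:", initial_loss("pissa"))  # SVD-based PiSSA init
```

If the PiSSA row already disagrees with the base row before any training step, the problem is in the initialization path rather than in the optimizer or data.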
