
High Initial Loss When Fine-Tuning Gemma Model #38

Open
zhangzc21 opened this issue Jan 24, 2025 · 1 comment

Comments

@zhangzc21

Dear authors,

Thanks for your great work!

I am currently trying to fine-tune the Gemma-7B model using PiSSA, but I am encountering an issue where the initial loss and grad norm are extremely high.

This doesn't seem to be caused by the PiSSA algorithm itself, since fine-tuning Gemma-7B with LoRA shows a similar problem.

Have you encountered this issue, or do you have any ideas on how to solve it? Thanks a lot!
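
For reference, a minimal sketch of the kind of setup being described (not the exact script used here), assuming the PiSSA initialization shipped in Hugging Face peft; the model name, rank, alpha, and target modules are illustrative assumptions:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load the base model (assumed checkpoint; any Gemma-7B variant showing
# the issue would do).
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-7b",
    torch_dtype=torch.bfloat16,
)

config = LoraConfig(
    r=16,
    lora_alpha=16,  # scaling = alpha / r = 1, so the adapter is not rescaled
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    # "pissa" builds A and B from the top-r singular vectors of each target
    # weight and stores the residual in the base layer; "pissa_niter_4" would
    # use a faster randomized SVD instead.
    init_lora_weights="pissa",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
```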

@Peng-YM

Peng-YM commented Feb 11, 2025

+1, I have also encountered the same issue when fine-tuning the Qwen-2.5-7B model. The initial loss is approximately an order of magnitude higher than that of the standard LoRA method. For instance, while the standard LoRA achieves an initial loss of around 0.6, PiSSA exhibits an initial loss of approximately 6 or higher.
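
One way to narrow this down (a sketch under assumptions, not the authors' procedure): compare the loss right after adapter initialization with the loss of the untouched base model on the same batch. Since PiSSA stores the residual W − BA in the base layer and adds BA back through the adapter, the wrapped model should in principle reproduce the base model exactly at step 0, so a large gap already at init would point at the initialization (e.g. an SVD computed in low precision) rather than at training. The model name and probe text below are assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "Qwen/Qwen2.5-7B"  # assumption: any causal LM showing the issue
tok = AutoTokenizer.from_pretrained(model_name)
batch = tok("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
batch["labels"] = batch["input_ids"].clone()

def initial_loss(init_lora_weights=None):
    """Loss on one batch for the base model (None) or a freshly wrapped adapter."""
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype=torch.bfloat16
    )
    if init_lora_weights is not None:
        cfg = LoraConfig(
            r=16,
            lora_alpha=16,
            target_modules=["q_proj", "v_proj"],
            init_lora_weights=init_lora_weights,
        )
        model = get_peft_model(model, cfg)
    model.eval()
    with torch.no_grad():
        loss = model(**batch).loss.item()
    del model
    return loss

print("base :", initial_loss())         # reference
print("lora :", initial_loss(True))     # default LoRA init (B = 0, exact match)
print("pissa:", initial_loss("pissa"))  # SVD-based PiSSA init
```

If the PiSSA row already disagrees with the base row before any training step, the problem is in the initialization path rather than in the optimizer or data.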
