
question regarding training stability #97

Open
lyccol opened this issue Dec 25, 2023 · 0 comments


lyccol commented Dec 25, 2023

I have a question regarding training stability. I downloaded the complete RedPajama v1 dataset from Hugging Face and followed the settings from the LLaMA 1 paper for the data mixture and model hyperparameters. I trained two model sizes, 1.8B and 7B. Unfortunately, the 7B model's loss began to rise after 300 billion tokens, and the 1.8B model showed a similar increase after 250 billion tokens. How can I address this training instability?

[Screenshots: training-loss curves for the 1.8B and 7B runs]
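
For context, here is a minimal sketch of two mitigations commonly suggested for this kind of spike: global-norm gradient clipping (at 1.0, as in the LLaMA 1 recipe) and skipping batches whose loss is anomalously high relative to a running average. The model, optimizer, data, and the 3x skip threshold below are placeholders for illustration, not the actual training setup from this issue:

```python
import torch
import torch.nn as nn

# Placeholders for the real setup; the actual run in this issue is a
# 1.8B/7B LLaMA-style LM trained on RedPajama v1.
model = nn.Linear(128, 128)
optimizer = torch.optim.AdamW(
    model.parameters(), lr=3e-4, betas=(0.9, 0.95), weight_decay=0.1
)
loss_fn = nn.MSELoss()

ema_loss = None  # running average used to flag anomalous batches
for step in range(1000):
    x = torch.randn(32, 128)  # stand-in for a token batch
    y = torch.randn(32, 128)
    loss = loss_fn(model(x), y)

    # Skip batches whose loss is far above the running average: a crude
    # guard against spikes. The 3x threshold is arbitrary, for illustration.
    if ema_loss is not None and loss.item() > 3.0 * ema_loss:
        continue
    ema_loss = (
        loss.item() if ema_loss is None
        else 0.99 * ema_loss + 0.01 * loss.item()
    )

    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    # Global-norm gradient clipping at 1.0, as in the LLaMA 1 paper.
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    optimizer.step()
```

Beyond batch skipping, restarting from a checkpoint taken before the spike with a lower learning rate, or training in bf16 rather than fp16 if applicable, are other commonly reported fixes.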
