I have a question about training stability. I downloaded the complete RedPajama v1 dataset from Hugging Face and followed the Llama 1 paper's settings for the data mixture and hyperparameters. I trained two model sizes, 1.8B and 7B. Unfortunately, the 7B model's loss began to rise after 300 billion tokens, and the 1.8B model's loss rose similarly after 250 billion tokens. How can I address this training instability?
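As a checklist when comparing against the paper, here is a minimal sketch of the step-level stability settings the Llama 1 paper describes: AdamW with β2 = 0.95, linear warmup followed by cosine decay to 10% of the peak learning rate, and global-norm gradient clipping at 1.0. The toy model, random data, and step counts below are placeholders for illustration, not the actual configuration from this report.

```python
import math
import torch
from torch import nn

# Toy stand-in model and data, purely to make the sketch runnable.
model = nn.Linear(512, 512)

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=3e-4,            # Llama-1-style peak LR at the ~7B scale
    betas=(0.9, 0.95),  # beta2 = 0.95 rather than the 0.999 default
    weight_decay=0.1,
)

warmup_steps = 2000
max_steps = 10_000  # placeholder; real runs are far longer

def lr_lambda(step: int) -> float:
    # Linear warmup, then cosine decay to 10% of the peak LR.
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return 0.1 + 0.45 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

for step in range(max_steps):
    x = torch.randn(8, 512)
    loss = nn.functional.mse_loss(model(x), x)
    loss.backward()
    # Global-norm gradient clipping at 1.0, as in the Llama 1 recipe;
    # this bounds the update size when a bad batch spikes the gradient.
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad(set_to_none=True)
```

The lower β2 is the setting most often missed: with the default 0.999, the second-moment estimate reacts slowly to a gradient spike, which is a commonly cited contributor to late-training loss divergence at this scale.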