Dear all,
I ran finetuning, and during validation I encountered this error:
iter 3198: loss nan, time: 123.08ms
Validating ...
.......
lit-llama/generate.py", line 74, in generate
idx_next = torch.multinomial(probs, num_samples=1).to(dtype=dtype)
RuntimeError: probability tensor contains either inf, nan or element < 0
Could you tell me how I can solve this problem?
Thanks in advance.
This may or may not be related, but are you using --precision 16-true? I have noticed that it produces NaNs when training some models. If your GPU supports it, can you try bfloat16 (brain float) precision, i.e. --precision bf16-true?
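For anyone wondering why the precision flag could matter: one common cause of NaNs under float16 is its narrow range, although that is an assumption here and not something confirmed for this particular run. The sketch below uses only the Python standard library to compare the two formats. The helpers to_fp16 and to_bf16 are hypothetical names made up for this illustration, they are not part of lit-llama or PyTorch, and to_bf16 truncates where real hardware would round.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round x to IEEE float16; magnitudes beyond its range become +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack out-of-range finite values, so emulate the
        # overflow-to-infinity behaviour of float16 arithmetic here.
        return math.copysign(math.inf, x)


def to_bf16(x: float) -> float:
    """Approximate bfloat16 by keeping the top 16 bits of the float32 encoding.

    This truncates instead of rounding to nearest, which is enough to
    illustrate the range difference.
    """
    try:
        bits = struct.unpack("<I", struct.pack("<f", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


value = 70000.0  # slightly above the largest finite float16 value (65504)

print(to_fp16(value))  # inf: the value overflows in float16
print(to_bf16(value))  # 69632.0: less precise, but still finite in bfloat16

# Once an inf appears, ordinary arithmetic such as inf - inf yields nan,
# and the nan then spreads to the loss and to the sampling probabilities.
print(to_fp16(value) - to_fp16(value))  # nan
```

bfloat16 keeps the 8-bit exponent of float32 and gives up mantissa bits, so it trades precision for range. If an overflow like this is what turns the loss into nan, it would also explain why torch.multinomial later rejects the probability tensor during validation.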