Model loss function becomes NaN when using multi-GPU #88
bmt621 started this conversation in General discussions
Replies: 2 comments · 1 reply
Hello guys,
I've been trying to debug this code for a week now, but to no avail. The loss becomes NaN when I train my model on multiple GPUs, although it works fine on a single GPU, and I can't tell where the problem is. When I peeked into the model's tensor outputs, I found that at some iterations the dataset on the second GPU outputs all NaN values, while the tensors on the first GPU are all normal floating-point numbers.
If there is any expert who can help, I'd really appreciate it.
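A minimal sketch of one way to narrow down where the NaNs first appear, assuming PyTorch with an nn.DataParallel or DistributedDataParallel setup (the helper name and print-based reporting here are just illustrative):

```python
import torch
import torch.nn as nn

def attach_nan_hooks(model: nn.Module) -> None:
    """Report the first module and device whose forward output contains NaN."""
    def make_hook(name):
        def hook(module, inputs, output):
            outs = output if isinstance(output, (tuple, list)) else (output,)
            for t in outs:
                if torch.is_tensor(t) and torch.isnan(t).any():
                    print(f"NaN in module '{name}' on device {t.device}")
        return hook

    for name, module in model.named_modules():
        module.register_forward_hook(make_hook(name))

# Illustrative usage before the training loop:
# attach_nan_hooks(model)
# torch.autograd.set_detect_anomaly(True)  # also traps NaN/Inf created in backward
```

If the first hook that fires is an early layer on the second GPU, that points at the batch slice sent to that device rather than at the model weights.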
I am working with text-to-speech synthesis datasets: the input is the tokenized text, and the target is a mel-spectrogram with 80 mel bins, an n_fft of 1024, and a hop length of 256.
I want to run an experiment on the LibriSpeech dataset; if anyone also wants to contribute to the repo I'm working on, feel free to let me know.
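For reference, a small sketch of that mel front end with torchaudio; the 16 kHz sample rate is an assumption (LibriSpeech audio), the other values are the ones above:

```python
import torch
import torchaudio

# Mel front end matching the settings above; sample_rate=16000 is an assumption.
mel_transform = torchaudio.transforms.MelSpectrogram(
    sample_rate=16000,
    n_fft=1024,
    hop_length=256,
    n_mels=80,
)

waveform = torch.randn(1, 16000)            # one second of dummy audio
mel = mel_transform(waveform)               # shape: (1, 80, n_frames)
log_mel = torch.log(mel.clamp(min=1e-5))    # clamping avoids log(0) -> -inf
print(mel.shape, torch.isfinite(log_mel).all())
```

One common source of exactly this symptom in TTS pipelines is an unclamped log compression: a silent or zero-padded frame gives -inf, which can then turn the loss into NaN on whichever replica receives those frames.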
Thanks.
This is my sample code, and if you need more access to the code, I will be happy to share that as well.
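Since the all-NaN values only show up on the second GPU, one more thing worth ruling out is a handful of corrupt samples that happen to land in that replica's share of the batch. A quick sketch of a one-off scan over the dataset (assuming each item is a (tokens, mel) pair, which is a guess about the dataset interface here):

```python
import torch
from torch.utils.data import Dataset

def find_bad_samples(dataset: Dataset) -> list[int]:
    """Return indices of samples whose mel target contains NaN or Inf."""
    bad = []
    for idx in range(len(dataset)):
        _tokens, mel = dataset[idx]    # assumed (tokens, mel) item layout
        if not torch.isfinite(mel).all():
            bad.append(idx)
    return bad

# Illustrative usage:
# print(find_bad_samples(train_dataset))
```

If this comes back empty, the data itself is clean and the problem is more likely in how the replicas split the batch or in the loss computation.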
-
This is the terminal output of the model:
-
Hm, that's weird. Maybe try