post train without fp16 #22
Comments
WeitingGG:

Thanks for your work!
I tried to post-train the BERT base model using my own data. I ran into a problem when using fp16 (CUDA error: invalid configuration argument), so I tried to train without fp16. However, with fp16 disabled the batch losses are all NaN. Do you have any idea about this problem? Is it because I didn't use fp16? Thank you!

howardhsu:

The memory consumption is impractical if you do not use fp16; sorry, the code is not well tested on fp32. Make sure the GPU is Volta, Ampere, or an RTX card.
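(Editor's note: apex/fp16 mixed precision is only practical on cards with Tensor Cores, which arrived with Volta. A minimal sketch, assuming a single-GPU PyTorch setup, to check what your card supports:)

```python
import torch

if not torch.cuda.is_available():
    print("No CUDA GPU visible; fp16 training is not an option here.")
else:
    # Volta = compute capability 7.0, Turing (RTX 20xx) = 7.5, Ampere = 8.x;
    # apex/AMP fp16 mixed precision is only practical from Volta onward.
    major, minor = torch.cuda.get_device_capability(0)
    name = torch.cuda.get_device_name(0)
    print(f"{name}: compute capability {major}.{minor}")
    if major >= 7:
        print("Tensor Cores present: fp16 (apex/AMP) should work.")
    else:
        print("Pre-Volta card (e.g., a 1080 Ti is 6.1): expect to fall back to fp32 "
              "with a shorter max sequence length and gradient accumulation.")
```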
WeitingGG:

Thanks for your reply!
I noticed that the instructions mention "It is possible to avoid use GPUs that do not support apex (e.g., 1080 Ti), but need to adjust the max sequence length and number of gradient accumulation but (although the result can be better)."
I simply set fp16 == False to avoid using apex, but as I said, the batch losses are all NaN, so that does not seem to be the correct way to do it.
How should I change the code to correctly avoid GPUs that do not support apex? Thanks!
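(Editor's note: the trade-off that quoted instruction describes, shorter sequences plus gradient accumulation instead of fp16, can be sketched roughly as below. The function and its arguments are illustrative placeholders, not the repo's actual training code.)

```python
import torch

def train_fp32_with_accumulation(model, optimizer, dataloader, accum_steps=4):
    """Plain fp32 training with gradient accumulation.

    accum_steps=4 with a per-step batch of 8 roughly matches an effective batch
    of 32; combine this with a shorter max sequence length to fit in GPU memory.
    Assumes a Hugging Face-style model whose forward pass returns an object
    with a .loss attribute.
    """
    model.train()
    optimizer.zero_grad()
    for step, batch in enumerate(dataloader):
        outputs = model(**batch)
        loss = outputs.loss / accum_steps   # scale so accumulated grads average out
        loss.backward()
        if (step + 1) % accum_steps == 0:
            torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
            optimizer.step()
            optimizer.zero_grad()
```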
howardhsu:

It is recommended to plug the model into a standard trainer such as Hugging Face's Trainer or PyTorch Lightning. fp32 is not well tested here, and some fp16 features may not be fully disabled.
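(Editor's note: as a concrete illustration of that suggestion, MLM post-training through the standard Hugging Face Trainer might look like the sketch below, where fp16 is toggled by a single flag. The model name, file path, and hyperparameters are placeholders and mirror 2021-era transformers APIs, not this repo's scripts.)

```python
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments, LineByLineTextDataset)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# "domain_corpus.txt" is a placeholder for your own post-training text,
# one example per line.
dataset = LineByLineTextDataset(tokenizer=tokenizer,
                                file_path="domain_corpus.txt",
                                block_size=128)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True,
                                           mlm_probability=0.15)

args = TrainingArguments(
    output_dir="pt_bert",
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    fp16=False,   # flip to True on Volta/Ampere/RTX GPUs
)

Trainer(model=model, args=args, data_collator=collator,
        train_dataset=dataset).train()
```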