As shown in the figure, training runs normally with a batch size of 8, but raises an error once the batch size is increased to 16 or above. I am using a Tesla V100 with 32 GB of memory, so in theory even a batch size of 80 should fit.
After debugging I located the error: with the larger batch size, the data becomes NaN. I traced it to the 3D convolution in ShuffleNetV2; the error appears right after the convolution is computed.
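One way to localize where NaNs first appear is to scan each layer's output as it is produced. The sketch below is framework-agnostic and uses plain Python lists; in actual PyTorch code one would instead check `torch.isnan(out).any()` inside a `register_forward_hook` on the suspect conv layer (the function name and workflow here are illustrative, not from the repo).

```python
import math

def first_nan_index(values):
    """Return the index of the first NaN in a flat sequence, or -1 if none.

    Hypothetical helper: stands in for checking a layer's flattened
    activations right after the 3D convolution runs.
    """
    for i, v in enumerate(values):
        if isinstance(v, float) and math.isnan(v):
            return i
    return -1

# Simulated post-conv activations with a NaN injected at position 2.
activations = [0.5, 1.2, float("nan"), 0.3]
print(first_nan_index(activations))  # → 2
```

Running such a check after each block narrows the failure down to the first layer whose output contains a NaN, which is usually one step past the real numerical problem (e.g. an overflow or a division by zero upstream).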
I now find that training also fails with a batch size of 8. Digging further, some input samples contain a very large number of zeros. Could there be a problem with my data loading?
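A quick sanity check on loaded batches can confirm whether the data pipeline is emitting near-empty samples before they ever reach the model. This is a minimal sketch with plain Python lists; the `zero_fraction` helper and the 0.5 threshold are assumptions for illustration, not values from the project.

```python
def zero_fraction(sample, tol=1e-12):
    """Fraction of entries in a flat sample that are (near) zero."""
    if not sample:
        return 0.0
    zeros = sum(1 for v in sample if abs(v) <= tol)
    return zeros / len(sample)

# Flag samples that are suspiciously empty before training on them.
batch = [[0.0, 0.0, 0.0, 1.0], [0.2, 0.4, 0.1, 0.9]]
flagged = [i for i, s in enumerate(batch) if zero_fraction(s) > 0.5]
print(flagged)  # → [0]
```

If many samples are flagged, the issue is likely in preprocessing (e.g. frames decoded as black, wrong file paths, or padding applied too aggressively) rather than in the network itself.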
Hello, may I ask why the loss becomes abnormally large when training with DDP, while single-GPU training does not show this behavior? Thanks for any help.
I'm not sure either; in that case just train on a single GPU. The training time isn't that long anyway.
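The thread leaves this unresolved, but one common cause of an inflated DDP loss is logging the sum of per-rank losses instead of their mean, which scales the reported value by the number of GPUs. The sketch below only illustrates that bookkeeping effect with plain numbers; it is not taken from this repository.

```python
def aggregate_loss(per_rank_losses, reduce="mean"):
    """Combine per-process losses as a DDP logger might.

    With DDP, each process computes a loss on its own data shard.
    Summing across ranks inflates the logged value by world_size
    relative to single-GPU training; averaging keeps it comparable.
    """
    total = sum(per_rank_losses)
    if reduce == "mean":
        return total / len(per_rank_losses)
    return total

losses = [0.9, 1.1, 1.0, 1.0]  # hypothetical per-rank losses on 4 GPUs
print(aggregate_loss(losses, "sum"))   # → 4.0 (looks 4x too large)
print(aggregate_loss(losses, "mean"))  # → 1.0 (comparable to single-GPU)
```

If the loss is genuinely diverging rather than just mis-logged, the usual suspects are a learning rate that was not rescaled for the larger effective batch, or unsynchronized batch-norm statistics across processes.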