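Hello, I printed the shapes of some intermediate variables and there is one thing I don't quite understand.

My understanding, looking only at the tgt part, is this: the 300-dimensional part is the learnable embedding, and the pad part stores the labels with noise added.

In the example in the figure, the batch size is 2 and the two images have 4 and 16 labels respectively, so after the noised-label tensor is repeated scalar times its shape becomes 20 × 5 = 100. But if pad_size is only set from the maximum of known_num, the pad part has size 16 × 5 = 80. The new tgt then has size 380, while there are 100 noised labels, so they would take up 20 slots of the non-denoising part.

Of course, with the training parameters you provide (batch_size=1) this problem does not occur, but batch_size=1 is a bit slow. For batchsize>1, could pad_size be set to sum(known_num) instead? Would that change affect the performance of the model as a whole? Thanks.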
Hello, our implementation does support batchsize>1. If you set batchsize=2 and debug at this point, the mechanism should become clear.
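For readers who want to check the arithmetic without a debugger, here is a minimal sketch of one way the numbers from the question can fit. It assumes that tgt carries a batch dimension and that each image's noised labels are written only into that image's own row; this is an assumption used for illustration, not code taken from the repository. `known_num`, `scalar` and `pad_size` come from the question, while `single_pad`, `positions` and the other names are hypothetical.

```python
# Numbers from the question: batch of 2 images with 4 and 16 labels, 5 noise groups.
known_num = [4, 16]   # ground-truth labels per image
scalar = 5            # how many noised copies of each label are made
num_queries = 300     # learnable (non-denoising) part of tgt

single_pad = max(known_num)      # 16 slots per noise group, per image
pad_size = single_pad * scalar   # 80 denoising slots per image

# Assumed layout: every noised label gets a (batch_id, slot) position,
# where slot indexes into that image's own denoising part.
positions = []
for group in range(scalar):
    for batch_id, n in enumerate(known_num):
        for i in range(n):
            positions.append((batch_id, group * single_pad + i))

total_noised = len(positions)                    # counted over the whole batch
max_slot = max(slot for _, slot in positions)    # largest slot index used
per_image = [sum(1 for b, _ in positions if b == k)
             for k in range(len(known_num))]     # noised labels per image
tgt_shape = (len(known_num), pad_size + num_queries)

print(total_noised, max_slot, per_image, tgt_shape)
```

Under this assumption the 100 noised labels are a total over the batch (20 for the first image, 80 for the second), and no image needs more than `pad_size` slots, so nothing spills into the 300 non-denoising entries. The first image simply leaves some of its denoising slots unused.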