
checkpoint_e20.pth not saved #83

Open
LearnByDoingXW opened this issue Jan 7, 2022 · 4 comments
Comments

@LearnByDoingXW

After the full training run completes, the last epoch is trained but the model is never saved. How can this be fixed? Thanks for your reply!


AgSword commented Sep 6, 2022

After the full training run completes, the last epoch has finished, but no checkpoint is saved. How can this be solved? Thanks!

Did you manage to solve it?

WangJun-ZJUT (Collaborator) commented Oct 11, 2022 via email

Hello, this is caused by the batch size not evenly dividing the total number of samples.

> ---Original message--- From: @.> Sent: Tuesday, Sep 6, 2022, 10:58 AM To: @.>; Cc: @.>; Subject: Re: [ohhhyeahhh/SiamCAR] checkpoint_e20.pth not saved (Issue #83) After the full training run completes, the last epoch has finished, but no checkpoint is saved. How can this be solved? Thanks! Did you manage to solve it? — Reply to this email directly, view it on GitHub, or unsubscribe. Message ID: @.>

@dhy1222
dhy1222 commented Mar 30, 2023

Hello, I ran into the same situation during training, with the batch size set to 96. What should I do to get the model from the last epoch?
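To make the divisibility explanation above concrete, here is a small sketch. Only the batch size 96 comes from this thread; the dataset size, epoch count, and the `checkpoint_after` value are hypothetical numbers chosen for illustration, not SiamCAR's actual configuration.

```python
import math

# Hypothetical setup: batch_size=96 is from the comment above; the rest is assumed.
num_samples = 7000
batch_size = 96
num_epochs = 20
checkpoint_after = 1000  # assumed fixed save interval (in iterations)

# With batch_size not dividing num_samples, the iteration count per epoch
# is a "ragged" number, and the total is not a multiple of checkpoint_after.
iters_per_epoch = math.ceil(num_samples / batch_size)   # 73
total_iters = iters_per_epoch * num_epochs              # 1460

# Under the rule `if num_iter % checkpoint_after == 0`, saves happen only here:
saved_at = [i for i in range(1, total_iters + 1) if i % checkpoint_after == 0]
print(saved_at)                         # [1000]
print(total_iters % checkpoint_after)   # 460 -> no save on the final iteration
```

Since 1460 is not a multiple of 1000, the modulo check never fires on the last iteration, so the weights from the final epoch are never written to disk.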

@LIUYellowBlack

(screenshot: the model-saving lines in train.py)
In train.py, the model is saved when execution reaches this line, so we look at what num_iter and checkpoint_after are in `if num_iter % checkpoint_after == 0:`, which gives the picture below.
(screenshot: logged values of num_iter and checkpoint_after)
You can see the corresponding values of num_iter and checkpoint_after: the model is only saved when num_iter is evenly divisible by checkpoint_after. The value of checkpoint_after is defined here:
(screenshot: the argument definition of checkpoint_after)

You can change the default of checkpoint_after here to 500, so that a model is saved every 500 training iterations, or adjust the saving logic however you prefer.
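Beyond lowering the interval as suggested above, a more robust variant is to also save unconditionally after the final iteration, so the last epoch is never lost regardless of divisibility. The sketch below is an illustration of that idea, not SiamCAR's actual train.py code; the function name and parameters are made up.

```python
def save_points(total_iters, checkpoint_after):
    """Return the iterations at which a checkpoint would be written:
    every `checkpoint_after` iterations, plus an unconditional final save."""
    pts = [i for i in range(1, total_iters + 1) if i % checkpoint_after == 0]
    if not pts or pts[-1] != total_iters:
        pts.append(total_iters)  # always keep the final epoch's weights
    return pts

# With 1460 total iterations and a 500-iteration interval, the periodic rule
# alone would save at 500 and 1000; the extra save captures iteration 1460.
print(save_points(1460, 500))  # [500, 1000, 1460]
```

In train.py this would correspond to calling the usual `torch.save(...)` block one more time after the training loop exits.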

Labels: none · Projects: none · No branches or pull requests · 5 participants