
checkpoint_e20.pth not saved #83

Open
LearnByDoingXW opened this issue Jan 7, 2022 · 4 comments
Comments

@LearnByDoingXW

After the full training run completes, the last epoch is trained but the model is never saved. How can this be fixed? Thanks for your reply!


AgSword commented Sep 6, 2022

After the full training run completes, the last epoch has finished, but no checkpoint is saved. How can this be solved? Thanks!

Did you manage to solve it?

WangJun-ZJUT (Collaborator) commented Oct 11, 2022 via email

Hello, this is caused by the batch size not evenly dividing the total number of samples.

> ---Original message--- From: @.> Sent: Tuesday, Sep 6, 2022, 10:58 AM To: @.>; Cc: @.>; Subject: Re: [ohhhyeahhh/SiamCAR] checkpoint_e20.pth not saved (Issue #83) After the full training run completes, the last epoch has finished, but no checkpoint is saved. How can this be solved? Thanks! Did you manage to solve it? — Reply to this email directly, view it on GitHub, or unsubscribe. Message ID: @.>

@dhy1222
dhy1222 commented Mar 30, 2023

Hello, I ran into the same situation during training, with the batch size set to 96. What should I do to get the model from the last epoch?
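To make the divisibility explanation above concrete, here is a small sketch. Only the batch size 96 comes from this thread; the dataset size, epoch count, and the `checkpoint_after` value are hypothetical numbers chosen for illustration, not SiamCAR's actual configuration.

```python
import math

# Hypothetical setup: batch_size=96 is from the comment above; the rest is assumed.
num_samples = 7000
batch_size = 96
num_epochs = 20
checkpoint_after = 1000  # assumed fixed save interval (in iterations)

# With batch_size not dividing num_samples, the iteration count per epoch
# is a "ragged" number, and the total is not a multiple of checkpoint_after.
iters_per_epoch = math.ceil(num_samples / batch_size)   # 73
total_iters = iters_per_epoch * num_epochs              # 1460

# Under the rule `if num_iter % checkpoint_after == 0`, saves happen only here:
saved_at = [i for i in range(1, total_iters + 1) if i % checkpoint_after == 0]
print(saved_at)                         # [1000]
print(total_iters % checkpoint_after)   # 460 -> no save on the final iteration
```

Since 1460 is not a multiple of 1000, the modulo check never fires on the last iteration, so the weights from the final epoch are never written to disk.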

@LIUYellowBlack

(screenshot: the model-saving lines in train.py)
In train.py, the model is saved when execution reaches this line, so we look at what num_iter and checkpoint_after are in `if num_iter % checkpoint_after == 0:`, which gives the picture below.
(screenshot: logged values of num_iter and checkpoint_after)
You can see the corresponding values of num_iter and checkpoint_after: the model is only saved when num_iter is evenly divisible by checkpoint_after. The value of checkpoint_after is defined here:
(screenshot: the argument definition of checkpoint_after)

You can change the default of checkpoint_after here to 500, so that a model is saved every 500 training iterations, or adjust the saving logic however you prefer.
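Beyond lowering the interval as suggested above, a more robust variant is to also save unconditionally after the final iteration, so the last epoch is never lost regardless of divisibility. The sketch below is an illustration of that idea, not SiamCAR's actual train.py code; the function name and parameters are made up.

```python
def save_points(total_iters, checkpoint_after):
    """Return the iterations at which a checkpoint would be written:
    every `checkpoint_after` iterations, plus an unconditional final save."""
    pts = [i for i in range(1, total_iters + 1) if i % checkpoint_after == 0]
    if not pts or pts[-1] != total_iters:
        pts.append(total_iters)  # always keep the final epoch's weights
    return pts

# With 1460 total iterations and a 500-iteration interval, the periodic rule
# alone would save at 500 and 1000; the extra save captures iteration 1460.
print(save_points(1460, 500))  # [500, 1000, 1460]
```

In train.py this would correspond to calling the usual `torch.save(...)` block one more time after the training loop exits.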

Labels: none · Projects: none · No branches or pull requests · 5 participants