Description
I have encountered an issue while trying to run multiple epochs during training. When I set the `max_train_steps` value greater than the number of steps produced in one epoch, the training process exits unexpectedly.
Problem
It seems that the training logic only runs the `train_one_epoch` function once, so the model does not iterate through multiple epochs as expected. Instead, training terminates early after a single epoch, without continuing until `max_train_steps` is reached.
Current Code Snippet
The current implementation appears as follows:
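The original snippet was not captured in this copy of the issue. Below is a minimal sketch of the behavior described above; the names `train_one_epoch`, `dataloader`, and `step` follow the issue text, and everything else is assumed for illustration.

```python
# Hypothetical reconstruction of the reported behavior, NOT the actual
# repository code: train_one_epoch is invoked exactly once, so training
# ends after one pass over the data even when max_train_steps is larger.

def train_one_epoch(dataloader, step, max_train_steps):
    """Run one pass over the data, stopping early if the step cap is hit."""
    for batch in dataloader:
        if step >= max_train_steps:
            return step  # reached the global step cap mid-epoch
        # ... forward pass, backward pass, optimizer step would go here ...
        step += 1
    return step  # epoch exhausted before reaching max_train_steps

def train(dataloader, max_train_steps):
    # Called only once: if the epoch has fewer batches than
    # max_train_steps, training stops here regardless of the cap.
    step = 0
    step = train_one_epoch(dataloader, step, max_train_steps)
    return step
```

With, say, 10 batches per epoch and `max_train_steps=25`, this loop stops at step 10 instead of 25, which matches the early exit described above.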
Proposed Solution
To address this limitation, I suggest modifying the training logic to allow for multiple epochs explicitly. Below is my proposed change:
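Since the proposed snippet was also not captured here, the following is a sketch of the suggested fix under the same assumed names: wrap `train_one_epoch` in a loop that repeats until `max_train_steps` is reached.

```python
# Hypothetical sketch of the proposed fix, NOT the actual patch:
# keep calling train_one_epoch until the global step cap is reached,
# so training spans as many epochs as max_train_steps requires.

def train_one_epoch(dataloader, step, max_train_steps):
    """Run one pass over the data, stopping early if the step cap is hit."""
    for batch in dataloader:
        if step >= max_train_steps:
            return step
        # ... forward pass, backward pass, optimizer step would go here ...
        step += 1
    return step

def train(dataloader, max_train_steps):
    step = 0
    # Loop over epochs until the global step count reaches the cap.
    while step < max_train_steps:
        step = train_one_epoch(dataloader, step, max_train_steps)
    return step
```

With 10 batches per epoch and `max_train_steps=25`, this version runs two full epochs plus a partial third and stops exactly at step 25. One caveat worth noting for the real implementation: if the dataloader ever yields zero batches, this `while` loop would never terminate, so a guard may be needed.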
I would appreciate any insights or suggestions regarding this proposed change. If there are additional considerations or implications for implementing this, please let me know.