Fix (scaling/standalone): better switch from runtime stats to param #1099
+12
−5
Reason for this PR
Currently, if we switch from training to eval before `stats_collection_steps` is done, we never update the `value` parameter to store the buffer value. This has a few side effects on `value`, causing some of the issues mentioned above.
Changes Made in this PR
At eval time, during the first iteration the buffer is always converted to a parameter.
The side effect of this shows up when the user wants to switch between training and evaluation mode several times very early in the training process. Although it is common to switch between training and eval to check the loss on the validation set, this is usually done after enough iterations that the buffer has already been converted to a parameter anyway.
I admit that this could be marked as a breaking change for this edge case.
This has been removed in a more recent commit. I believe there are no more breaking changes at this point.
All fixed, no more breaking changes.
After calibration, we forcefully convert the buffer to parameters.
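The buffer-to-parameter switch described above can be illustrated with a minimal PyTorch sketch. It is not the code from this PR: the class `StatsThenParamScale`, the method `convert_buffer_to_param`, and the attributes `buffer`, `counter`, and `init_done` are hypothetical names chosen for illustration, and the running mean of the absolute maximum is only an assumed statistic. Only `stats_collection_steps` and `value` come from the description.

```python
import torch
import torch.nn as nn


class StatsThenParamScale(nn.Module):
    """Hypothetical scale module: collects a statistic in a buffer while
    training, then exposes it as the learnable parameter `value`."""

    def __init__(self, stats_collection_steps: int = 10):
        super().__init__()
        self.stats_collection_steps = stats_collection_steps
        self.counter = 0          # number of statistics collected so far
        self.init_done = False    # True once `value` holds the buffer value
        self.register_buffer("buffer", torch.tensor(1.0))
        self.value = nn.Parameter(torch.tensor(1.0))

    def convert_buffer_to_param(self) -> None:
        # Forced conversion (e.g. after calibration): copy the collected
        # statistic into `value`, even if collection was cut short.
        if not self.init_done:
            with torch.no_grad():
                self.value.copy_(self.buffer)
            self.init_done = True

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.init_done:
            return self.value
        if self.training:
            if self.counter < self.stats_collection_steps:
                # Assumed statistic: running mean of the absolute maximum.
                stat = x.detach().abs().max()
                self.buffer.mul_(self.counter).add_(stat).div_(self.counter + 1)
                self.counter += 1
                return self.buffer
            # Collection finished during training: switch to the parameter.
            self.convert_buffer_to_param()
            return self.value
        # Eval before conversion: read the buffer without modifying state.
        return self.buffer


# Usage: stop collecting early, then force the conversion.
torch.manual_seed(0)
module = StatsThenParamScale(stats_collection_steps=10)
module.train()
for _ in range(3):                # fewer steps than stats_collection_steps
    module(torch.randn(4, 4))
module.eval()
module.convert_buffer_to_param()  # `value` now stores the buffer value
```

In this sketch, switching to eval early leaves the module state untouched, and the explicit call guarantees that `value` is populated no matter how many collection steps actually ran.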
Testing Summary
Risk Highlight
Checklist
`dev` branch.