Config refactor #83
Conversation
sordonia commented on Aug 12, 2024
- Configs are now dataclasses
- DataArgs, SelectorArgs, ModifierArgs and TransformArgs automatically read their fields from the registered classes through an ad-hoc metaclass
- This can be considered the first step towards separating arguments such that Trainer and Model itself can use different inits
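The description above says DataArgs, SelectorArgs, ModifierArgs and TransformArgs gather their fields from the registered classes. A minimal sketch of that idea, using `make_dataclass` instead of the PR's actual ad-hoc metaclass; the registry, `LoRAConfig`/`PromptConfig` names, and field names are all illustrative, not the real implementation:

```python
from dataclasses import dataclass, fields, make_dataclass

# Hypothetical registry of modifier config classes.
REGISTRY = []

def register(cls):
    REGISTRY.append(cls)
    return cls

@register
@dataclass
class LoRAConfig:
    lora_rank: int = 4

@register
@dataclass
class PromptConfig:
    prompt_length: int = 10

# Build a ModifierArgs-like dataclass whose fields are the union of the
# fields (with defaults) of every registered config class.
ModifierArgs = make_dataclass(
    "ModifierArgs",
    [(f.name, f.type, f.default) for cls in REGISTRY for f in fields(cls)],
)

args = ModifierArgs()
# args.lora_rank == 4, args.prompt_length == 10
```

The same trick could back each of the Args classes listed above, with one registry per modifier/selector/transform family.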
expert_name: str = None

# Training config
micro_batch_size: str = None
Now would be a good time to clean up the mess that is micro_batch_size, train_batch_size, and gradient_accumulation_steps.
oh, do we have all three of them?
allora (well then):
Either we remove gradient_accumulation_steps (and compute it as train_batch_size // micro_batch_size), or we remove micro_batch_size and train_batch_size becomes "per device train batch size".
gradient_accumulation_steps is computed in post_init, so it's not settable by the user; it's fine, no? @pclucas14
Also, per-device train batch size alone doesn't tell you how many accumulation steps you need to do?
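The scheme discussed above, deriving gradient_accumulation_steps in post_init so the user never sets it directly, can be sketched like this; the class and field names mirror the ones in the diff, but the exact implementation in the PR may differ:

```python
from dataclasses import dataclass, field

@dataclass
class TrainingArgs:
    train_batch_size: int = 32
    micro_batch_size: int = 8
    # Derived, not user-settable: excluded from __init__.
    gradient_accumulation_steps: int = field(init=False, default=1)

    def __post_init__(self):
        # The global batch must be an exact multiple of the micro batch.
        assert self.train_batch_size % self.micro_batch_size == 0, (
            "train_batch_size must be divisible by micro_batch_size"
        )
        self.gradient_accumulation_steps = (
            self.train_batch_size // self.micro_batch_size
        )

args = TrainingArgs(train_batch_size=32, micro_batch_size=8)
# args.gradient_accumulation_steps == 4
```

With `field(init=False)`, passing gradient_accumulation_steps to the constructor raises a TypeError, which enforces the "not settable by the user" property mentioned above.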
Added comments