
Config refactor #83

Merged: 40 commits merged into main from config-refactor on Aug 14, 2024

Conversation

sordonia (Member)

  • Configs are now dataclasses
  • DataArgs, SelectorArgs, ModifierArgs and TransformArgs automatically read their fields from the registered classes through an ad-hoc metaclass (see the sketch after this list)
  • This can be considered a first step towards separating arguments so that the Trainer and the Model itself can use different inits
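A minimal sketch of the field-collection idea, assuming a hypothetical `MODIFIER_CONFIGS` registry and `register_modifier_config` decorator; the actual metaclass and registry in the PR differ in detail:

```python
from dataclasses import MISSING, dataclass, fields

# Hypothetical registry: registered modifier configs contribute their fields.
MODIFIER_CONFIGS = []

def register_modifier_config(cls):
    MODIFIER_CONFIGS.append(cls)
    return cls

@register_modifier_config
@dataclass
class LoRAConfig:
    lora_rank: int = 8
    lora_dropout: float = 0.0

class AutoFields(type):
    """Copy every field of every registered config into the class body,
    then turn the class itself into a dataclass."""

    def __new__(mcs, name, bases, namespace):
        annotations = namespace.setdefault("__annotations__", {})
        for cfg in MODIFIER_CONFIGS:
            for f in fields(cfg):
                if f.name in annotations:
                    continue  # explicitly declared fields win
                annotations[f.name] = f.type
                if f.default is not MISSING:
                    namespace[f.name] = f.default
        return dataclass(super().__new__(mcs, name, bases, namespace))

class ModifierArgs(metaclass=AutoFields):
    pass

print(ModifierArgs())  # ModifierArgs(lora_rank=8, lora_dropout=0.0)
```

With this shape, registering a new modifier config automatically surfaces its fields on ModifierArgs without touching the args class itself.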

sordonia requested review from matheper and pclucas14, August 12, 2024 15:25
Alessandro Sordoni added 2 commits August 12, 2024 08:35
expert_name: str = None

# Training config
micro_batch_size: int = None
Contributor

Now would be a good time to clean up the mess that is `micro_batch_size` / `train_batch_size` / `gradient_accumulation_steps`.

sordonia (Member Author)

oh, do we have all three of them?

sordonia (Member Author)

alloa

Contributor

Either we remove `gradient_accumulation_steps` (and compute it as `train_batch_size // micro_batch_size`), or we remove `micro_batch_size` and `train_batch_size` becomes the "per device train batch size".

sordonia (Member Author), Aug 14, 2024

`gradient_accumulation_steps` is computed in `__post_init__`, so it's not settable by the user; that's fine, no? @pclucas14

sordonia (Member Author), Aug 14, 2024

also, "per device train batch size" alone doesn't tell you how many accumulation steps you need to do, no?
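A minimal sketch of the `__post_init__` derivation being discussed; the field names follow the diff, while the divisibility check is an assumption:

```python
from dataclasses import dataclass

@dataclass
class TrainingArgs:
    train_batch_size: int = 32  # effective (total) batch size
    micro_batch_size: int = 4   # batch size per forward/backward pass

    def __post_init__(self):
        if self.train_batch_size % self.micro_batch_size != 0:
            raise ValueError("train_batch_size must be divisible by micro_batch_size")
        # Derived in post-init, so it is never set directly by the user.
        self.gradient_accumulation_steps = self.train_batch_size // self.micro_batch_size

args = TrainingArgs(train_batch_size=32, micro_batch_size=8)
assert args.gradient_accumulation_steps == 4
```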

sordonia requested a review from pclucas14, August 12, 2024 21:25
sordonia (Member Author)

Added comments

sordonia merged commit 6b83bb9 into main, Aug 14, 2024
6 checks passed
sordonia deleted the config-refactor branch, August 14, 2024 16:52