Checkpointing support for transformer type models #247

Merged: 72 commits merged from feature/transformer into main on Feb 25, 2025

Conversation

zhenghh04 (Member) commented on Feb 5, 2025:

This PR addresses the issue that users previously had to manually input the layer parameters and optimization groups for checkpointing (#248).

  • Users can now provide just vocab_size, hidden_size, ffn_hidden_size, and num_layers; the layer parameters and optimization groups are calculated internally (see the sketch after this list).
  • Added an option for specifying zero_stage. The default value is 0, meaning ZeRO is not used.
  • Added several YAML configuration files for Llama transformer models.
  • Users can specify the datatype used when writing checkpoint data.
  • Added a checkpoint-only mode.
  • Added support for checkpoint recovery tests.
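As a rough illustration of the kind of calculation that is now done internally, the sketch below estimates the total parameter count of a GPT/Llama-style model from those size inputs. The function name and the exact per-layer terms (no biases, un-gated MLP) are assumptions for illustration, not the PR's actual code.

# Hypothetical sketch: estimate total parameters from the model-size inputs.
def estimate_total_parameters(vocab_size, hidden_size, ffn_hidden_size, num_layers,
                              num_attention_heads, num_kv_heads):
    head_size = hidden_size // num_attention_heads
    dim_kv = head_size * num_kv_heads                # total key/value projection width
    embedding = vocab_size * hidden_size             # token embedding table
    per_layer = (
        hidden_size                                  # input norm
        + hidden_size * (hidden_size + 2 * dim_kv)   # fused QKV projection
        + hidden_size * hidden_size                  # attention output projection
        + hidden_size                                # post-attention norm
        + hidden_size * ffn_hidden_size              # MLP up-projection
        + ffn_hidden_size * hidden_size              # MLP down-projection
    )
    final_norm = hidden_size
    lm_head = vocab_size * hidden_size               # output projection
    return embedding + per_layer * num_layers + final_norm + lm_head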

zhenghh04 added the enhancement (New feature or request) label on Feb 7, 2025.
zhenghh04 (Member, Author) commented:

@hariharan-devarajan This is ready for you to review again.

I added two other features since last time we talked:

  • checkpoint recovery support (reading checkpoints back)
  • checkpoint-only mode (i.e., training turned off)

hariharan-devarajan (Collaborator) left a comment:

Almost there.

if self.args.hidden_size <= 0:
    return 0
head_size = self.args.hidden_size // self.args.num_attention_heads
dim_kv = head_size * self.args.num_kv_heads

hariharan-devarajan (Collaborator) commented:

Missed this one: please use the full form for dim_kv.

mlp_4h_to_h = self.args.ffn_hidden_size * self.args.hidden_size
weight = self.args.hidden_size
lm_head = embedding
return embedding + (input_norm + qkv + dense + layer_norm + mlp_h_to_4h + mlp_4h_to_h) * self.args.num_layers + weight + lm_head

hariharan-devarajan (Collaborator) commented:

What are qkv and mlp_h_to_4h? If spelling out mlp_h_to_4h would make the name too long, then at least add a line comment explaining what it is where it is defined.
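
For reference, here is a minimal sketch of how these two terms are conventionally defined when counting transformer parameters; the function name is hypothetical and the exact factors (biases, gated MLP variants) may differ from what the PR implements, so treat the formulas as assumptions rather than the PR's code.

# Hypothetical illustration of the attention and MLP terms (not necessarily the PR's exact code).
def attention_and_mlp_terms(hidden_size, ffn_hidden_size, num_attention_heads, num_kv_heads):
    head_size = hidden_size // num_attention_heads
    dim_kv = head_size * num_kv_heads              # total key/value width (grouped-query attention)
    # qkv: fused query/key/value projection weights of one attention block
    qkv = hidden_size * (hidden_size + 2 * dim_kv)
    # mlp_h_to_4h: MLP up-projection, hidden_size -> ffn_hidden_size
    mlp_h_to_4h = hidden_size * ffn_hidden_size
    # mlp_4h_to_h: MLP down-projection, ffn_hidden_size -> hidden_size
    mlp_4h_to_h = ffn_hidden_size * hidden_size
    return qkv, mlp_h_to_4h, mlp_4h_to_h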


def get_layer_parameters(self, layer_index):
    head_size = self.args.hidden_size // self.args.num_attention_heads
    dim_kv = head_size * self.args.num_kv_heads

hariharan-devarajan (Collaborator) commented:

Please use the full form for dim_kv here as well.

hariharan-devarajan (Collaborator) left a comment:

Looks good. Thank you for all the changes.

zhenghh04 merged commit 67f0fbf into main on Feb 25, 2025; 12 checks passed.
zhenghh04 deleted the feature/transformer branch on February 25, 2025.