[Sharderformer] Support zbv in Sharderformer Policy #6150

Open · wants to merge 32 commits into main from feature/sharderformer_support_zbv

Conversation

duanjunwen (Member)
📌 Checklist before creating the PR

  • I have created an issue for this PR for traceability
  • The title follows the standard format: [doc/gemini/tensor/...]: A concise description
  • I have added relevant tags if possible for us to better distinguish different PRs
  • I have installed pre-commit: pip install pre-commit && pre-commit install

🚨 Issue number

Link this PR to your issue with words like fixed to automatically close the linked issue upon merge

e.g. fixed #1234, closed #1234, resolved #1234

📝 What does this PR do?

Summarize your work here.
If you have any plots/diagrams/screenshots/tables, please attach them here.

💥 Checklist before requesting a review

  • I have linked my PR to an issue (instruction)
  • My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
  • I have performed a self-review of my code
  • I have added thorough tests
  • I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

  • 🌝 Yes, I do.
  • 🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.

duanjunwen requested a review from a team as a code owner on November 21, 2024 09:40
duanjunwen requested a review from ver217 on November 21, 2024 10:58
duanjunwen force-pushed the feature/sharderformer_support_zbv branch from ba7fc35 to 8cb74e7 on December 10, 2024 07:09
duanjunwen force-pushed the feature/sharderformer_support_zbv branch from 3fd2402 to 70b0ae1 on December 10, 2024 08:50
duanjunwen force-pushed the feature/sharderformer_support_zbv branch from 44b5786 to 37b670e on December 10, 2024 11:26
@@ -1020,3 +1202,158 @@ def forward(self, input_: Tensor) -> Tensor:
return output
else:
return output, self.bias


class FusedLinear1D(ParallelModule):
Member: You'd better use a new name. This class does not use TP. Why name it "1D"?

Member (Author): Fixed in 25da23d.

@@ -620,6 +634,154 @@ def forward(self, input_: Tensor) -> Tensor:
return output, self.bias


class GPT2FusedLinearConv1D(ParallelModule):
Member: You'd better use a new name. This class does not use TP. Why name it "1D"?

Member (Author): Fixed in 25da23d.

Comment on lines 534 to 540
# if self.pipeline_stage_manager.is_last_stage():
# multiple_choice_head = self.model.multiple_choice_head
# held_layers.append(self.model.lm_head)
# held_layers.append(multiple_choice_head.summary)
# held_layers.append(multiple_choice_head.activation)
# held_layers.append(multiple_choice_head.first_dropout)
# held_layers.append(multiple_choice_head.last_dropout)
Member: Clear useless comments.

Member (Author): Removed.

Comment on lines 61 to 69
n_stage=pp_size,
n_micro=num_microbatches,
f_cost=1,
b_cost=1,
w_cost=1,
c_cost=1,
f_mem=mem_f,
b_mem=mem_b,
w_mem=mem_w,
Member: We should have a guide to introduce how to set these values.

Member (Author): Updated in 25da23d.
Added detailed descriptions of the x_cost and x_mem parameters, along with use cases.
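For readers landing on this thread: below is a minimal sketch of how these arguments might be set, assuming the `PipelineGraph`/`get_v_schedule` interface from the zero-bubble reference implementation. The import path, the example dimensions, and the memory heuristics are illustrative assumptions, not values taken from this PR.

```python
# Minimal sketch (assumed API): building a ZBV schedule and choosing the
# cost/memory arguments shown in the diff above. The import path and the
# memory heuristics below are assumptions for illustration; the guide added
# in this PR is the authoritative reference.
from colossalai.pipeline.schedule.v_schedule import PipelineGraph  # assumed path

pp_size = 4           # number of pipeline stages
num_microbatches = 8  # ZBV needs enough microbatches to fill the V schedule

# f/b/w_cost: relative runtimes of the forward pass (F), the backward pass
# w.r.t. activations (B), and the backward pass w.r.t. weights (W) for one
# chunk; c_cost is the p2p communication time between adjacent stages.
# Only the ratios matter to the solver, so profiled per-microbatch timings
# can be passed directly; 1/1/1/1 (as in the diff) assumes uniform steps.

# f/b/w_mem: activation-memory delta of each pass, in any consistent unit.
# The three deltas should sum to zero so that memory is fully released once
# the W pass of a microbatch finishes. The 34*h / 32*h coefficients are a
# rough per-transformer-layer heuristic, not measured values.
h, a, s = 4096, 32, 1024    # hidden size, attention heads, sequence length
mem_f = 34 * h + 5 * a * s  # memory held after F
mem_w = -32 * h             # memory released by W
mem_b = -mem_w - mem_f      # memory released by B, so f_mem + b_mem + w_mem == 0

graph = PipelineGraph(
    n_stage=pp_size,
    n_micro=num_microbatches,
    f_cost=1,
    b_cost=1,
    w_cost=1,
    c_cost=1,
    f_mem=mem_f,
    b_mem=mem_b,
    w_mem=mem_w,
)
zbv_schedule = graph.get_v_schedule()  # per-stage lists of scheduled F/B/W nodes
```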

duanjunwen requested a review from ver217 on December 18, 2024 06:02