[Shardformer] Support zbv in Shardformer Policy #6150
base: main
Conversation
falcon, gptj, mistral, opt, qwen2, t5, vit, whisper
Force-pushed from ba7fc35 to 8cb74e7
Force-pushed from 3fd2402 to 70b0ae1
Force-pushed from 44b5786 to 37b670e
@@ -1020,3 +1202,158 @@ def forward(self, input_: Tensor) -> Tensor:
            return output
        else:
            return output, self.bias


class FusedLinear1D(ParallelModule):
You'd better use a new name. This class does not use TP. Why name it "1D"?
Fixed in 25da23d
@@ -620,6 +634,154 @@ def forward(self, input_: Tensor) -> Tensor:
            return output, self.bias


class GPT2FusedLinearConv1D(ParallelModule):
You'd better use a new name. This class does not use TP. Why name it "1D"?
Fixed in 25da23d
# if self.pipeline_stage_manager.is_last_stage():
#     multiple_choice_head = self.model.multiple_choice_head
#     held_layers.append(self.model.lm_head)
#     held_layers.append(multiple_choice_head.summary)
#     held_layers.append(multiple_choice_head.activation)
#     held_layers.append(multiple_choice_head.first_dropout)
#     held_layers.append(multiple_choice_head.last_dropout)
clear useless comments
Removed.
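For reference, a minimal sketch of how a policy's `get_held_layers` might place the head under a ZBV (zero-bubble V) schedule. The `use_zbv` flag and the `ignore_chunk` argument are assumptions about the stage-manager API in this PR and may not match the final code:

```python
def get_held_layers(self) -> list:
    """Pick the sub-modules this pipeline rank should hold.

    Sketch only: under a ZBV schedule each rank holds two model
    chunks, and the V shape folds the model so the head ends up on
    the first rank instead of the last one.
    """
    held_layers = super().get_held_layers()
    stage_manager = self.pipeline_stage_manager
    if stage_manager.use_zbv:  # assumed flag, set when the ZBV schedule is active
        if stage_manager.is_first_stage(ignore_chunk=True):
            held_layers.append(self.model.lm_head)
    elif stage_manager.is_last_stage():
        # classic 1F1B / interleaved: the head lives on the last stage
        held_layers.append(self.model.lm_head)
    return held_layers
```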
n_stage=pp_size,
n_micro=num_microbatches,
f_cost=1,
b_cost=1,
w_cost=1,
c_cost=1,
f_mem=mem_f,
b_mem=mem_b,
w_mem=mem_w,
We should have a guide that explains how to set these values.
Updated in 25da23d: added detailed descriptions of the x_cost and x_mem parameters, along with their use cases.
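For context, here is a minimal sketch of how these values might be chosen, assuming the `PipelineGraph` helper from `colossalai.pipeline.schedule.v_schedule`; the memory numbers below are illustrative, not the PR's exact guidance. The `*_cost` arguments are relative runtimes of the forward (F), input-gradient (B), weight-gradient (W), and communication (C) phases, and the `*_mem` arguments are the activation memory each phase allocates or frees:

```python
from colossalai.pipeline.schedule.v_schedule import PipelineGraph

pp_size = 4
num_microbatches = 8

# Rough per-layer activation accounting (hypothetical numbers):
# F allocates activations, W frees the part needed for the weight
# gradient, and B frees the rest, so f_mem + b_mem + w_mem == 0.
mem_f = 34              # memory allocated by one forward (F) pass
mem_w = -32             # memory released by one weight-gradient (W) pass
mem_b = -mem_w - mem_f  # remainder released by the input-gradient (B) pass

graph = PipelineGraph(
    n_stage=pp_size,
    n_micro=num_microbatches,
    f_cost=1,  # relative time of a forward pass
    b_cost=1,  # relative time of an input-gradient (B) pass
    w_cost=1,  # relative time of a weight-gradient (W) pass
    c_cost=1,  # relative time of a stage-to-stage send/recv
    f_mem=mem_f,
    b_mem=mem_b,
    w_mem=mem_w,
)
zbv_schedule = graph.get_v_schedule()  # per-stage ordering of F/B/W blocks
```

In practice the costs can be profiled per model; setting them all to 1 simply asks the solver for a schedule that treats F, B, and W as equally expensive.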