
[Model] DeepseekV2 Support #499

Open · wants to merge 9 commits into base: main

Conversation

@saurabhkoshatwar commented Dec 26, 2024

Summary

Resolves #129. Adds a monkeypatch to support the DeepseekV2 model.

Details

Ops patched:

  • rms_norm
  • swiglu
  • cross_entropy
  • fused_linear_cross_entropy
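For context, a rough sketch of how these ops could be wired into the DeepSeek modeling module. The function name and argument names below are illustrative rather than the exact ones in this PR, the import paths assume the current Liger package layout, and the fused_linear_cross_entropy path is omitted for brevity:

    from liger_kernel.transformers.cross_entropy import LigerCrossEntropyLoss
    from liger_kernel.transformers.rms_norm import LigerRMSNorm
    from liger_kernel.transformers.swiglu import LigerSwiGLUMLP

    def apply_liger_kernel_to_deepseek_v2(modeling_mod, rms_norm=True, swiglu=True, cross_entropy=True):
        # Swap the reference DeepseekV2 ops for their Liger (Triton) counterparts.
        if rms_norm:
            modeling_mod.DeepseekV2RMSNorm = LigerRMSNorm
        if swiglu:
            modeling_mod.DeepseekV2MLP.forward = LigerSwiGLUMLP.forward
        if cross_entropy:
            modeling_mod.CrossEntropyLoss = LigerCrossEntropyLoss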

Testing Done

  • Hardware Type: NVIDIA A100-SXM4-40GB
  • run make test to ensure correctness
  • run make checkstyle to ensure code style
  • run make test-convergence to ensure convergence

@saurabhkoshatwar saurabhkoshatwar marked this pull request as draft December 26, 2024 00:58
@saurabhkoshatwar saurabhkoshatwar marked this pull request as ready for review December 31, 2024 20:42
@saurabhkoshatwar (Author) commented Jan 7, 2025

@ByronHsu @yundai424 @Tcc0403 @qingquansong
As discussed in the issue, the rope implementation is different in DeepSeek.

deepseek:

    cos = cos[position_ids].unsqueeze(unsqueeze_dim)
    sin = sin[position_ids].unsqueeze(unsqueeze_dim)

    b, h, s, d = q.shape
    q = q.view(b, h, s, d // 2, 2).transpose(4, 3).reshape(b, h, s, d)

    b, h, s, d = k.shape
    k = k.view(b, h, s, d // 2, 2).transpose(4, 3).reshape(b, h, s, d)

    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed

llama:

    cos = cos.unsqueeze(unsqueeze_dim)
    sin = sin.unsqueeze(unsqueeze_dim)
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed

I will create a separate PR to implement the DeepSeek rope.
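In the meantime, here is a minimal sketch of what a DeepSeek-compatible wrapper around the existing Liger rope could look like; the import path and the liger_rotary_pos_emb signature are assumptions, and a real implementation would likely fuse the re-ordering into the kernel instead of doing it in eager PyTorch:

    from liger_kernel.transformers.rope import liger_rotary_pos_emb  # assumed import path

    def apply_rotary_pos_emb_deepseek(q, k, cos, sin, position_ids, unsqueeze_dim=1):
        # DeepSeek stores the head dim in interleaved (GPT-J style) order; re-order
        # into the half-split (GPT-NeoX style) layout that rotate_half / the Liger
        # rope kernel expect, mirroring the view/transpose/reshape shown above.
        b, h, s, d = q.shape
        q = q.view(b, h, s, d // 2, 2).transpose(4, 3).reshape(b, h, s, d)
        b, h, s, d = k.shape
        k = k.view(b, h, s, d // 2, 2).transpose(4, 3).reshape(b, h, s, d)

        # DeepSeek also indexes the cos/sin caches by position_ids before applying them.
        return liger_rotary_pos_emb(q, k, cos[position_ids], sin[position_ids], unsqueeze_dim=unsqueeze_dim)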

    modeling_mod.DeepseekV2MLP.forward = LigerSwiGLUMLP.forward
    if cross_entropy:
        if transformer_version >= version.parse(SUPPORTED_TRANSFORMER_VERSION):
            from transformers.loss.loss_utils import nn
Collaborator commented:

nit: since it's so common to use import torch.nn as nn, perhaps we should import loss_utils under a different symbol?

Maybe even just from transformers.loss import loss_utils?
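For illustration, the two suggested spellings could look like this; both still reference the same torch.nn module that loss_utils imports, and the liger_cross_entropy import path is an assumption:

    from liger_kernel.transformers.functional import liger_cross_entropy  # assumed import path

    # Option 1: import the module and patch through it, keeping the `nn` name free.
    from transformers.loss import loss_utils

    loss_utils.nn.functional.cross_entropy = liger_cross_entropy

    # Option 2: keep the direct import, but under a less overloaded alias.
    from transformers.loss.loss_utils import nn as loss_nn

    loss_nn.functional.cross_entropy = liger_cross_entropy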

    import sys

    # Ensure the model is a DeepSeek model
    if "deepseek" not in model.__class__.__module__:
@tyler-romero (Collaborator) commented Jan 15, 2025

Do deepseek and deepseek-v3 share the same architecture? If so, perhaps this function should be called apply_liger_kernel_to_deepseek; if not, perhaps we should strengthen this check.
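If the architectures do differ and the check needs strengthening, one hypothetical stricter guard could key off the class name rather than a substring of the module path:

    def _assert_deepseek_v2(model) -> None:
        # Hypothetical guard: a substring match on the module path would also accept
        # other "deepseek" variants, so check the DeepseekV2 class names explicitly.
        if model.__class__.__name__ not in ("DeepseekV2Model", "DeepseekV2ForCausalLM"):
            raise TypeError(f"Expected a DeepseekV2 model, got {model.__class__.__name__}")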

    if transformer_version >= version.parse(SUPPORTED_TRANSFORMER_VERSION):
        from transformers.loss.loss_utils import nn

        nn.functional.cross_entropy = liger_cross_entropy
@tyler-romero (Collaborator) commented Jan 15, 2025

This will globally patch nn.functional.cross_entropy (the torch.nn module imported in transformers.loss.loss_utils), which is a pretty undesirable / unexpected side effect of applying this deepseek-specific monkey patch.

See this issue: #315
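For reference, one way the patch could stay scoped, assuming the remote DeepSeek modeling module imports CrossEntropyLoss at module level; this mirrors the pattern Liger uses for other models on older transformers versions, not necessarily what this PR should do:

    from liger_kernel.transformers.cross_entropy import LigerCrossEntropyLoss

    # Rebind only the DeepSeek modeling module's reference (modeling_mod as used
    # elsewhere in this patch), leaving torch.nn.functional.cross_entropy untouched.
    modeling_mod.CrossEntropyLoss = LigerCrossEntropyLoss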

Collaborator commented:

This could be fixed if deepseekv2 is added to the transformers library (see below comment about trust_remote_code)

    if model_name[:6] == "remote":
        revert_kwargs["remote_model_module"] = MINI_MODEL_SETUPS[model_name].remote_model_module

    model = create_model(model_name).to(dtype).to(device)
Collaborator commented:

Why the change to create the model before applying the patch?

    model_class = MINI_MODEL_SETUPS[model_name].model_class
    return model_class(model_config)

    if model_name[:6] == "remote":
        config = AutoConfig.from_pretrained(MINI_MODEL_SETUPS[model_name].remote_model_path, trust_remote_code=True)
@tyler-romero (Collaborator) commented Jan 15, 2025

Can you explain why this is necessary? Is it because the model cannot be run without trust_remote_code? As is, this default opts anyone who runs these unit tests into running remote code on their machine, which is a red flag.

I think a preferable path would be to add deepseekv2 to the transformers library, then add it to Liger, so that trust_remote_code is not necessary.

This also has the benefit of making it easier to follow changes that are made to the underlying model, which is a common source of bugs in Liger.

Collaborator commented:

It looks like support for deepseekv2 is underway (maybe stalled though): huggingface/transformers#31976

Successfully merging this pull request may close these issues.

[feat] support for DeepseekV2