Support Deepseek v32 #4026

grimoire · 2025-10-08T07:11:04Z

requirements:
https://github.com/Dao-AILab/fast-hadamard-transform#
latest FlashMLA

Note: My bitonic topk kernel fails on triton<=3.2.0. I will try to fix it, though upgrading our requirements would be a better option.
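Until the kernel is fixed, one way to guard it is a version gate that falls back to another topk path on older Triton. The following is a minimal sketch only: the helper names `parse_version` and `bitonic_topk_supported` are hypothetical and not taken from this PR, and the threshold is simply the `triton<=3.2.0` bound from the note above.

```python
import re


def parse_version(text):
    """Return the leading numeric release as a tuple, e.g. '3.3.1+git' -> (3, 3, 1)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    # A missing patch component (e.g. '3.2') is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def bitonic_topk_supported(triton_version, last_broken=(3, 2, 0)):
    """Hypothetical gate: True only for versions newer than the last known-bad one."""
    return parse_version(triton_version) > last_broken


if __name__ == "__main__":
    for version in ("3.1.0", "3.2.0", "3.3.1", "3.4.0"):
        print(version, bitonic_topk_supported(version))
```

In real code the version string would come from `triton.__version__`; it is passed in explicitly here so the sketch runs without Triton installed.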

lmdeploy/pytorch/backends/cuda/nsa.py

lmdeploy/pytorch/backends/cuda/moe.py

lmdeploy/pytorch/engine/cache_engine.py

lvhan028 · 2025-11-01T14:55:09Z

I followed the fast-hadamard-transform installation guide, but the build failed. Could you kindly share how you installed it?

(lmdeploy-py312) [lvhan@pj-h800-013 fast-hadamard-transform]$ pip install -v .
Using pip 25.2 from /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages/pip (python 3.12)
Looking in indexes: https://mirrors.aliyun.com/pypi/simple/
Processing /nvme1/lvhan/fast-hadamard-transform
  Running command python setup.py egg_info
  /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
    import pynvml  # type: ignore[import]
  /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages/setuptools/dist.py:759: SetuptoolsDeprecationWarning: License classifiers are deprecated.
  !!

          ********************************************************************************
          Please consider removing the following classifiers in favor of a SPDX license expression:

          License :: OSI Approved :: BSD License

          See https://packaging.python.org/en/latest/guides/writing-pyproject-toml/#license for details.
          ********************************************************************************

  !!
    self._finalize_license_expression()


  torch.__version__  = 2.8.0+cu128


  running egg_info
  creating /tmp/pip-pip-egg-info-ig8g8uhr/fast_hadamard_transform.egg-info
  writing /tmp/pip-pip-egg-info-ig8g8uhr/fast_hadamard_transform.egg-info/PKG-INFO
  writing dependency_links to /tmp/pip-pip-egg-info-ig8g8uhr/fast_hadamard_transform.egg-info/dependency_links.txt
  writing requirements to /tmp/pip-pip-egg-info-ig8g8uhr/fast_hadamard_transform.egg-info/requires.txt
  writing top-level names to /tmp/pip-pip-egg-info-ig8g8uhr/fast_hadamard_transform.egg-info/top_level.txt
  writing manifest file '/tmp/pip-pip-egg-info-ig8g8uhr/fast_hadamard_transform.egg-info/SOURCES.txt'
  reading manifest file '/tmp/pip-pip-egg-info-ig8g8uhr/fast_hadamard_transform.egg-info/SOURCES.txt'
  adding license file 'LICENSE'
  adding license file 'AUTHORS'
  writing manifest file '/tmp/pip-pip-egg-info-ig8g8uhr/fast_hadamard_transform.egg-info/SOURCES.txt'
  Preparing metadata (setup.py) ... done
Requirement already satisfied: torch in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from fast_hadamard_transform==1.0.4.post1) (2.8.0+cu128)
Requirement already satisfied: packaging in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from fast_hadamard_transform==1.0.4.post1) (25.0)
Requirement already satisfied: ninja in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from fast_hadamard_transform==1.0.4.post1) (1.13.0)
Requirement already satisfied: filelock in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (3.20.0)
Requirement already satisfied: typing-extensions>=4.10.0 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (4.15.0)
Requirement already satisfied: setuptools in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (80.9.0)
Requirement already satisfied: sympy>=1.13.3 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (1.14.0)
Requirement already satisfied: networkx in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (3.5)
Requirement already satisfied: jinja2 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (3.1.6)
Requirement already satisfied: fsspec in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (2025.10.0)
Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.8.93 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (12.8.93)
Requirement already satisfied: nvidia-cuda-runtime-cu12==12.8.90 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (12.8.90)
Requirement already satisfied: nvidia-cuda-cupti-cu12==12.8.90 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (12.8.90)
Requirement already satisfied: nvidia-cudnn-cu12==9.10.2.21 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (9.10.2.21)
Requirement already satisfied: nvidia-cublas-cu12==12.8.4.1 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (12.8.4.1)
Requirement already satisfied: nvidia-cufft-cu12==11.3.3.83 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (11.3.3.83)
Requirement already satisfied: nvidia-curand-cu12==10.3.9.90 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (10.3.9.90)
Requirement already satisfied: nvidia-cusolver-cu12==11.7.3.90 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (11.7.3.90)
Requirement already satisfied: nvidia-cusparse-cu12==12.5.8.93 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (12.5.8.93)
Requirement already satisfied: nvidia-cusparselt-cu12==0.7.1 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (0.7.1)
Requirement already satisfied: nvidia-nccl-cu12==2.27.3 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (2.27.3)
Requirement already satisfied: nvidia-nvtx-cu12==12.8.90 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (12.8.90)
Requirement already satisfied: nvidia-nvjitlink-cu12==12.8.93 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (12.8.93)
Requirement already satisfied: nvidia-cufile-cu12==1.13.1.3 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (1.13.1.3)
Requirement already satisfied: triton==3.4.0 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (3.4.0)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from sympy>=1.13.3->torch->fast_hadamard_transform==1.0.4.post1) (1.3.0)
Requirement already satisfied: MarkupSafe>=2.0 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from jinja2->torch->fast_hadamard_transform==1.0.4.post1) (3.0.3)
Building wheels for collected packages: fast_hadamard_transform
  DEPRECATION: Building 'fast_hadamard_transform' using the legacy setup.py bdist_wheel mechanism, which will be removed in a future version. pip 25.3 will enforce this behaviour change. A possible replacement is to use the standardized build interface by setting the `--use-pep517` option, (possibly combined with `--no-build-isolation`), or adding a `pyproject.toml` file to the source tree of 'fast_hadamard_transform'. Discussion can be found at https://github.com/pypa/pip/issues/6334
  Running command python setup.py bdist_wheel
  /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
    import pynvml  # type: ignore[import]


  torch.__version__  = 2.8.0+cu128


  /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages/setuptools/dist.py:759: SetuptoolsDeprecationWarning: License classifiers are deprecated.
  !!

          ********************************************************************************
          Please consider removing the following classifiers in favor of a SPDX license expression:

          License :: OSI Approved :: BSD License

          See https://packaging.python.org/en/latest/guides/writing-pyproject-toml/#license for details.
          ********************************************************************************

  !!
    self._finalize_license_expression()
  running bdist_wheel
  Guessing wheel URL:  https://github.com/Dao-AILab/fast-hadamard-transform/releases/download/v1.0.4.post1/fast_hadamard_transform-1.0.4.post1+cu122torch2.8cxx11abiTRUE-cp312-cp312-linux_x86_64.whl
  error: <urlopen error [Errno 104] Connection reset by peer>
  error: subprocess-exited-with-error
  
  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> See above for output.
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  full command: /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/bin/python3.12 -u -c '
  exec(compile('"'"''"'"''"'"'
  # This is <pip-setuptools-caller> -- a caller that pip uses to run setup.py
  #
  # - It imports setuptools before invoking setup.py, to enable projects that directly
  #   import from `distutils.core` to work with newer packaging standards.
  # - It provides a clear error message when setuptools is not installed.
  # - It sets `sys.argv[0]` to the underlying `setup.py`, when invoking `setup.py` so
  #   setuptools doesn'"'"'t think the script is `-c`. This avoids the following warning:
  #     manifest_maker: standard file '"'"'-c'"'"' not found".
  # - It generates a shim setup.py, for handling setup.cfg-only projects.
  import os, sys, tokenize, traceback
  
  try:
      import setuptools
  except ImportError:
      print(
          "ERROR: Can not execute `setup.py` since setuptools failed to import in "
          "the build environment with exception:",
          file=sys.stderr,
      )
      traceback.print_exc()
      sys.exit(1)
  
  __file__ = %r
  sys.argv[0] = __file__
  
  if os.path.exists(__file__):
      filename = __file__
      with tokenize.open(__file__) as f:
          setup_py_code = f.read()
  else:
      filename = "<auto-generated setuptools caller>"
      setup_py_code = "from setuptools import setup; setup()"
  
  exec(compile(setup_py_code, filename, "exec"))
  '"'"''"'"''"'"' % ('"'"'/nvme1/lvhan/fast-hadamard-transform/setup.py'"'"',), "<pip-setuptools-caller>", "exec"))' bdist_wheel -d /tmp/pip-wheel-4wzg0rd8
  cwd: /nvme1/lvhan/fast-hadamard-transform/
  Building wheel for fast_hadamard_transform (setup.py) ... error
  ERROR: Failed building wheel for fast_hadamard_transform
  Running setup.py clean for fast_hadamard_transform
  Running command python setup.py clean
  /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
    import pynvml  # type: ignore[import]


  torch.__version__  = 2.8.0+cu128


  /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages/setuptools/dist.py:759: SetuptoolsDeprecationWarning: License classifiers are deprecated.
  !!

          ********************************************************************************
          Please consider removing the following classifiers in favor of a SPDX license expression:

          License :: OSI Approved :: BSD License

          See https://packaging.python.org/en/latest/guides/writing-pyproject-toml/#license for details.
          ********************************************************************************

  !!
    self._finalize_license_expression()
  running clean
  'build/lib.linux-x86_64-cpython-312' does not exist -- can't clean it
  'build/bdist.linux-x86_64' does not exist -- can't clean it
  'build/scripts-3.12' does not exist -- can't clean it
Failed to build fast_hadamard_transform
error: failed-wheel-build-for-install

× Failed to build installable wheels for some pyproject.toml based projects
╰─> fast_hadamard_transform

grimoire · 2025-11-02T06:03:50Z

error: <urlopen error [Errno 104] Connection reset by peer>

Try building the wheel on a machine with network access.

lvhan028 · 2025-11-02T06:14:29Z

 Guessing wheel URL:  https://github.com/Dao-AILab/fast-hadamard-transform/releases/download/v1.0.4.post1/fast_hadamard_transform-1.0.4.post1+cu122torch2.8cxx11abiTRUE-cp312-cp312-linux_x86_64.whl
  error: <urlopen error [Errno 104] Connection reset by peer>
  error: subprocess-exited-with-error

Errno 104 occurred when setup.py tried to download the guessed wheel, which does not exist among the fast-hadamard-transform releases.

grimoire · 2025-11-02T07:35:27Z

https://github.com/Dao-AILab/fast-hadamard-transform/blob/f134af63deb2df17e1171a9ec1ea4a7d8604d5ca/setup.py#L40

These flags might help.
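To illustrate why a force-build style flag could sidestep the failure, here is a rough sketch of a "try a prebuilt wheel, otherwise build locally" flow. Everything in it is an assumption made for illustration: the environment variable `EXAMPLE_FORCE_BUILD` is a placeholder (the real flag names are whatever the linked setup.py line defines), `install` is a hypothetical function, and the choice to fall back only on `urllib.error.HTTPError` is a guess that happens to be consistent with the log above, not a statement about the library's actual code.

```python
import os
import urllib.error


def install(download_wheel, build_from_source, env=None):
    """Hypothetical install flow: prefer a prebuilt wheel, else build locally."""
    env = os.environ if env is None else env
    # Placeholder flag name; check the linked setup.py for the real one.
    if env.get("EXAMPLE_FORCE_BUILD", "FALSE") == "TRUE":
        return build_from_source()
    try:
        return download_wheel()
    except urllib.error.HTTPError:
        # e.g. 404 because no wheel was published for this CUDA/torch combination.
        return build_from_source()


def wheel_missing():
    raise urllib.error.HTTPError("https://example.invalid/x.whl", 404, "Not Found", None, None)


def connection_reset():
    raise urllib.error.URLError(ConnectionResetError(104, "Connection reset by peer"))


if __name__ == "__main__":
    print(install(wheel_missing, lambda: "built from source", env={}))
    try:
        install(connection_reset, lambda: "built from source", env={})
    except urllib.error.URLError as exc:
        # A plain URLError is not an HTTPError, so it escapes the fallback.
        print("install aborted:", exc.reason)
    print(install(connection_reset, lambda: "built from source",
                  env={"EXAMPLE_FORCE_BUILD": "TRUE"}))
```

Under these assumptions a missing wheel triggers a local build, a connection reset aborts the install, and forcing the build skips the download entirely.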

grimoire · 2025-11-04T11:29:49Z

With TP8, the bf16 NCCL all_reduce might lose precision.
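A small pure-Python illustration of the concern, assuming 8 partial results are summed with the running total rounded to bf16 after every addition. This is a toy model: NCCL's actual reduction order and internal accumulation are not modeled, and `to_bf16`, `reduce_bf16` and `reduce_wide` are hypothetical helpers written for this example.

```python
import struct


def to_bf16(x):
    """Round a float to the nearest bfloat16 value (round-to-nearest-even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bf16 keeps the top 16 bits of the float32 encoding.
    bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def reduce_bf16(parts):
    """Sum with the accumulator rounded to bf16 after every step."""
    acc = to_bf16(parts[0])
    for part in parts[1:]:
        acc = to_bf16(acc + to_bf16(part))
    return acc


def reduce_wide(parts):
    """Sum bf16 inputs in higher precision and round to bf16 once at the end."""
    return to_bf16(sum(to_bf16(part) for part in parts))


if __name__ == "__main__":
    # One large partial result and seven small ones, as if from 8 TP ranks.
    parts = [1.0] + [0.001] * 7
    print(reduce_bf16(parts))  # 1.0
    print(reduce_wide(parts))  # 1.0078125
```

Near 1.0 the spacing between bf16 values is 2^-7, so each small contribution is rounded away when the accumulator itself is bf16, while a single final rounding keeps their combined effect.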

grimoire added 11 commits October 6, 2025 15:57

WIP

e87628b

add fill kernel

af71f22

fix cudagraph

0a74b72

fix topk kernel

8d74612

fix cache

a1de79f

fix ut

f60d2f5

fix attn mla

4b813a0

fix topk

a418bc4

fix decoding cache

45b15b4

support prefill attn

49fd7a5

refactor FlashMLAImpl

9991045

grimoire changed the title from "[WIP]Dsv32" to "Support Deepseek v32" Oct 10, 2025

grimoire marked this pull request as ready for review October 10, 2025 10:49

grimoire added 5 commits October 10, 2025 18:57

add docs

fc60e93

Merge branch 'main' into dsv32

1f6fcde

comment and check

1f510a3

fix cache size

36a922f

disable bitonic topk on triton<331

af4a15a

windreamer reviewed Oct 13, 2025

lmdeploy/pytorch/backends/cuda/nsa.py (resolved)

grimoire added 2 commits October 14, 2025 20:53

fix bitonic topk when triton<=3.2.0

0a8db04

fix unused args

f07b049

lvhan028 added the enhancement label Oct 15, 2025

lvhan028 requested a review from CUHKSZzxy October 16, 2025 10:45

grimoire added 4 commits October 17, 2025 14:41

prevent overflow

17e8fe0

fix max_kv_seqlen

504940e

remove print

d44ee1c

patch deepgemm

10f4f87

windreamer requested changes Oct 30, 2025

lmdeploy/pytorch/backends/cuda/moe.py (outdated, resolved)

lmdeploy/pytorch/engine/cache_engine.py (resolved)

fix typo; add magic variable

e6a6178

windreamer approved these changes Oct 30, 2025

lvhan028 self-requested a review November 1, 2025 14:55

grimoire added 2 commits November 4, 2025 11:15

Merge branch 'main' into dsv32

c72ffe7

refactor cache engine

788ae02
