@grimoire
Collaborator

@grimoire grimoire commented Oct 8, 2025

requirements:
https://github.com/Dao-AILab/fast-hadamard-transform#
latest FlashMLA

Note: My bitonic top-k kernel fails on triton<=3.2.0. I will try to fix it, but upgrading our Triton requirement would be the better option.
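Until the requirement is raised, the kernel could be gated on the installed Triton version. The following is a minimal sketch, assuming the only constraint is the one stated in the note (failure on triton<=3.2.0); `version_tuple` and `bitonic_topk_supported` are hypothetical helper names, not part of this PR.

```python
import re

# Assumption taken from the note above: the kernel fails on triton<=3.2.0.
LAST_FAILING_TRITON = (3, 2, 0)


def version_tuple(version: str) -> tuple:
    """Parse the leading 'X.Y' or 'X.Y.Z' of a version string into a 3-tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def bitonic_topk_supported(triton_version: str) -> bool:
    """Hypothetical helper: True when Triton is newer than the last failing release."""
    return version_tuple(triton_version) > LAST_FAILING_TRITON
```

A caller could pass `triton.__version__` and fall back to another top-k implementation when the function returns `False`.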

@grimoire grimoire changed the title from "[WIP]Dsv32" to "Support Deepseek v32" Oct 10, 2025
@grimoire grimoire marked this pull request as ready for review October 10, 2025 10:49
@lvhan028 lvhan028 added the enhancement New feature or request label Oct 15, 2025
@lvhan028 lvhan028 requested a review from CUHKSZzxy October 16, 2025 10:45
@lvhan028
Collaborator

lvhan028 commented Nov 1, 2025

I followed the fast-hadamard-transform installation guide, but the build failed. Could you kindly share the installation method you used?

(lmdeploy-py312) [lvhan@pj-h800-013 fast-hadamard-transform]$ pip install -v .
Using pip 25.2 from /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages/pip (python 3.12)
Looking in indexes: https://mirrors.aliyun.com/pypi/simple/
Processing /nvme1/lvhan/fast-hadamard-transform
  Running command python setup.py egg_info
  /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
    import pynvml  # type: ignore[import]
  /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages/setuptools/dist.py:759: SetuptoolsDeprecationWarning: License classifiers are deprecated.
  !!

          ********************************************************************************
          Please consider removing the following classifiers in favor of a SPDX license expression:

          License :: OSI Approved :: BSD License

          See https://packaging.python.org/en/latest/guides/writing-pyproject-toml/#license for details.
          ********************************************************************************

  !!
    self._finalize_license_expression()


  torch.__version__  = 2.8.0+cu128


  running egg_info
  creating /tmp/pip-pip-egg-info-ig8g8uhr/fast_hadamard_transform.egg-info
  writing /tmp/pip-pip-egg-info-ig8g8uhr/fast_hadamard_transform.egg-info/PKG-INFO
  writing dependency_links to /tmp/pip-pip-egg-info-ig8g8uhr/fast_hadamard_transform.egg-info/dependency_links.txt
  writing requirements to /tmp/pip-pip-egg-info-ig8g8uhr/fast_hadamard_transform.egg-info/requires.txt
  writing top-level names to /tmp/pip-pip-egg-info-ig8g8uhr/fast_hadamard_transform.egg-info/top_level.txt
  writing manifest file '/tmp/pip-pip-egg-info-ig8g8uhr/fast_hadamard_transform.egg-info/SOURCES.txt'
  reading manifest file '/tmp/pip-pip-egg-info-ig8g8uhr/fast_hadamard_transform.egg-info/SOURCES.txt'
  adding license file 'LICENSE'
  adding license file 'AUTHORS'
  writing manifest file '/tmp/pip-pip-egg-info-ig8g8uhr/fast_hadamard_transform.egg-info/SOURCES.txt'
  Preparing metadata (setup.py) ... done
Requirement already satisfied: torch in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from fast_hadamard_transform==1.0.4.post1) (2.8.0+cu128)
Requirement already satisfied: packaging in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from fast_hadamard_transform==1.0.4.post1) (25.0)
Requirement already satisfied: ninja in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from fast_hadamard_transform==1.0.4.post1) (1.13.0)
Requirement already satisfied: filelock in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (3.20.0)
Requirement already satisfied: typing-extensions>=4.10.0 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (4.15.0)
Requirement already satisfied: setuptools in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (80.9.0)
Requirement already satisfied: sympy>=1.13.3 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (1.14.0)
Requirement already satisfied: networkx in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (3.5)
Requirement already satisfied: jinja2 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (3.1.6)
Requirement already satisfied: fsspec in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (2025.10.0)
Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.8.93 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (12.8.93)
Requirement already satisfied: nvidia-cuda-runtime-cu12==12.8.90 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (12.8.90)
Requirement already satisfied: nvidia-cuda-cupti-cu12==12.8.90 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (12.8.90)
Requirement already satisfied: nvidia-cudnn-cu12==9.10.2.21 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (9.10.2.21)
Requirement already satisfied: nvidia-cublas-cu12==12.8.4.1 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (12.8.4.1)
Requirement already satisfied: nvidia-cufft-cu12==11.3.3.83 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (11.3.3.83)
Requirement already satisfied: nvidia-curand-cu12==10.3.9.90 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (10.3.9.90)
Requirement already satisfied: nvidia-cusolver-cu12==11.7.3.90 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (11.7.3.90)
Requirement already satisfied: nvidia-cusparse-cu12==12.5.8.93 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (12.5.8.93)
Requirement already satisfied: nvidia-cusparselt-cu12==0.7.1 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (0.7.1)
Requirement already satisfied: nvidia-nccl-cu12==2.27.3 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (2.27.3)
Requirement already satisfied: nvidia-nvtx-cu12==12.8.90 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (12.8.90)
Requirement already satisfied: nvidia-nvjitlink-cu12==12.8.93 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (12.8.93)
Requirement already satisfied: nvidia-cufile-cu12==1.13.1.3 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (1.13.1.3)
Requirement already satisfied: triton==3.4.0 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from torch->fast_hadamard_transform==1.0.4.post1) (3.4.0)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from sympy>=1.13.3->torch->fast_hadamard_transform==1.0.4.post1) (1.3.0)
Requirement already satisfied: MarkupSafe>=2.0 in /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages (from jinja2->torch->fast_hadamard_transform==1.0.4.post1) (3.0.3)
Building wheels for collected packages: fast_hadamard_transform
  DEPRECATION: Building 'fast_hadamard_transform' using the legacy setup.py bdist_wheel mechanism, which will be removed in a future version. pip 25.3 will enforce this behaviour change. A possible replacement is to use the standardized build interface by setting the `--use-pep517` option, (possibly combined with `--no-build-isolation`), or adding a `pyproject.toml` file to the source tree of 'fast_hadamard_transform'. Discussion can be found at https://github.com/pypa/pip/issues/6334
  Running command python setup.py bdist_wheel
  /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
    import pynvml  # type: ignore[import]


  torch.__version__  = 2.8.0+cu128


  /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages/setuptools/dist.py:759: SetuptoolsDeprecationWarning: License classifiers are deprecated.
  !!

          ********************************************************************************
          Please consider removing the following classifiers in favor of a SPDX license expression:

          License :: OSI Approved :: BSD License

          See https://packaging.python.org/en/latest/guides/writing-pyproject-toml/#license for details.
          ********************************************************************************

  !!
    self._finalize_license_expression()
  running bdist_wheel
  Guessing wheel URL:  https://github.com/Dao-AILab/fast-hadamard-transform/releases/download/v1.0.4.post1/fast_hadamard_transform-1.0.4.post1+cu122torch2.8cxx11abiTRUE-cp312-cp312-linux_x86_64.whl
  error: <urlopen error [Errno 104] Connection reset by peer>
  error: subprocess-exited-with-error
  
  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> See above for output.
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  full command: /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/bin/python3.12 -u -c '
  exec(compile('"'"''"'"''"'"'
  # This is <pip-setuptools-caller> -- a caller that pip uses to run setup.py
  #
  # - It imports setuptools before invoking setup.py, to enable projects that directly
  #   import from `distutils.core` to work with newer packaging standards.
  # - It provides a clear error message when setuptools is not installed.
  # - It sets `sys.argv[0]` to the underlying `setup.py`, when invoking `setup.py` so
  #   setuptools doesn'"'"'t think the script is `-c`. This avoids the following warning:
  #     manifest_maker: standard file '"'"'-c'"'"' not found".
  # - It generates a shim setup.py, for handling setup.cfg-only projects.
  import os, sys, tokenize, traceback
  
  try:
      import setuptools
  except ImportError:
      print(
          "ERROR: Can not execute `setup.py` since setuptools failed to import in "
          "the build environment with exception:",
          file=sys.stderr,
      )
      traceback.print_exc()
      sys.exit(1)
  
  __file__ = %r
  sys.argv[0] = __file__
  
  if os.path.exists(__file__):
      filename = __file__
      with tokenize.open(__file__) as f:
          setup_py_code = f.read()
  else:
      filename = "<auto-generated setuptools caller>"
      setup_py_code = "from setuptools import setup; setup()"
  
  exec(compile(setup_py_code, filename, "exec"))
  '"'"''"'"''"'"' % ('"'"'/nvme1/lvhan/fast-hadamard-transform/setup.py'"'"',), "<pip-setuptools-caller>", "exec"))' bdist_wheel -d /tmp/pip-wheel-4wzg0rd8
  cwd: /nvme1/lvhan/fast-hadamard-transform/
  Building wheel for fast_hadamard_transform (setup.py) ... error
  ERROR: Failed building wheel for fast_hadamard_transform
  Running setup.py clean for fast_hadamard_transform
  Running command python setup.py clean
  /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
    import pynvml  # type: ignore[import]


  torch.__version__  = 2.8.0+cu128


  /nvme1/lvhan/miniconda3/envs/lmdeploy-py312/lib/python3.12/site-packages/setuptools/dist.py:759: SetuptoolsDeprecationWarning: License classifiers are deprecated.
  !!

          ********************************************************************************
          Please consider removing the following classifiers in favor of a SPDX license expression:

          License :: OSI Approved :: BSD License

          See https://packaging.python.org/en/latest/guides/writing-pyproject-toml/#license for details.
          ********************************************************************************

  !!
    self._finalize_license_expression()
  running clean
  'build/lib.linux-x86_64-cpython-312' does not exist -- can't clean it
  'build/bdist.linux-x86_64' does not exist -- can't clean it
  'build/scripts-3.12' does not exist -- can't clean it
Failed to build fast_hadamard_transform
error: failed-wheel-build-for-install

× Failed to build installable wheels for some pyproject.toml based projects
╰─> fast_hadamard_transform

@lvhan028 lvhan028 self-requested a review November 1, 2025 14:55
@grimoire
Collaborator Author

grimoire commented Nov 2, 2025

error: <urlopen error [Errno 104] Connection reset by peer>

Try building the wheel on a machine with network access.

@lvhan028
Collaborator

lvhan028 commented Nov 2, 2025

 Guessing wheel URL:  https://github.com/Dao-AILab/fast-hadamard-transform/releases/download/v1.0.4.post1/fast_hadamard_transform-1.0.4.post1+cu122torch2.8cxx11abiTRUE-cp312-cp312-linux_x86_64.whl
  error: <urlopen error [Errno 104] Connection reset by peer>
  error: subprocess-exited-with-error

Errno 104 occurred while the build was trying to download the guessed wheel, which does not exist on the fast-hadamard-transform releases page.
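In the log, the build stops at the failed download instead of compiling from source. As an illustration only, here is a minimal sketch of a download step that treats any network failure as "no prebuilt wheel" so the caller can fall back to a source build. `fetch_prebuilt_wheel` and its `retrieve` parameter are hypothetical names and are not taken from the fast-hadamard-transform `setup.py`.

```python
import urllib.error
import urllib.request


def fetch_prebuilt_wheel(url: str, dest: str,
                         retrieve=urllib.request.urlretrieve) -> bool:
    """Return True if the wheel was downloaded, False if a source build is needed."""
    try:
        retrieve(url, dest)
        return True
    except OSError as exc:
        # URLError derives from OSError, and HTTPError (e.g. a 404 for a
        # wheel that was never published) derives from URLError, so this
        # single clause covers both a missing file and a reset connection.
        print(f"Prebuilt wheel unavailable ({exc}); falling back to a source build")
        return False
```

The design point is that catching only `urllib.error.HTTPError` would handle a 404 but let a connection reset (`URLError`) propagate and abort the build.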

@grimoire
Collaborator Author

grimoire commented Nov 2, 2025

@grimoire
Collaborator Author

grimoire commented Nov 4, 2025

TP8 with a bf16 NCCL all_reduce might suffer from low precision.
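To show why summing eight partial results in bf16 can lose precision, here is a minimal pure-Python sketch that emulates bf16 rounding and accumulates serially. It is an illustration under simplifying assumptions: NCCL's actual reduction order and internal accumulation are not modeled, and `to_bf16` and `bf16_sum` are hypothetical helpers that handle finite values only.

```python
import struct


def to_bf16(x: float) -> float:
    """Round a finite float to the nearest bfloat16 value (round-to-nearest-even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bf16 keeps the top 16 bits of a float32; round on the 16 dropped bits.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = ((bits + rounding_bias) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def bf16_sum(values):
    """Sum values left to right, rounding the accumulator to bf16 after each add."""
    acc = 0.0
    for v in values:
        acc = to_bf16(acc + to_bf16(v))
    return acc


# Eight "ranks": one large partial result and seven small ones.
partials = [1.0] + [0.001] * 7
exact = sum(partials)          # about 1.007 in double precision
approx = bf16_sum(partials)    # stays at 1.0: each small term is rounded away
print(f"exact={exact:.6f} bf16={approx:.6f}")
```

Near 1.0 the spacing between adjacent bf16 values is 2^-7 (about 0.0078), so each addition of 0.001 rounds back to 1.0 and the small contributions are lost entirely.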
