
transformers #713

Closed
wants to merge 4 commits into from

Conversation

jiqing-feng
Contributor

@jiqing-feng jiqing-feng commented Nov 29, 2024

This PR enables the transformers example.

For optimum lib, see: huggingface/optimum#2064

For transformers lib, see: huggingface/transformers#35012

With these two changes applied, this PR can run the example transformers_usage.py

@Qubitium
Collaborator

@jiqing-feng Ran out of time today. We will check the two PRs and test tomorrow to see if there is anything we should change. The one thing I may want to change is to make sure transformers/optimum do not call any internal methods such as select_quant_linear directly. I want to expose a more stable api, maybe even just a wrapper around select_quant_linear like hf_select_quant_linear, so that we can play and fudge with the internal api all we want without breaking hf_select_quant_linear.
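A stable public shim of the kind described could look like the following sketch. Apart from the names select_quant_linear and hf_select_quant_linear, everything here (the dispatch logic, parameters, and return values) is an illustrative assumption, not GPTQModel's actual implementation:

```python
# Illustrative sketch of a stable wrapper API. The internal selector below is
# a stand-in; GPTQModel's real select_quant_linear has its own signature and
# dispatch logic.

def select_quant_linear(bits, group_size, desc_act, sym, backend="auto"):
    # Internal dispatch: free to change between releases.
    if backend in ("auto", "triton"):
        return "TritonV2QuantLinear"
    return "TorchQuantLinear"

def hf_select_quant_linear(bits, group_size, desc_act, sym):
    # Stable entry point for transformers/optimum: a thin shim over the
    # internal selector, so internals can change without breaking callers.
    return select_quant_linear(bits, group_size, desc_act, sym, backend="auto")
```

transformers/optimum would then import only hf_select_quant_linear, leaving the internal selector free to evolve.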

@jiqing-feng jiqing-feng marked this pull request as ready for review December 2, 2024 02:54
@jiqing-feng
Contributor Author

> @jiqing-feng Ran out of time today. We will check the two PRs and test tomorrow to see if there is anything we should change. The one thing I may want to change is to make sure transformers/optimum do not call any internal methods such as select_quant_linear directly. I want to expose a more stable api, maybe even just a wrapper around select_quant_linear like hf_select_quant_linear, so that we can play and fudge with the internal api all we want without breaking hf_select_quant_linear.

Agreed, I have integrated hf_select_quant_linear in the same place.

@jiqing-feng
Contributor Author

Hi @Qubitium. The optimum and transformers PRs have been verified on CPU; do you mind verifying them on CUDA? I always hit build issues when building gptqmodel from source.

@Qubitium
Collaborator

Qubitium commented Dec 2, 2024

@jiqing-feng Ok. Can you show me your cuda compile errors? I want to check if they are related to our compiler flags and/or env.

@jiqing-feng
Contributor Author

> @jiqing-feng Ok. Can you show me your cuda compile errors? I want to check if they are related to our compiler flags and/or env.

  /usr/local/cuda/include/cuda_bf16.hpp(3736): note #3326-D: function "atomicAdd(__nv_bfloat162 *, __nv_bfloat162)" does not match because argument #1 does not match parameter
    static __attribute__((device)) __inline__ __nv_bfloat162 atomicAdd(__nv_bfloat162 *const address, const __nv_bfloat162 val)
                                                             ^
  /usr/local/cuda/include/cuda_fp16.hpp(3390): note #3326-D: function "atomicAdd(__half2 *, __half2)" does not match because argument #1 does not match parameter
    static __attribute__((device)) __inline__ __half2 atomicAdd(__half2 *const address, const __half2 val) {
                                                      ^
  /usr/local/cuda/include/sm_20_atomic_functions.hpp(82): note #3326-D: function "atomicAdd(float *, float)" does not match because argument #1 does not match parameter
    static __inline__ __attribute__((device)) float atomicAdd(float *address, float val)
                                                    ^
  /usr/local/cuda/include/device_atomic_functions.hpp(224): note #3326-D: function "atomicAdd(unsigned long long *, unsigned long long)" does not match because argument #1 does not match parameter
    static __inline__ __attribute__((device)) unsigned long long int atomicAdd(unsigned long long int *address, unsigned long long int val)
                                                                     ^
  /usr/local/cuda/include/device_atomic_functions.hpp(110): note #3326-D: function "atomicAdd(unsigned int *, unsigned int)" does not match because argument #1 does not match parameter
    static __inline__ __attribute__((device)) unsigned int atomicAdd(unsigned int *address, unsigned int val)
                                                           ^
  /usr/local/cuda/include/device_atomic_functions.hpp(105): note #3326-D: function "atomicAdd(int *, int)" does not match because argument #1 does not match parameter
    static __inline__ __attribute__((device)) int atomicAdd(int *address, int val)
                                                  ^
          detected during instantiation of "void VecQuant8MatMulKernel(const scalar_t *, const int *, scalar_t *, const scalar_t *, const int *, const int *, int, int, int, int, int) [with scalar_t=double]" at line 489

  4 errors detected in the compilation of "/workspace/jiqing/GPTQModel/gptqmodel_ext/cuda_64/gptqmodel_cuda_kernel_64.cu".
  ninja: build stopped: subcommand failed.
  Traceback (most recent call last):
    File "/workspace/jiqing/GPTQModel/setup.py", line 219, in run
      urllib.request.urlretrieve(wheel_url, wheel_filename)
    File "/usr/lib/python3.10/urllib/request.py", line 241, in urlretrieve
      with contextlib.closing(urlopen(url, data)) as fp:
    File "/usr/lib/python3.10/urllib/request.py", line 216, in urlopen
      return opener.open(url, data, timeout)
    File "/usr/lib/python3.10/urllib/request.py", line 525, in open
      response = meth(req, response)
    File "/usr/lib/python3.10/urllib/request.py", line 634, in http_response
      response = self.parent.error(
    File "/usr/lib/python3.10/urllib/request.py", line 563, in error
      return self._call_chain(*args)
    File "/usr/lib/python3.10/urllib/request.py", line 496, in _call_chain
      result = func(*args)
    File "/usr/lib/python3.10/urllib/request.py", line 643, in http_error_default
      raise HTTPError(req.full_url, code, msg, hdrs, fp)
  urllib.error.HTTPError: HTTP Error 404: Not Found

  During handling of the above exception, another exception occurred:

  Traceback (most recent call last):
    File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 2104, in _run_ninja_build
      subprocess.run(
    File "/usr/lib/python3.10/subprocess.py", line 526, in run
      raise CalledProcessError(retcode, process.args,
  subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

  The above exception was the direct cause of the following exception:

  Traceback (most recent call last):
    File "<string>", line 2, in <module>
    File "<pip-setuptools-caller>", line 34, in <module>
    File "/workspace/jiqing/GPTQModel/setup.py", line 237, in <module>
      setup(
    File "/usr/local/lib/python3.10/dist-packages/setuptools/__init__.py", line 117, in setup
      return distutils.core.setup(**attrs)
    File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/core.py", line 183, in setup
      return run_commands(dist)
    File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/core.py", line 199, in run_commands
      dist.run_commands()
    File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 954, in run_commands
      self.run_command(cmd)
    File "/usr/local/lib/python3.10/dist-packages/setuptools/dist.py", line 995, in run_command
      super().run_command(command)
    File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 973, in run_command
      cmd_obj.run()
    File "/workspace/jiqing/GPTQModel/setup.py", line 234, in run
    File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/command/build.py", line 135, in run
      self.run_command(cmd_name)
    File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/cmd.py", line 316, in run_command
      self.distribution.run_command(command)
    File "/usr/local/lib/python3.10/dist-packages/setuptools/dist.py", line 995, in run_command
      super().run_command(command)
    File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 973, in run_command
      cmd_obj.run()
    File "/usr/local/lib/python3.10/dist-packages/setuptools/command/build_ext.py", line 99, in run
      _build_ext.run(self)
    File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/command/build_ext.py", line 359, in run
      self.build_extensions()
    File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 868, in build_extensions
      build_ext.build_extensions(self)
    File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/command/build_ext.py", line 476, in build_extensions
      self._build_extensions_serial()
    File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/command/build_ext.py", line 502, in _build_extensions_serial
      self.build_extension(ext)
    File "/usr/local/lib/python3.10/dist-packages/setuptools/command/build_ext.py", line 264, in build_extension
      _build_ext.build_extension(self, ext)
    File "/usr/local/lib/python3.10/dist-packages/Cython/Distutils/build_ext.py", line 135, in build_extension
      super(build_ext, self).build_extension(ext)
    File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/command/build_ext.py", line 557, in build_extension
      objects = self.compiler.compile(
    File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 681, in unix_wrap_ninja_compile
      _write_ninja_file_and_compile_objects(
    File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1784, in _write_ninja_file_and_compile_objects
      _run_ninja_build(
    File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 2120, in _run_ninja_build
      raise RuntimeError(message) from e
  RuntimeError: Error compiling objects for extension
  error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> See above for output.

  note: This error originates from a subprocess, and is likely not a problem with pip.
  full command: /usr/bin/python -u -c '
  exec(compile('"'"''"'"''"'"'
  #   setuptools doesn'"'"'t think the script is `-c`. This avoids the following warning:
  #     manifest_maker: standard file '"'"'-c'"'"' not found".
  # - It generates a shim setup.py, for handling setup.cfg-only projects.
  import os, sys, tokenize

  try:
      import setuptools
  except ImportError as error:
      print(
          "ERROR: Can not execute `setup.py` since setuptools is not available in "
          "the build environment.",
          file=sys.stderr,
      )
      sys.exit(1)

  __file__ = %r
  sys.argv[0] = __file__

  if os.path.exists(__file__):
      filename = __file__
      with tokenize.open(__file__) as f:
          setup_py_code = f.read()
  else:
      filename = "<auto-generated setuptools caller>"
      setup_py_code = "from setuptools import setup; setup()"

  exec(compile(setup_py_code, filename, "exec"))
  '"'"''"'"''"'"' % ('"'"'/workspace/jiqing/GPTQModel/setup.py'"'"',), "<pip-setuptools-caller>", "exec"))' bdist_wheel -d /tmp/pip-wheel-tapcuacz
  cwd: /workspace/jiqing/GPTQModel/
  Building wheel for gptqmodel (setup.py) ... error
  ERROR: Failed building wheel for gptqmodel
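For context on the compile failure above: the nvcc notes list every available atomicAdd overload and none takes a double*, which is what happens when the kernel template is instantiated with scalar_t=double while compiling for a compute capability below 6.0 (atomicAdd(double*, double) only exists for sm_60 and newer). One possible workaround, assuming a PyTorch build environment, is to pin TORCH_CUDA_ARCH_LIST to the installed GPU's actual capability before building; the helper below is a sketch, not GPTQModel's build code:

```python
# Sketch: pin TORCH_CUDA_ARCH_LIST so torch.utils.cpp_extension targets the
# GPU actually present, rather than a default arch list that may include
# pre-sm_60 archs where atomicAdd(double*, double) does not exist.
import os

def arch_flag(capability):
    # capability is a (major, minor) tuple, e.g. the value returned by
    # torch.cuda.get_device_capability() on a CUDA machine.
    major, minor = capability
    return f"{major}.{minor}"

# Example value; on a real machine, query the device instead of hardcoding.
os.environ["TORCH_CUDA_ARCH_LIST"] = arch_flag((8, 0))
```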

@Qubitium
Collaborator

Qubitium commented Dec 2, 2024

@CSY-ModelCloud I see a 404 urllib error. Caused by our whl download code?
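The 404 comes from the urlretrieve call in setup.py shown in the traceback, i.e. a prebuilt-wheel fetch that should fall back to a source build when the wheel is missing. A minimal sketch of that pattern follows; the function names and URL are illustrative assumptions, not GPTQModel's actual code:

```python
# Sketch of a prebuilt-wheel fetch with a source-build fallback on 404.
import urllib.error
import urllib.request

def fetch_or_build(wheel_url, wheel_filename, build_from_source,
                   retrieve=urllib.request.urlretrieve):
    try:
        retrieve(wheel_url, wheel_filename)
        return "wheel"
    except urllib.error.HTTPError as e:
        if e.code == 404:
            # No prebuilt wheel published for this platform/CUDA combo:
            # fall back to compiling from source instead of crashing.
            return build_from_source()
        raise

# Simulate the 404 without touching the network.
def fake_retrieve_404(url, filename):
    raise urllib.error.HTTPError(url, 404, "Not Found", None, None)

result = fetch_or_build("https://example.invalid/gptqmodel.whl", "w.whl",
                        build_from_source=lambda: "source",
                        retrieve=fake_retrieve_404)
```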

@Qubitium
Collaborator

Qubitium commented Dec 2, 2024

@jiqing-feng Please change the transformers and optimum PRs into draft mode until they pass tests. Right now they are not passing and some changes are required.

@jiqing-feng
Contributor Author

> @jiqing-feng Please change the transformers and optimum PRs into draft mode until they pass tests. Right now they are not passing and some changes are required.

Got it.

@Qubitium
Collaborator

Qubitium commented Dec 2, 2024

@jiqing-feng The biggest issue right now is that gptqmodel's internal format is gptq_v2, so directly using the quant-linear doesn't work for old quantized models, such as TheBloke's or those from other gptq quantizers that use gptq v1.

The fix is that gptqmodel needs to receive the full GPTQConfig in post_init so we can auto-check and upconvert v1 to v2.
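A minimal sketch of the fix described above: post_init takes the full quantization config, detects a v1 checkpoint, and upconverts it at load time. The GPTQConfig dataclass here is a simplified stand-in for the real class, and the field names and placeholder conversion step are assumptions, not gptqmodel's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class GPTQConfig:
    # Simplified stand-in for the real quantization config.
    bits: int = 4
    group_size: int = 128
    checkpoint_format: str = "gptq"  # "gptq" (v1) or "gptq_v2"

def post_init(module_state, config):
    # Receive the full config so the loader can auto-check the serialized
    # format and upconvert v1 -> v2 once, at load time.
    if config.checkpoint_format == "gptq":
        # Placeholder for the real v1 -> v2 tensor rewrite.
        module_state["converted_to_v2"] = True
        config.checkpoint_format = "gptq_v2"
    return module_state, config
```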

@Qubitium
Collaborator

Qubitium commented Dec 2, 2024

> @jiqing-feng The biggest issue right now is that gptqmodel's internal format is gptq_v2, so directly using the quant-linear doesn't work for old quantized models, such as TheBloke's or those from other gptq quantizers that use gptq v1.
>
> The fix is that gptqmodel needs to receive the full GPTQConfig in post_init so we can auto-check and upconvert v1 to v2.

We are currently discussing how to best go about this with minimum changes.

@Qubitium
Collaborator

Qubitium commented Dec 2, 2024

Unit tests #724

Quantization and inference for GPTQ and GPTQ_v2 have been fixed. I created two PRs requesting to merge into jiqing-feng's branch:

jiqing-feng/optimum, https://github.com/jiqing-feng/optimum/pull/1/files
jiqing-feng/transformers, jiqing-feng/transformers#1
