
transformers #713

Closed
wants to merge 4 commits into from

Conversation

jiqing-feng
Contributor

@jiqing-feng jiqing-feng commented Nov 29, 2024

This PR enables the transformers example.

For optimum lib, see: huggingface/optimum#2064

For transformers lib, see: huggingface/transformers#35012

With these two changes applied, this PR can run the example transformers_usage.py

@Qubitium
Collaborator

@jiqing-feng Ran out of time today. We will check the two PRs and test tomorrow to see if there is anything we should change. The one thing I may want to change is to make sure transformers/optimum do not call any internal methods such as select_quant_linear directly. I want to expose a more stable api, maybe even just a wrapper around select_quant_linear like hf_select_quant_linear, so that we can play and fudge with the internal api all we want without breaking hf_select_quant_linear.
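A stable public shim of the kind described could look like the following sketch. Apart from the names select_quant_linear and hf_select_quant_linear, everything here (the dispatch logic, parameters, and return values) is an illustrative assumption, not GPTQModel's actual implementation:

```python
# Illustrative sketch of a stable wrapper API. The internal selector below is
# a stand-in; GPTQModel's real select_quant_linear has its own signature and
# dispatch logic.

def select_quant_linear(bits, group_size, desc_act, sym, backend="auto"):
    # Internal dispatch: free to change between releases.
    if backend in ("auto", "triton"):
        return "TritonV2QuantLinear"
    return "TorchQuantLinear"

def hf_select_quant_linear(bits, group_size, desc_act, sym):
    # Stable entry point for transformers/optimum: a thin shim over the
    # internal selector, so internals can change without breaking callers.
    return select_quant_linear(bits, group_size, desc_act, sym, backend="auto")
```

transformers/optimum would then import only hf_select_quant_linear, leaving the internal selector free to evolve.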

@jiqing-feng jiqing-feng marked this pull request as ready for review December 2, 2024 02:54
@jiqing-feng
Contributor Author

> @jiqing-feng Ran out of time today. We will check the two PRs and test tomorrow to see if there is anything we should change. The one thing I may want to change is to make sure transformers/optimum do not call any internal methods such as select_quant_linear directly. I want to expose a more stable api, maybe even just a wrapper around select_quant_linear like hf_select_quant_linear, so that we can play and fudge with the internal api all we want without breaking hf_select_quant_linear.

Agreed, I have integrated hf_select_quant_linear in the same place.

@jiqing-feng
Contributor Author

Hi @Qubitium. The optimum and transformers PRs have been verified on CPU; do you mind verifying them on CUDA? I always hit build issues when building gptqmodel from source.

@Qubitium
Collaborator

Qubitium commented Dec 2, 2024

@jiqing-feng Ok. Can you show me your cuda compile errors? I want to check if they are related to our compiler flags and/or env.

@jiqing-feng
Contributor Author

> @jiqing-feng Ok. Can you show me your cuda compile errors? I want to check if they are related to our compiler flags and/or env.

  /usr/local/cuda/include/cuda_bf16.hpp(3736): note #3326-D: function "atomicAdd(__nv_bfloat162 *, __nv_bfloat162)" does not match because argument #1 does not match parameter
    static __attribute__((device)) __inline__ __nv_bfloat162 atomicAdd(__nv_bfloat162 *const address, const __nv_bfloat162 val)
                                                             ^
  /usr/local/cuda/include/cuda_fp16.hpp(3390): note #3326-D: function "atomicAdd(__half2 *, __half2)" does not match because argument #1 does not match parameter
    static __attribute__((device)) __inline__ __half2 atomicAdd(__half2 *const address, const __half2 val) {
                                                      ^
  /usr/local/cuda/include/sm_20_atomic_functions.hpp(82): note #3326-D: function "atomicAdd(float *, float)" does not match because argument #1 does not match parameter
    static __inline__ __attribute__((device)) float atomicAdd(float *address, float val)
                                                    ^
  /usr/local/cuda/include/device_atomic_functions.hpp(224): note #3326-D: function "atomicAdd(unsigned long long *, unsigned long long)" does not match because argument #1 does not match parameter
    static __inline__ __attribute__((device)) unsigned long long int atomicAdd(unsigned long long int *address, unsigned long long int val)
                                                                     ^
  /usr/local/cuda/include/device_atomic_functions.hpp(110): note #3326-D: function "atomicAdd(unsigned int *, unsigned int)" does not match because argument #1 does not match parameter
    static __inline__ __attribute__((device)) unsigned int atomicAdd(unsigned int *address, unsigned int val)
                                                           ^
  /usr/local/cuda/include/device_atomic_functions.hpp(105): note #3326-D: function "atomicAdd(int *, int)" does not match because argument #1 does not match parameter
    static __inline__ __attribute__((device)) int atomicAdd(int *address, int val)
                                                  ^
          detected during instantiation of "void VecQuant8MatMulKernel(const scalar_t *, const int *, scalar_t *, const scalar_t *, const int *, const int *, int, int, int, int, int) [with scalar_t=double]" at line 489

  4 errors detected in the compilation of "/workspace/jiqing/GPTQModel/gptqmodel_ext/cuda_64/gptqmodel_cuda_kernel_64.cu".
  ninja: build stopped: subcommand failed.
  Traceback (most recent call last):
    File "/workspace/jiqing/GPTQModel/setup.py", line 219, in run
      urllib.request.urlretrieve(wheel_url, wheel_filename)
    File "/usr/lib/python3.10/urllib/request.py", line 241, in urlretrieve
      with contextlib.closing(urlopen(url, data)) as fp:
    File "/usr/lib/python3.10/urllib/request.py", line 216, in urlopen
      return opener.open(url, data, timeout)
    File "/usr/lib/python3.10/urllib/request.py", line 525, in open
      response = meth(req, response)
    File "/usr/lib/python3.10/urllib/request.py", line 634, in http_response
      response = self.parent.error(
    File "/usr/lib/python3.10/urllib/request.py", line 563, in error
      return self._call_chain(*args)
    File "/usr/lib/python3.10/urllib/request.py", line 496, in _call_chain
      result = func(*args)
    File "/usr/lib/python3.10/urllib/request.py", line 643, in http_error_default
      raise HTTPError(req.full_url, code, msg, hdrs, fp)
  urllib.error.HTTPError: HTTP Error 404: Not Found

  During handling of the above exception, another exception occurred:

  Traceback (most recent call last):
    File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 2104, in _run_ninja_build
      subprocess.run(
    File "/usr/lib/python3.10/subprocess.py", line 526, in run
      raise CalledProcessError(retcode, process.args,
  subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

  The above exception was the direct cause of the following exception:

  Traceback (most recent call last):
    File "<string>", line 2, in <module>
    File "<pip-setuptools-caller>", line 34, in <module>
    File "/workspace/jiqing/GPTQModel/setup.py", line 237, in <module>
      setup(
    File "/usr/local/lib/python3.10/dist-packages/setuptools/__init__.py", line 117, in setup
      return distutils.core.setup(**attrs)
    File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/core.py", line 183, in setup
      return run_commands(dist)
    File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/core.py", line 199, in run_commands
      dist.run_commands()
    File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 954, in run_commands
      self.run_command(cmd)
    File "/usr/local/lib/python3.10/dist-packages/setuptools/dist.py", line 995, in run_command
      super().run_command(command)
    File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 973, in run_command
      cmd_obj.run()
    File "/workspace/jiqing/GPTQModel/setup.py", line 234, in run
    File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/command/build.py", line 135, in run
      self.run_command(cmd_name)
    File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/cmd.py", line 316, in run_command
      self.distribution.run_command(command)
    File "/usr/local/lib/python3.10/dist-packages/setuptools/dist.py", line 995, in run_command
      super().run_command(command)
    File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 973, in run_command
      cmd_obj.run()
    File "/usr/local/lib/python3.10/dist-packages/setuptools/command/build_ext.py", line 99, in run
      _build_ext.run(self)
    File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/command/build_ext.py", line 359, in run
      self.build_extensions()
    File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 868, in build_extensions
      build_ext.build_extensions(self)
    File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/command/build_ext.py", line 476, in build_extensions
      self._build_extensions_serial()
    File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/command/build_ext.py", line 502, in _build_extensions_serial
      self.build_extension(ext)
    File "/usr/local/lib/python3.10/dist-packages/setuptools/command/build_ext.py", line 264, in build_extension
      _build_ext.build_extension(self, ext)
    File "/usr/local/lib/python3.10/dist-packages/Cython/Distutils/build_ext.py", line 135, in build_extension
      super(build_ext, self).build_extension(ext)
    File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/command/build_ext.py", line 557, in build_extension
      objects = self.compiler.compile(
    File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 681, in unix_wrap_ninja_compile
      _write_ninja_file_and_compile_objects(
    File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1784, in _write_ninja_file_and_compile_objects
      _run_ninja_build(
    File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 2120, in _run_ninja_build
      raise RuntimeError(message) from e
  RuntimeError: Error compiling objects for extension
  error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> See above for output.

  note: This error originates from a subprocess, and is likely not a problem with pip.
  full command: /usr/bin/python -u -c '
  exec(compile('"'"''"'"''"'"'
  #   setuptools doesn'"'"'t think the script is `-c`. This avoids the following warning:
  #     manifest_maker: standard file '"'"'-c'"'"' not found".
  # - It generates a shim setup.py, for handling setup.cfg-only projects.
  import os, sys, tokenize

  try:
      import setuptools
  except ImportError as error:
      print(
          "ERROR: Can not execute `setup.py` since setuptools is not available in "
          "the build environment.",
          file=sys.stderr,
      )
      sys.exit(1)

  __file__ = %r
  sys.argv[0] = __file__

  if os.path.exists(__file__):
      filename = __file__
      with tokenize.open(__file__) as f:
          setup_py_code = f.read()
  else:
      filename = "<auto-generated setuptools caller>"
      setup_py_code = "from setuptools import setup; setup()"

  exec(compile(setup_py_code, filename, "exec"))
  '"'"''"'"''"'"' % ('"'"'/workspace/jiqing/GPTQModel/setup.py'"'"',), "<pip-setuptools-caller>", "exec"))' bdist_wheel -d /tmp/pip-wheel-tapcuacz
  cwd: /workspace/jiqing/GPTQModel/
  Building wheel for gptqmodel (setup.py) ... error
  ERROR: Failed building wheel for gptqmodel
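For context on the compile failure above: the nvcc notes list every available atomicAdd overload and none takes a double*, which is what happens when the kernel template is instantiated with scalar_t=double while compiling for a compute capability below 6.0 (atomicAdd(double*, double) only exists for sm_60 and newer). One possible workaround, assuming a PyTorch build environment, is to pin TORCH_CUDA_ARCH_LIST to the installed GPU's actual capability before building; the helper below is a sketch, not GPTQModel's build code:

```python
# Sketch: pin TORCH_CUDA_ARCH_LIST so torch.utils.cpp_extension targets the
# GPU actually present, rather than a default arch list that may include
# pre-sm_60 archs where atomicAdd(double*, double) does not exist.
import os

def arch_flag(capability):
    # capability is a (major, minor) tuple, e.g. the value returned by
    # torch.cuda.get_device_capability() on a CUDA machine.
    major, minor = capability
    return f"{major}.{minor}"

# Example value; on a real machine, query the device instead of hardcoding.
os.environ["TORCH_CUDA_ARCH_LIST"] = arch_flag((8, 0))
```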

@Qubitium
Collaborator

Qubitium commented Dec 2, 2024

@CSY-ModelCloud I see a 404 urllib error. Caused by our whl download code?
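The 404 comes from the urlretrieve call in setup.py shown in the traceback, i.e. a prebuilt-wheel fetch that should fall back to a source build when the wheel is missing. A minimal sketch of that pattern follows; the function names and URL are illustrative assumptions, not GPTQModel's actual code:

```python
# Sketch of a prebuilt-wheel fetch with a source-build fallback on 404.
import urllib.error
import urllib.request

def fetch_or_build(wheel_url, wheel_filename, build_from_source,
                   retrieve=urllib.request.urlretrieve):
    try:
        retrieve(wheel_url, wheel_filename)
        return "wheel"
    except urllib.error.HTTPError as e:
        if e.code == 404:
            # No prebuilt wheel published for this platform/CUDA combo:
            # fall back to compiling from source instead of crashing.
            return build_from_source()
        raise

# Simulate the 404 without touching the network.
def fake_retrieve_404(url, filename):
    raise urllib.error.HTTPError(url, 404, "Not Found", None, None)

result = fetch_or_build("https://example.invalid/gptqmodel.whl", "w.whl",
                        build_from_source=lambda: "source",
                        retrieve=fake_retrieve_404)
```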

@Qubitium
Collaborator

Qubitium commented Dec 2, 2024

@jiqing-feng Please change the transformers and optimum PRs into draft mode until they pass tests. Right now they are not passing and some changes are required.

@jiqing-feng
Contributor Author

> @jiqing-feng Please change the transformers and optimum PRs into draft mode until they pass tests. Right now they are not passing and some changes are required.

Got it.

@Qubitium
Collaborator

Qubitium commented Dec 2, 2024

@jiqing-feng The biggest issue right now is that gptqmodel's internal format is gptq_v2, so directly using the quant-linear doesn't work for old quantized models, such as TheBloke's or those from other gptq quantizers that use gptq v1.

The fix is that gptqmodel needs to receive the full GPTQConfig in post_init so we can auto-check and upconvert v1 to v2.
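A minimal sketch of the fix described above: post_init takes the full quantization config, detects a v1 checkpoint, and upconverts it at load time. The GPTQConfig dataclass here is a simplified stand-in for the real class, and the field names and placeholder conversion step are assumptions, not gptqmodel's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class GPTQConfig:
    # Simplified stand-in for the real quantization config.
    bits: int = 4
    group_size: int = 128
    checkpoint_format: str = "gptq"  # "gptq" (v1) or "gptq_v2"

def post_init(module_state, config):
    # Receive the full config so the loader can auto-check the serialized
    # format and upconvert v1 -> v2 once, at load time.
    if config.checkpoint_format == "gptq":
        # Placeholder for the real v1 -> v2 tensor rewrite.
        module_state["converted_to_v2"] = True
        config.checkpoint_format = "gptq_v2"
    return module_state, config
```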

@Qubitium
Collaborator

Qubitium commented Dec 2, 2024

> @jiqing-feng The biggest issue right now is that gptqmodel's internal format is gptq_v2, so directly using the quant-linear doesn't work for old quantized models, such as TheBloke's or those from other gptq quantizers that use gptq v1.
>
> The fix is that gptqmodel needs to receive the full GPTQConfig in post_init so we can auto-check and upconvert v1 to v2.

We are currently discussing how to best go about this with minimum changes.

@Qubitium
Collaborator

Qubitium commented Dec 2, 2024

Unit tests #724

Quantization and inference for GPTQ and GPTQ_v2 have been fixed. I created two PRs requesting to merge into jiqing-feng's branch:

jiqing-feng/optimum, https://github.com/jiqing-feng/optimum/pull/1/files
jiqing-feng/transformers, jiqing-feng/transformers#1
