
Comparing changes

base repository: mobiusml/hqq
base: 0.1.3.post1
head repository: mobiusml/hqq
compare: master
Showing 59 changed files with 8,429 additions and 2,956 deletions.
  1. +160 −0 .gitignore
  2. +174 −193 Readme.md
  3. +0 −65 examples/hf/llama2_chat_hf_hub_example.py
  4. +65 −0 examples/hqq_lib_demo.py
  5. +191 −0 examples/hqq_plus.py
  6. +62 −0 examples/llama2_benchmark/quant_llama2_gptqmodel_demo.py
  7. +6 −9 examples/llama2_benchmark/quant_llama2_hqq_demo.py
  8. +0 −233 examples/lora/train_hqq_lora_example.py
  9. +33 −0 examples/models/aria_multimodal.py
  10. +106 −0 examples/models/llava-v1.6-34b_24GB.py
  11. +87 −0 examples/models/mixtral_13GB_example.py
  12. +122 −0 examples/models/qwen_vl.py
  13. +98 −0 examples/models/whisper.py
  14. +0 −62 examples/timm/vit_clip_example.py
  15. +67 −0 examples/transformers_demo.py
  16. +27 −0 examples/vllm.py
  17. +0 −25 examples/vllm/llama2_example.py
  18. +2 −2 hqq/__init__.py
  19. 0 hqq/backends/__init__.py
  20. +207 −0 hqq/backends/bitblas.py
  21. +34 −0 hqq/backends/gemlite.py
  22. +123 −0 hqq/backends/marlin.py
  23. +398 −0 hqq/backends/torchao.py
  24. +139 −130 hqq/core/bitpack.py
  25. +471 −224 hqq/core/optimize.py
  26. +537 −340 hqq/core/peft.py
  27. +1,105 −494 hqq/core/quantize.py
  28. +62 −13 hqq/core/utils.py
  29. +94 −73 hqq/engine/base.py
  30. +55 −46 hqq/engine/hf.py
  31. +58 −46 hqq/engine/timm.py
  32. +0 −116 hqq/engine/vllm.py
  33. +13 −7 hqq/kernels/hqq_aten_cuda.cpp
  34. +168 −72 hqq/kernels/hqq_aten_cuda_kernel.cu
  35. +16 −0 hqq/kernels/hqq_aten_torch.cpp
  36. +13 −9 hqq/kernels/setup_cuda.py
  37. +8 −6 hqq/kernels/setup_torch.py
  38. +614 −205 hqq/models/base.py
  39. +43 −6 hqq/models/hf/base.py
  40. +60 −49 hqq/models/hf/llama.py
  41. +60 −49 hqq/models/hf/mistral.py
  42. +72 −55 hqq/models/hf/mixtral.py
  43. +58 −54 hqq/models/hf/phi.py
  44. +49 −54 hqq/models/hf/phi_opt.py
  45. +22 −18 hqq/models/timm/base.py
  46. +65 −55 hqq/models/timm/vit_clip.py
  47. +186 −134 hqq/models/vllm/base.py
  48. +248 −103 hqq/models/vllm/llama.py
  49. 0 hqq/utils/__init__.py
  50. +311 −0 hqq/utils/aria.py
  51. +454 −0 hqq/utils/generation_hf.py
  52. +277 −0 hqq/utils/patching.py
  53. +833 −0 hqq/utils/vllm.py
  54. BIN imgs/hqq_cuda_dequant_llama270b_a100.png
  55. BIN imgs/hqq_cuda_dequant_llama27b_titanrtx.png
  56. BIN imgs/llama_int4_4090.png
  57. +69 −9 setup.py
  58. +86 −0 tests/test_bitpack.py
  59. +221 −0 tests/test_quantize.py
160 changes: 160 additions & 0 deletions .gitignore
@@ -0,0 +1,160 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
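
The glob patterns in the "Byte-compiled / optimized / DLL files" group of the `.gitignore` above can be sanity-checked with a small sketch. This uses Python's standard `fnmatch` module as a rough stand-in for Git's matcher, which is an assumption: Git's real rules (directory patterns ending in `/`, leading-slash anchoring, `!` negation, `**`) are not modelled here. The helper name `is_ignored_basename` is hypothetical and not part of hqq.

```python
from fnmatch import fnmatchcase

# Patterns copied from the "Byte-compiled / optimized / DLL files" group above.
BYTECODE_PATTERNS = ["*.py[cod]", "*$py.class"]


def is_ignored_basename(name, patterns=BYTECODE_PATTERNS):
    """Return True if a bare file name matches any of the glob patterns.

    Approximation only: fnmatch handles simple basename globs such as
    '*.py[cod]', but it does not implement full gitignore semantics.
    """
    return any(fnmatchcase(name, pattern) for pattern in patterns)


if __name__ == "__main__":
    for name in ["module.pyc", "module.pyo", "module.pyd", "module.py", "Foo$py.class"]:
        print(f"{name}: {is_ignored_basename(name)}")
```

For an authoritative answer inside a real checkout, `git check-ignore -v <path>` reports which pattern, if any, matches a given path.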