Merge public Intel Neural Compressor to fork (#93)
* modify 3.x ipex example structure (#1858)

* modify 3.x ipex example structure

Signed-off-by: Cheng, Zixuan <[email protected]>

* add json path

Signed-off-by: Cheng, Zixuan <[email protected]>

* fix for sq

Signed-off-by: Cheng, Zixuan <[email protected]>

* minor fix

Signed-off-by: Cheng, Zixuan <[email protected]>

* Update run_clm_no_trainer.py

* Update run_clm_no_trainer.py

* Update run_clm_no_trainer.py

* minor fix

Signed-off-by: Cheng, Zixuan <[email protected]>

* remove old files

Signed-off-by: Cheng, Zixuan <[email protected]>

* fix act_algo

Signed-off-by: Cheng, Zixuan <[email protected]>

---------

Signed-off-by: Cheng, Zixuan <[email protected]>
Co-authored-by: xinhe <[email protected]>

* Improve UT Branch Coverage for TF 3x (#1867)

Signed-off-by: zehao-intel <[email protected]>

* [3x] add recommendation examples (#1844)

Signed-off-by: xin3he <[email protected]>

* Add PT2E cv&llm example (#1853)

Signed-off-by: Kaihui-intel <[email protected]>

* Update SQ/WOQ status (#1869)

Signed-off-by: Sun, Xuehao <[email protected]>
Co-authored-by: chen, suyue <[email protected]>

* Modify WOQ examples structure (#1866)

Signed-off-by: Kaihui-intel <[email protected]>
Signed-off-by: chensuyue <[email protected]>

* update v2.6 release readme (#1871)

Signed-off-by: chensuyue <[email protected]>

* Limit numpy versions (#1874)

Signed-off-by: Sun, Xuehao <[email protected]>

* fix layer match (#1873)

Signed-off-by: Kaihui-intel <[email protected]>
Co-authored-by: Sun, Xuehao <[email protected]>

* Enhance autotune to return the best `q_model` directly (#1875)

Signed-off-by: yiliu30 <[email protected]>
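
The enhancement above means callers get the tuned model back directly instead of re-applying the winning config themselves. A conceptual pure-Python sketch of that contract (all names here are illustrative, not INC's actual API):

```python
# Conceptual sketch of an autotune loop that returns the best quantized
# model directly, rather than only the best config. All names are
# hypothetical; INC's real entry point lives in neural_compressor.
def autotune(model, configs, quantize_fn, eval_fn):
    """Try each config, keep the quantized model with the best eval score."""
    best_model, best_score = None, float("-inf")
    for config in configs:
        q_model = quantize_fn(model, config)
        score = eval_fn(q_model)
        if score > best_score:
            best_model, best_score = q_model, score
    return best_model  # the best q_model, ready to use


# Toy usage: "quantization" rounds weights to a given number of decimals,
# and evaluation rewards low reconstruction error.
model = [0.123, 0.456, 0.789]
configs = [0, 1, 2]
quantize_fn = lambda m, ndigits: [round(w, ndigits) for w in m]
eval_fn = lambda qm: -sum(abs(a - b) for a, b in zip(model, qm))
best = autotune(model, configs, quantize_fn, eval_fn)
```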

* Add op statistics dump for woq (#1876)

Signed-off-by: Kaihui-intel <[email protected]>

* rm cov (#1878)

Signed-off-by: yiliu30 <[email protected]>

* Add `set_local` support for static quant with pt2e (#1870)

Signed-off-by: yiliu30 <[email protected]>

* Update the Gaudi container example in the README (#1885)

* support quant_lm_head arg in all WOQ configs (#1881)

Signed-off-by: xin3he <[email protected]>

* Fix sql injection for Neural Solution gRPC (#1879)

Signed-off-by: Kaihui-intel <[email protected]>

* Remove Gelu Fusion for TF Newapi (#1886)

Signed-off-by: zehao-intel <[email protected]>

* Refine HQQ UTs (#1888)

Signed-off-by: yiliu30 <[email protected]>

* tmp fix nas deps issue (#1896)

Signed-off-by: chensuyue <[email protected]>

* support auto_host2device on RTN and GPTQ (#1894)

Signed-off-by: He, Xin3 <[email protected]>

* remove import pdb (#1897)

Signed-off-by: changwangss <[email protected]>

* Port auto-detect absorb layers for TEQ (#1895)

Signed-off-by: yiliu30 <[email protected]>

* Remove 1x API (#1865)

Signed-off-by: yiliu30 <[email protected]>
Co-authored-by: chen, suyue <[email protected]>

* remove neural insight CI (#1903)

Signed-off-by: Sun, Xuehao <[email protected]>

* fix bf16 symbolic_trace bug (#1892)

Description: fix a bf16 symbolic_trace bug that

- caused abnormal recursive calls.
- was missing necessary attributes.

Fixed by moving the BF16 fallback ahead of quantization and removing bf16_symbolic_trace.

---------

Signed-off-by: xin3he <[email protected]>
Co-authored-by: Sun, Xuehao <[email protected]>

* update fp4_e2m1 mapping list (#1906)

* update fp4_e2m1 mapping list

* Update utility.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
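
The fp4_e2m1 mapping above refers to the 4-bit float format with 2 exponent bits and 1 mantissa bit. A small self-contained sketch of its representable values and nearest-value rounding (a generic illustration, not the exact mapping list from this commit):

```python
# FP4 E2M1: 1 sign, 2 exponent, 1 mantissa bit. Its representable
# magnitudes are {0, 0.5, 1, 1.5, 2, 3, 4, 6}; a mapping list pairs each
# 4-bit code with one of these values. Generic illustration only.
FP4_E2M1_VALUES = sorted(
    [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
    + [-0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0]
)


def quantize_fp4_e2m1(x: float) -> float:
    """Round x to the nearest representable fp4_e2m1 value."""
    return min(FP4_E2M1_VALUES, key=lambda v: abs(x - v))
```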

* Add docstring for `common` module (#1905)

Signed-off-by: yiliu30 <[email protected]>

* support habana fp8 UT test in CI (#1909)

Signed-off-by: chensuyue <[email protected]>

* bump version into 3.0 (#1908)

Signed-off-by: chensuyue <[email protected]>

* implement `incbench` command for ease-of-use benchmark (#1884)

# Description

- implement incbench command as the entrypoint for ease-of-use benchmarking
- automatically check NUMA/socket info and dump it as a table for easy reading
- support both Linux and Windows platforms
- add benchmark documents
- dump benchmark summary
- add benchmark UTs

# General Use Cases

- `incbench main.py`: run 1 instance on NUMA:0.
- `incbench --num_i 2 main.py`: run 2 instances on NUMA:0.
- `incbench --num_c 2 main.py`: run multiple instances with 2 cores per instance on NUMA:0.
- `incbench -C 24-47 main.py`: run 1 instance on cores 24-47.
- `incbench -C 24-47 --num_c 4 main.py`: run multiple instances with 4 cores per instance on cores 24-47.
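
A target script passed to incbench is just an ordinary Python entry point; incbench handles instance launch and core pinning around it. A minimal hypothetical `main.py` (the workload and timing below are illustrative, not part of the tool):

```python
# main.py -- a hypothetical benchmark target for `incbench main.py`.
# incbench launches this script per instance; the script itself only
# needs to run its workload and report its own timing.
import time


def workload(size: int = 120) -> float:
    """Multiply two size x size matrices in pure Python; return elapsed seconds."""
    a = [[float(i + j) for j in range(size)] for i in range(size)]
    b = [[float(i - j) for j in range(size)] for i in range(size)]
    start = time.perf_counter()
    result = [
        [sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
        for i in range(size)
    ]
    elapsed = time.perf_counter() - start
    assert len(result) == size  # sanity check on the output shape
    return elapsed


if __name__ == "__main__":
    print(f"matmul took {workload():.3f}s")
```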

---------

Signed-off-by: xin3he <[email protected]>
Co-authored-by: chen, suyue <[email protected]>

* Get default config based on the auto-detect CPU type (#1904)

Signed-off-by: yiliu30 <[email protected]>

* Add export support for TEQ (#1910)

Signed-off-by: yiliu30 <[email protected]>

* update Gaudi CI baseline artifacts name (#1912)

Signed-off-by: chensuyue <[email protected]>

* Remove deprecated modules (#1872)

Signed-off-by: chensuyue <[email protected]>

* fix CI docker container clean up issue (#1917)

Signed-off-by: chensuyue <[email protected]>

* remove 1x docs (#1900)

Signed-off-by: yiliu30 <[email protected]>

* Add `save`/`load` support for HQQ (#1913)

Signed-off-by: yiliu30 <[email protected]>
Co-authored-by: chen, suyue <[email protected]>

* Support PT2E save and load (#1918)

Signed-off-by: Kaihui-intel <[email protected]>

* implement TorchBaseConfig (#1911)

Signed-off-by: xin3he <[email protected]>

* update documentation for 3x API (#1923)

Signed-off-by: chensuyue <[email protected]>
Signed-off-by: xin3he <[email protected]>
Signed-off-by: yiliu30 <[email protected]>

* fix typo in architecture diagram (#1924)

Signed-off-by: Huang, Tai <[email protected]>

* Support woq Autotune (#1921)

Signed-off-by: Kaihui-intel <[email protected]>

* Support absorb dict for awq (#1920)

Signed-off-by: Kaihui-intel <[email protected]>

* Support LayerWise for RTN/GPTQ (#1883)

Signed-off-by: Kaihui-intel <[email protected]>
Co-authored-by: chensuyue <[email protected]>

* update itrex ut test (#1929)

Signed-off-by: chensuyue <[email protected]>

* add docstring for torch.quantization and torch.utils (#1928)

Signed-off-by: xin3he <[email protected]>

* Integrate AutoRound v0.3 (#1925)

Signed-off-by: Kaihui-intel <[email protected]>

* Integrate AutoRound v0.3 to 2x (#1926)

Signed-off-by: Kaihui-intel <[email protected]>

* Enhance load_empty_model import (#1930)

Signed-off-by: Kaihui-intel <[email protected]>

* Add doc for client usage (#1914)

Signed-off-by: yiliu30 <[email protected]>

* remove peft version limit (#1933)

Signed-off-by: chensuyue <[email protected]>

* Support xpu for ipex static quant (#1916)

Signed-off-by: violetch24 <[email protected]>

* Support calib_func on TF 3x API (#1934)

Signed-off-by: zehao-intel <[email protected]>

* 3.X API installation update (#1935)

Signed-off-by: chensuyue <[email protected]>

* Fix unused pkgs import (#1931)


Signed-off-by: Kaihui-intel <[email protected]>

* Add docstring for PT2E and HQQ (#1937)

Signed-off-by: yiliu30 <[email protected]>

* add docstring for static quant and smooth quant (#1936)

* add docstring for static quant and smooth quant

Signed-off-by: violetch24 <[email protected]>

* format fix

Signed-off-by: violetch24 <[email protected]>

* update scan path

Signed-off-by: violetch24 <[email protected]>

* Update utility.py

---------

Signed-off-by: violetch24 <[email protected]>
Co-authored-by: violetch24 <[email protected]>

* Update Example for Pytorch 3x Mixed Precision (#1882)

Signed-off-by: zehao-intel <[email protected]>

* add read permission token (#1942)

Signed-off-by: Huang, Tai <[email protected]>

* Add docstring for WOQ&LayerWise (#1938)


Signed-off-by: Kaihui-intel <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: xinhe <[email protected]>

* add docstring for mx quant (#1932)

Signed-off-by: Mengni Wang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: xinhe <[email protected]>

* Update for API 3.0 online doc (#1940)

Co-authored-by: ZhangJianyu <[email protected]>

* Refine Pytorch 3x Mixed Precision Example (#1946)

Signed-off-by: zehao-intel <[email protected]>

* Update AutoRound commit version (#1941)

Signed-off-by: Kaihui-intel <[email protected]>

* Update publish.yml (#1949)

* Update publish.yml

* Update publish.yml

* Update publish.yml (#1950)

* Update doc for client-usage and LWQ (#1947)

Signed-off-by: yiliu30 <[email protected]>

* Add Docstring for TF 3x API and Torch 3x Mixed Precision (#1944)

Signed-off-by: zehao-intel <[email protected]>

* Update Examples for TF 3x API (#1901)

Signed-off-by: zehao-intel <[email protected]>

* Complement UT of calibration function for TF 3x API (#1945)

Signed-off-by: zehao-intel <[email protected]>

* Enable yolov5 Example for TF 3x API (#1943)

Signed-off-by: zehao-intel <[email protected]>

* add ipex xpu example to 3x API (#1948)

Signed-off-by: violetch24 <[email protected]>

* update 3x torch installation (#1957)

Signed-off-by: chensuyue <[email protected]>

* Add save/load for pt2e example (#1927)

Signed-off-by: Kaihui-intel <[email protected]>

* Fix itrex qbits nf4/int8 training core dumped issue (#1954)

Signed-off-by: Kaihui-intel <[email protected]>
Signed-off-by: chensuyue <[email protected]>

* raise issues in CI model test when previous results cannot be found (#1958)

Signed-off-by: chensuyue <[email protected]>

* Set low_gpu_mem_usage=False for AutoRound

Signed-off-by: Kaihui-intel <[email protected]>

* Bump tensorflow version (#1961)

Signed-off-by: dependabot[bot] <[email protected]>

* fix docs link (#1959)

Signed-off-by: chensuyue <[email protected]>

* fix welcome.html link issue (#1962)

Co-authored-by: ZhangJianyu <[email protected]>

* replenish docstring (#1955)

* replenish docstring

Signed-off-by: xin3he <[email protected]>

* update  Quantizer API docstring

Signed-off-by: xin3he <[email protected]>

* Add docstring for auto accelerator (#1956)

Signed-off-by: yiliu30 <[email protected]>

* temporary remove torch/quantization and add it back after fp8 code is updated.

* Update config.py

---------

Signed-off-by: xin3he <[email protected]>
Signed-off-by: yiliu30 <[email protected]>
Co-authored-by: Yi Liu <[email protected]>

* add SDXL model example to INC 3.x (#1887)

* add SDXL model example to INC 3.x

Signed-off-by: Cheng, Zixuan <[email protected]>

* add evaluation script

Signed-off-by: violetch24 <[email protected]>

* add test script

Signed-off-by: violetch24 <[email protected]>

* minor fix

Signed-off-by: violetch24 <[email protected]>

* Update run_quant.sh

* add iter limit

Signed-off-by: violetch24 <[email protected]>

* modify test script

Signed-off-by: violetch24 <[email protected]>

* update json

Signed-off-by: chensuyue <[email protected]>

* add requirements

Signed-off-by: violetch24 <[email protected]>

* Update run_benchmark.sh

* Update sdxl_smooth_quant.py

* minor fix

Signed-off-by: violetch24 <[email protected]>

---------

Signed-off-by: Cheng, Zixuan <[email protected]>
Signed-off-by: violetch24 <[email protected]>
Signed-off-by: chensuyue <[email protected]>
Co-authored-by: violetch24 <[email protected]>
Co-authored-by: chensuyue <[email protected]>

* example update for 3.x ipex sq (#1902)

Signed-off-by: violetch24 <[email protected]>

* Fix `opt_125m_woq_gptq_int4_dq_ggml` issue (#1965)

Signed-off-by: Kaihui-intel <[email protected]>

* remove unnecessary CI (#1966)

Signed-off-by: Sun, Xuehao <[email protected]>

* Add version mapping between INC and Gaudi SW Stack (#1967)

Signed-off-by: Huang, Tai <[email protected]>

* Add 3.x readme (#1971)

Signed-off-by: Sun, Xuehao <[email protected]>

* Fix broken link in docs (#1969)

Signed-off-by: Huang, Tai <[email protected]>

* Cherry pick v1.17.0 (#1964)

* [SW-184941] INC CI, CD and Promotion

Change-Id: I60c420f9776e1bdab7bb9e02e5bcbdb6891bfe52

* [SW-183320]updated setup.py

Change-Id: I592af89486cb1d9e0b5197521c428920197a9103

* [SW-177474] add HQT FP8 porting code

Change-Id: I4676f13a5ed43c444f2ec68675cc41335e7234dd
Signed-off-by: Zhou Yuwen <[email protected]>

* [SW-189361] Fix white list extend

Change-Id: Ic2021c248798fce37710d28014a6d59259c868a3

* [SW-191317] Raise exception according to hqt config object

Change-Id: I06ba8fa912c811c88912987c11e5c12ef328348a

* [SW-184714] Port HQT code into INC

HQT lib content was copied as is under fp8_quant

Tests were copied to 3.x torch location

Change-Id: Iec6e1fa7ac4bf1df1c95b429524c40e32bc13ac9

* [SW-184714] Add internal folder to fp8 quant

This is a folder used for experiments,
not to be used by users

Change-Id: I9e221ae582794e304e95392c0f37638f7bce69bc

* [SW-177468] Removed unused code + cleanup

Change-Id: I4d27c067e87c1a30eb1da9df16a16c46d092c638

* Fix errors in regression_detection

Change-Id: Iee5318bd5593ba349812516eb5641958ece3c438

* [SW-187731] Save orig module as member of patched module

This allows direct usage of the original module's methods,
which solves a torch.compile issue

Change-Id: I464d8bd1bacdfc3cd1f128a67114e1e43f092632
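
Keeping a reference to the original module inside the patched one lets callers reach the unpatched methods directly. A tiny generic sketch of that pattern (not the actual INC/HQT patching code; class names are hypothetical):

```python
# Pattern sketched in the commit above: a patched module stores the
# original module as a member, so original methods remain directly
# callable (which also keeps graph tracers like torch.compile happy).
class OrigLinear:
    def __init__(self, scale: float):
        self.scale = scale

    def forward(self, x: float) -> float:
        return x * self.scale


class PatchedLinear:
    def __init__(self, orig: OrigLinear):
        self.orig_module = orig  # keep the unpatched module as a member

    def forward(self, x: float) -> float:
        # quantize-ish wrapper around the original computation
        return round(self.orig_module.forward(x), 1)


patched = PatchedLinear(OrigLinear(scale=2.0))
```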

* [SW-190899] Install packages according to configuration

Change-Id: I570b490658f5d2c5399ba1db93f8f52f56449525

* [SW-184689] use finalize_calibration internally for one-step flow

Change-Id: Ie0b8b426c951cf57ed7e6e678c86813fb2d05c89

* [SW-191945] align requirement_pt.txt in gerrit INC with Github INC

Change-Id: If5c0dbf21bf989af37a8e29246e4f8760cd215ef
Signed-off-by: xinhe3 <[email protected]>

* [SW-192358] Remove HQT reference in INC

Change-Id: Ic25f9323486596fa2dc6d909cd568a37ab84dd5e

* [SW-191415] update fp8 maxAbs observer using torch.copy_

Change-Id: I3923c832f9a8a2b14e392f3f4719d233a457702f

* [SW-184943] Enhance INC WOQ model loading

- Support loading huggingface WOQ models
- Abstract a WeightOnlyLinear base class; add INCWeightOnlyLinear and HPUWeightOnlyLinear subclasses
- Load WOQ linear weights module by module
- Save the HPU-format tensor so it can be reused on the next load

Change-Id: I679a42759b49e1f45f52bbb0bdae8580a23d0bcf

* [SW-190303] Implement HPUWeightOnlyLinear class in INC

Change-Id: Ie05c8787e708e2c3559dce24ef0758d6c498ac41

* [SW-192809] fix json_file bug when instantiating FP8Config class

Change-Id: I4a715d0a706efe20ccdb49033755cabbc729ccdc
Signed-off-by: Zhou Yuwen <[email protected]>

* [SW-192931] align setup.py with github INC and remove fp8_convert

Change-Id: Ibbc157646cfcfad64b323ecfd96b9bbda5ba9e2f
Signed-off-by: xinhe3 <[email protected]>

* [SW-192917] Update all HQT logic files with pre-commit check

Change-Id: I119dc8578cb10932fd1a8a674a8bdbf61f978e42
Signed-off-by: xinhe3 <[email protected]>

* update docstring

Signed-off-by: yuwenzho <[email protected]>

* add fp8 example and document (#1639)

Signed-off-by: xinhe3 <[email protected]>

* Update settings to be compatible with gerrit

* enhance ut

Signed-off-by: yuwenzho <[email protected]>

* move fp8 sample to helloworld folder

Signed-off-by: yuwenzho <[email protected]>

* update torch version of habana docker

Signed-off-by: xinhe3 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update readme demo

Signed-off-by: xinhe3 <[email protected]>

* update WeightOnlyLinear to INCWeightOnlyLinear

Signed-off-by: xinhe3 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add docstring for FP8Config

Signed-off-by: xinhe3 <[email protected]>

* fix pylint

Signed-off-by: xinhe3 <[email protected]>

* update fp8 test scripts

Signed-off-by: chensuyue <[email protected]>

* delete deps

Signed-off-by: chensuyue <[email protected]>

* update container into v1.17.0

Signed-off-by: chensuyue <[email protected]>

* update docker version

Signed-off-by: xinhe3 <[email protected]>

* update pt ut

Signed-off-by: chensuyue <[email protected]>

* add lib path

Signed-off-by: chensuyue <[email protected]>

* fix dir issue

Signed-off-by: xinhe3 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update fp8 test scope

Signed-off-by: chensuyue <[email protected]>

* fix typo

Signed-off-by: xinhe3 <[email protected]>

* update fp8 test scope

Signed-off-by: chensuyue <[email protected]>

* update pre-commit-ci

Signed-off-by: chensuyue <[email protected]>

* work around for hpu

Signed-off-by: xinhe3 <[email protected]>

* fix UT

Signed-off-by: xinhe3 <[email protected]>

* fix parameter

Signed-off-by: chensuyue <[email protected]>

* omit some test

Signed-off-by: chensuyue <[email protected]>

* update main page example to llm loading

Signed-off-by: xinhe3 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix autotune

Signed-off-by: xinhe3 <[email protected]>

---------

Signed-off-by: Zhou Yuwen <[email protected]>
Signed-off-by: xinhe3 <[email protected]>
Signed-off-by: yuwenzho <[email protected]>
Signed-off-by: chensuyue <[email protected]>
Co-authored-by: yan tomsinsky <[email protected]>
Co-authored-by: Ron Ben Moshe <[email protected]>
Co-authored-by: Uri Livne <[email protected]>
Co-authored-by: Danny Semiat <[email protected]>
Co-authored-by: smarkovichgolan <[email protected]>
Co-authored-by: Dudi Lester <[email protected]>

* update main page (#1973)

Signed-off-by: chensuyue <[email protected]>

* fix online doc search issue (#1975)

Co-authored-by: ZhangJianyu <[email protected]>

* bump main version into v3.1 (#1974)

Signed-off-by: chensuyue <[email protected]>

* update readme for fp8 (#1979)

Signed-off-by: xinhe3 <[email protected]>

* Skip some tests for torch 2.4 (#1981)

Signed-off-by: yiliu30 <[email protected]>

* Fix UT env and upgrade torch to 2.4.0 (#1978)

Signed-off-by: Sun, Xuehao <[email protected]>

* support gptq `true_sequential` and `quant_lm_head` (#1977)

Signed-off-by: Kaihui-intel <[email protected]>

* update installation and ci test for 3x api (#1991)

Signed-off-by: chensuyue <[email protected]>

* add hasattr check for torch fp8 dtype (#1985)

Signed-off-by: xin3he <[email protected]>

* add quantize, save, load function for transformers-like api (#1986)

Signed-off-by: changwangss <[email protected]>

* Update installation_guide.md (#1989)

Correct typo in installation doc

* update 3x pt binary build (#1992)

Signed-off-by: chensuyue <[email protected]>

* add per_channel_minmax (#1990)

Signed-off-by: yiliu30 <[email protected]>
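
Per-channel min-max observes each output channel's own range instead of one tensor-wide range, giving each channel its own quantization parameters. A self-contained sketch of the idea (pure Python, not INC's observer implementation):

```python
# Per-channel min-max: compute one (scale, zero_point) pair per output
# channel from that channel's own min/max, instead of a single pair for
# the whole tensor. Pure-Python sketch of the asymmetric uint8 case.
def per_channel_minmax_qparams(weight, qmin=0, qmax=255):
    """weight: list of channels, each a list of floats.

    Returns (scales, zero_points), one entry per channel.
    """
    scales, zero_points = [], []
    for channel in weight:
        lo, hi = min(channel + [0.0]), max(channel + [0.0])  # range must cover 0
        scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid zero scale for flat channels
        zp = round(qmin - lo / scale)
        scales.append(scale)
        zero_points.append(zp)
    return scales, zero_points
```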

* Remove the save of gptq config (#1993)

Signed-off-by: Kaihui-intel <[email protected]>

* Add recent publications (#1995)

* add recent publications

Signed-off-by: Huang, Tai <[email protected]>

* update total count

Signed-off-by: Huang, Tai <[email protected]>

---------

Signed-off-by: Huang, Tai <[email protected]>

* update docker image prune rules (#2003)

Signed-off-by: chensuyue <[email protected]>

* Support transformers-like api for woq quantization (#1987)


Signed-off-by: Kaihui-intel <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Wang, Chang <[email protected]>

* add INC_FORCE_DEVICE introduction (#1988)

* add INC_FORCE_DEVICE introduction

Signed-off-by: xin3he <[email protected]>

* Update PyTorch.md

* Update PyTorch.md

* Update docs/source/3x/PyTorch.md

Co-authored-by: Yi Liu <[email protected]>

* rename to INC_TARGET_DEVICE

Signed-off-by: xin3he <[email protected]>

---------

Signed-off-by: xin3he <[email protected]>
Co-authored-by: Yi Liu <[email protected]>

* Replace FORCE_DEVICE with INC_TARGET_DEVICE [transformers] (#2005)

Signed-off-by: Kaihui-intel <[email protected]>

* enable auto_round format export (#2002)

Signed-off-by: Zhang, Weiwei1 <[email protected]>

* remove accelerate version in unit test (#2007)

Signed-off-by: Sun, Xuehao <[email protected]>

* add repack_awq_to_optimum_format function (#1998)

Signed-off-by: changwangss <[email protected]>

* Update auto_round requirements for transformers example (#2013)

Signed-off-by: Kaihui-intel <[email protected]>

* add pad_to_buckets in evaluation for hpu performance (#2011)

* add pad_to_buckets in evaluation for hpu performance

---------

Signed-off-by: xin3he <[email protected]>

* Update model accuracy (#2006)

Signed-off-by: Sun, Xuehao <[email protected]>

* fix xpu device set weight and bias (#2010)

Signed-off-by: changwangss <[email protected]>
Co-authored-by: Sun, Xuehao <[email protected]>

* Add transformers-like api doc (#2018)

Signed-off-by: Kaihui-intel <[email protected]>

* Adapt transformers 4.45.1 (#2019)

Signed-off-by: Kaihui-intel <[email protected]>
Co-authored-by: changwangss <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* add autoround EMNLP24 to pub list (#2014)

Signed-off-by: Huang, Tai <[email protected]>

* Fix transformers rtn layer-wise quant (#2008)

Signed-off-by: Kaihui-intel <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Remove itrex dependency for 3x example (#2016)

Signed-off-by: Kaihui-intel <[email protected]>
Co-authored-by: Sun, Xuehao <[email protected]>

* add transformers-like api link in readme (#2022)

Signed-off-by: Huang, Tai <[email protected]>

* Add woq examples (#1982)

Signed-off-by: Kaihui-intel <[email protected]>
Signed-off-by: Sun, Xuehao <[email protected]>
Co-authored-by: Sun, Xuehao <[email protected]>

* remove ITREX unit test CI (#2021)

Signed-off-by: Sun, Xuehao <[email protected]>

* Support quant procedure on XPU (#2026)

Signed-off-by: Kaihui-intel <[email protected]>

* Support generation search for transformers examples (#2029)


Signed-off-by: Kaihui-intel <[email protected]>

* Remove itrex dependency for 2x example  (#2024)

Signed-off-by: Kaihui-intel <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update the PT2E CV example (#2032)

Signed-off-by: yiliu30 <[email protected]>

* Cherry pick Habana software 1.18.0 update (#2025)

Signed-off-by: xinhe3 <[email protected]>
Signed-off-by: Yi Liu <[email protected]>
Signed-off-by: Sun, Xuehao <[email protected]>
Signed-off-by: chensuyue <[email protected]>
Co-authored-by: yan tomsinsky <[email protected]>
Co-authored-by: Uri Livne <[email protected]>
Co-authored-by: Dudi Lester <[email protected]>
Co-authored-by: Danny <[email protected]>
Co-authored-by: Tomer Gafni <[email protected]>
Co-authored-by: Eran Geva <[email protected]>
Co-authored-by: Daniel Ohayon <[email protected]>
Co-authored-by: Roi Tiefenbrunn <[email protected]>
Co-authored-by: Kamil Felskowski <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update gaudi version mapping table for v3.1 (#2030)

Signed-off-by: Huang, Tai <[email protected]>
Co-authored-by: chen, suyue <[email protected]>

* fix broken link to FP8 example (#2034)

Signed-off-by: Huang, Tai <[email protected]>

* add back missing image (#2035)

Signed-off-by: xin3he <[email protected]>

* Add vlm examples, bugfix (#2012)

* add VLM examples

Signed-off-by: Zhang, Weiwei1 <[email protected]>

* bugfix, add utils

Signed-off-by: Zhang, Weiwei1 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix docstring issues

Signed-off-by: Zhang, Weiwei1 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* bugfix

Signed-off-by: Zhang, Weiwei1 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refine examples

Signed-off-by: Zhang, Weiwei1 <[email protected]>

* fix scan issue

Signed-off-by: Zhang, Weiwei1 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refine shell

Signed-off-by: Zhang, Weiwei1 <[email protected]>

* refine scripts & requirements

Signed-off-by: Zhang, Weiwei1 <[email protected]>

* typofix

Signed-off-by: Zhang, Weiwei1 <[email protected]>

* refine docs

Signed-off-by: Zhang, Weiwei1 <[email protected]>

* set attn_implementation for Phi3-vision

Signed-off-by: Zhang, Weiwei1 <[email protected]>

* refine phi3 example

Signed-off-by: Zhang, Weiwei1 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix code coverage

Signed-off-by: Zhang, Weiwei1 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update config

Signed-off-by: Sun, Xuehao <[email protected]>

* refine shells, docs and example. enable qwen2-vl quantization

Signed-off-by: Zhang, Weiwei1 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix ci

Signed-off-by: Zhang, Weiwei1 <[email protected]>

* fix EOF error

Signed-off-by: Sun, Xuehao <[email protected]>

* update qwen dir

Signed-off-by: Sun, Xuehao <[email protected]>

* refine shell, add llama3.2 inference to doc

Signed-off-by: Zhang, Weiwei1 <[email protected]>

* bugfix

Signed-off-by: Zhang, Weiwei1 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* bugfix

Signed-off-by: Zhang, Weiwei1 <[email protected]>

* bugfix

Signed-off-by: Zhang, Weiwei1 <[email protected]>

* refine eval shell

Signed-off-by: Zhang, Weiwei1 <[email protected]>

* fix eval device issue

Signed-off-by: Zhang, Weiwei1 <[email protected]>

* refine eval dtype

Signed-off-by: Zhang, Weiwei1 <[email protected]>

---------

Signed-off-by: Zhang, Weiwei1 <[email protected]>
Signed-off-by: Sun, Xuehao <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sun, Xuehao <[email protected]>

* remove autoround limit (#2036)

Signed-off-by: Sun, Xuehao <[email protected]>

* Adapt autoround format (#2038)

Signed-off-by: Kaihui-intel <[email protected]>

* remove transformers import from utility (#2045)

* remove transformers import from utility

Signed-off-by: Zhang, Weiwei1 <[email protected]>

* bugfix

Signed-off-by: Zhang, Weiwei1 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixtypos

Signed-off-by: Zhang, Weiwei1 <[email protected]>

---------

Signed-off-by: Zhang, Weiwei1 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* add buckets setting for lm_eval (#2044)

* add buckets setting for lm_eval

Signed-off-by: xinhe3 <[email protected]>

* clear graph cache to avoid OOM

Signed-off-by: xinhe3 <[email protected]>

---------

Signed-off-by: xinhe3 <[email protected]>
Co-authored-by: xinhe3 <[email protected]>

* Enhance example for HPU performance (#2043)

* Enhance example for HPU performance

Signed-off-by: xinhe3 <[email protected]>

* Update run_clm_no_trainer.py

* remove wikitext to avoid oom for llama2-7b bs=8

* remove wikitext

Signed-off-by: xinhe3 <[email protected]>

---------

Signed-off-by: xinhe3 <[email protected]>
Co-authored-by: xinhe3 <[email protected]>

* remove useless code in setup.py (#2046)

* Update the default PT2E config (#2041)

Signed-off-by: yiliu30 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Support non-contiguous weight saving (#2049)

Signed-off-by: Kaihui-intel <[email protected]>

* fix GPTQ oom issue on HPU (#2042)

* fix GPTQ oom issue on HPU

Signed-off-by: xinhe3 <[email protected]>

---------

Signed-off-by: xinhe3 <[email protected]>
Co-authored-by: xinhe3 <[email protected]>

* fix bug and update readme (#2051)

* fix bug and update readme

---------

Signed-off-by: xinhe3 <[email protected]>
Co-authored-by: xinhe3 <[email protected]>

* Support safetensors loading for layerwise (#2047)

Signed-off-by: Kaihui-intel <[email protected]>

* Enhance WOQ example Readme and help (#2053)


Signed-off-by: Kaihui-intel <[email protected]>
Co-authored-by: xinhe <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* improve optimum-habana available check (#2054)

Signed-off-by: changwang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fixed CI IPEX version (#2061)

Signed-off-by: Sun, Xuehao <[email protected]>

* Update torch config kwargs (#2055)

Signed-off-by: Kaihui-intel <[email protected]>

* Support client `use_layer_wise` setting (#2048)

Signed-off-by: Kaihui-intel <[email protected]>

* Check autoround before import it (#2062)

Signed-off-by: yiliu30 <[email protected]>

* Delete fp8_quant/scripts/regression_detection directory (#2059)

A missed change when cherry-picking Habana software 1.18.0

* Make PatchedVLLMKVCache resilient to forward API changes (#2067)

Change-Id: I33fad5c3e80e017099f300782809f24669765d42

Co-authored-by: Konrad Zawora <[email protected]>

* Fix glm-4-9b oom issue on BMG

Signed-off-by: Kaihui-intel <[email protected]>

* Update recipes & Bump version to 3.2 (#2037)

Signed-off-by: Sun, Xuehao <[email protected]>

* Docs: Add customer defined calibration and update docker run (#2057)

Signed-off-by: fengding <[email protected]>

* Adapt torch and ipex 2.5 (#2066)

Signed-off-by: Kaihui-intel <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sun, Xuehao <[email protected]>

* Enhance `TBB` check (#2068)

Signed-off-by: yiliu30 <[email protected]>

* Fix the PT2E UT (#2071)

Signed-off-by: yiliu30 <[email protected]>

* Support gptq layerwise on client (#2069)

Signed-off-by: Kaihui-intel <[email protected]>

* Adapt autoround v0.4 (#2073)

Signed-off-by: Kaihui-intel <[email protected]>

* Ensure that mul operators with shared initializer will not be absorbed in SmoothQuant (#2063)

Signed-off-by: duansheng.liu <[email protected]>

* Integrate AutoRound v0.4 [3x] (#2072)

Signed-off-by: Kaihui-intel <[email protected]>
Signed-off-by: Zhang, Weiwei1 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update CI framework versions and README badge for release 3.1.1 (#2058)

Signed-off-by: Sun, Xuehao <[email protected]>

* Remove the examples force required torch 1.13.1  (#2074)

* remove alexnet_fashion_mnist notebook

Signed-off-by: chensuyue <[email protected]>

* remove rnnt in pytorch examples

Signed-off-by: chensuyue <[email protected]>

---------

Signed-off-by: chensuyue <[email protected]>

* Fix truthfulqa task evaluation issue

Signed-off-by: Kaihui-intel <[email protected]>

* Add required library for ONNX example (#2078)

* Add required library for ONNX example

* Update requirements.txt

* support autoround new API for VLM (#2075)

Signed-off-by: Zhang, Weiwei1 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* add import check (#2076)

Signed-off-by: Sun, Xuehao <[email protected]>

* Update utility.py (#2079)

* Add gptq known issue (#2080)


Signed-off-by: Kaihui-intel <[email protected]>

* Fix sdxl `q_unet` config (#2081)

Signed-off-by: Kaihui-intel <[email protected]>

* Fixed the PT2E LLM example (#2082)

Signed-off-by: yiliu30 <[email protected]>

* fix dlrm when using incbench (#2084)

Signed-off-by: Xin He <[email protected]>

* add mapping for v3.2 (#2085)

Signed-off-by: Huang, Tai <[email protected]>

* [SW-192753] unify StaticQuantConfig and FP8Config

Change-Id: I2fe09ba4c575810a5b130268d63b9eee926bdf08
Signed-off-by: xinhe3 <[email protected]>
Signed-off-by: Xin He <[email protected]>

* [SW-200124] Set Scalar as default scale format + Compatibility check

Set ScaleFormat.SCALAR as the default value of 'scale_format'
Reduce 'scale_format' to 'CONST' when a PCQ scaling method or fake_quant is used
Add a test to show that Scalar models do not give wrong outputs
Fix the fakequant test; its use of 'hpu_initialize' is problematic and should be fixed in SW-202697

Change-Id: I43ff4900e9e02ce7f50edcdbb19a28f4f615ef9c
Signed-off-by: Xin He <[email protected]>

* [SW-201679] support unit_scales for FuseMoE

Change-Id: I02a63332bc09f1f6cdc3f133dd5f58829fcbad5a
Signed-off-by: Xin He <[email protected]>

* [SW-203698] Add log for converting prepared model

Change-Id: I1464f11bbab27d9041c9ba6f448e5ae6fa43bc2d
Signed-off-by: Mengni Wang <[email protected]>

* [SW-199737] Measurement dump improvements

Add _validate_dump_path to make sure dump dir is writable and backup measurements

Change-Id: Ib64abe772b4c309bbf04de89477cde92ea47ade4
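The dump-path validation described above can be sketched as follows; this is a hypothetical illustration of such a check, not the actual _validate_dump_path:

```python
import os
import shutil
import tempfile

# Hypothetical sketch of a dump-path check like the _validate_dump_path
# described above; the real helper (and the OSError handling it later
# gained) may differ.
def validate_dump_path(dump_path):
    """Ensure the dump dir exists and is writable; back up prior measurements."""
    try:
        os.makedirs(dump_path, exist_ok=True)
        # Probe writability by creating (and auto-deleting) a temp file.
        with tempfile.NamedTemporaryFile(dir=dump_path):
            pass
    except OSError as exc:
        raise RuntimeError(f"dump path {dump_path!r} is not writable") from exc
    if os.listdir(dump_path):
        # Keep a backup copy of any existing measurement files.
        shutil.copytree(dump_path, dump_path + ".bak", dirs_exist_ok=True)
    return dump_path
```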

* [SW-203452] Fixing and temp skipping G3 unittests

Change-Id: Iafa4a6a8577724bd8a86581bfe38d3269dab2ea2
Signed-off-by: Xin He <[email protected]>

* [SW-195965] [GPTQ] INC load model loads model in fp32 only

Change-Id: I597d19273786c0c169ad952ebe5a357274e358dc
Signed-off-by: xinhe3 <[email protected]>

* [SW-204016] Enable scale calculation with disk offload in INC

- Move scale calculation and quantization config collection into the module
patching loop, since the weights there are guaranteed to be on CPU.

Change-Id: Ifb2de4e67c1b36c611dcc50b4cd14731b0336c50

* [SW-202614] Llama70b int4 gptq with INC load flow - getting host OOM

Change-Id: Id1797371bb136502d89c4e8d17abcac1eaac4534
Signed-off-by: xinhe3 <[email protected]>

* [SW-199823] [HQT] fix INC one-step quantization API workflow

1. fix test_fp8_static_quant.py::TestFP8StaticQuant::test_one_step_quant_cv failure by deep-copying the forward function in common.py
2. fix config.py "Object of type dict_keys is not JSON serializable" by converting the keys to a list
3. fix the UT download issue by using a local tiny_gptj.json

Change-Id: I2ad3eac411e8fca9d88a021f6a5b9594e6c75ae9
Signed-off-by: xinhe3 <[email protected]>
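The dict_keys serialization issue fixed in point 2 is easy to reproduce in isolation; the snippet below is a generic illustration, not the actual config.py code:

```python
import json

# Generic illustration of the dict_keys fix described above; not the
# actual config.py code.
config = {"alpha": 0.5, "ops": ["linear", "matmul"]}

try:
    json.dumps({"observed": config.keys()})  # dict_keys view
    serialized = True
except TypeError:
    # "Object of type dict_keys is not JSON serializable"
    serialized = False

# Converting the view to a list makes the payload serializable.
payload = json.dumps({"observed": list(config.keys())})
```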

* [SW-202617] vllm mixtral MoE quant and measure using forward call

Change-Id: I919f1e3597b6c95c3fc60db78ac9c0c06242b416
Signed-off-by: Xin He <[email protected]>

* [SW-200092] Allow fsdpa and softmax to use scalar scales in INC

Change-Id: Ieba4c74c18624fb0c5fce6321671d6f4eb2b8c93
Signed-off-by: Xin He <[email protected]>

* [SW-205363] Update _load_state_dict_into_meta_model

Update _load_state_dict_into_meta_model to be compatible with the
Transformers 4.45 release

Change-Id: Ib5d8ca777d38c7ae225b7174a886b333b6246ab1
Signed-off-by: Xin He <[email protected]>

* [SW-184948] INC Q/DQ optimization, included conv2d, kv_cache, fsdpa,
softmax and other operators.

Change-Id: I920f8ad85b3493f1bd4bbe770533343e214fc2d1
Signed-off-by: changwang <[email protected]>
Signed-off-by: Xin He <[email protected]>

* [SW-198585] Fix typo causing PatchedVLLMKVCache error

Change-Id: Iafdcc935f702bc4756e2ba89935becb3bc47a728

* [SW-199208] QDQ Refactor for Registering Patched Modules, Scaling Methods, and Observers

1. Extension APIs
    - `PatchedModuleBase`, `register_patched_module`
    - `ScalingMethodBase`, `register_scaling_methods`
    - `ObserverBase`, `register_observer`, `register_module_config_for_observer`

    Related files:
    - fp8_quant/patched_module_base.py
    - fp8_quant/observer_base.py
    - fp8_quant/_core/measure.py
    - test_register_apis.py

2. Device-agnostic Patching
    - Replaced `hpu` with `cur_accelerator.name()`
    - Replaced `htcore.mark_step()` with `cur_accelerator.synchronize()`
    - Removed `torch.device("hpu")` under observers and scaling method
    - Updated `hpu_accelerator.synchronize()` to `htcore.mark_step()` + `torch.hpu.synchronize()`

Change-Id: I83c6de928a991ed2c1b3b434d372f49e095c38d3
Signed-off-by: Yi Liu <[email protected]>
Co-authored-by: Mengni Wang <[email protected]>
Signed-off-by: Xin He <[email protected]>
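The extension APIs above follow a registration pattern that can be sketched generically; the names below mirror the commit, but the signatures are assumptions, not the actual fp8_quant implementation:

```python
# Hypothetical sketch of the registration pattern behind the extension APIs
# listed above; names mirror the commit, signatures are assumed.
PATCHED_MODULES = {}

def register_patched_module(*module_types):
    """Class decorator mapping original module type names to a patched class."""
    def decorator(cls):
        for module_type in module_types:
            PATCHED_MODULES[module_type] = cls
        return cls
    return decorator

class PatchedModuleBase:
    """Base class for patched modules; wraps the original module."""
    def __init__(self, original):
        self.original = original

@register_patched_module("Linear", "Conv2d")
class PatchedLinearLike(PatchedModuleBase):
    pass

# Device-agnostic lookup: the patching loop would resolve the accelerator
# name at runtime instead of hard-coding "hpu".
patched_cls = PATCHED_MODULES["Linear"]
```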

* [SW-203389] scalar scales don't provide a dtype attribute

Change-Id: I4e40dc9b2d9cb65bc9e49571cd57a9ab030f5d7b
Signed-off-by: xinhe3 <[email protected]>
Signed-off-by: Xin He <[email protected]>

* [SW-199208] fix ModuleInfo conversion issue

Change-Id: Ib6c35e1623dda3e470e569defccd607a18b43312

* [SW-200168] Enable working with G2 HW scales on G3

Change-Id: I17f71540eb78e828f01f1a11c8b233d60951293e
Signed-off-by: Xin He <[email protected]>

* [SW-203389] fix get_scale_dtype to support PCQ scales

Change-Id: I923ace405a0f751a2e5a0a3aadb7abbb401a6c44

* [SW-199719] reduce PCQ scales memory usage

Removed the persistent full weight scales used during PCQ quantization;
instead we keep only the per-input-channel and per-output-channel scales
and create a temporary full scale tensor on each input quant op call.
Since the full scale tensor is the same size as the original bf16 weight,
keeping all full scales persistently alongside the quantized weights would
result in a quantized model that uses more memory than the unquantized one.

Change-Id: Idc91c5ac8b9cfea2e2a3ad053cb4dc5464cff776
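The memory saving described above can be illustrated with a small sketch; the dimensions and the outer-product scale construction are illustrative assumptions, not the actual PCQ code:

```python
import numpy as np

# Illustrative sketch (assumed shapes and scale construction) of the PCQ
# memory saving described above: persist only the per-output-channel and
# per-input-channel scale vectors, and rebuild the full scale matrix only
# for the duration of the quant op call.
out_ch, in_ch = 4096, 4096
out_scale = np.linspace(0.5, 2.0, out_ch, dtype=np.float32)
in_scale = np.linspace(0.5, 2.0, in_ch, dtype=np.float32)

def quant_op(weight, out_scale, in_scale):
    # The (out_ch, in_ch) scale tensor exists only inside this call.
    full_scale = np.outer(out_scale, in_scale)
    return weight / full_scale

# Persistent cost: out_ch + in_ch floats instead of out_ch * in_ch floats,
# i.e. no persistent full-scale tensor as large as the bf16 weight.
persistent_bytes = out_scale.nbytes + in_scale.nbytes
full_bytes = out_ch * in_ch * 4
```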

* [SW-206112] INC Q/DQ improvement - use Q/DQ ops

Change-Id: Ib03ea8744aa2cca8b606754c45944840da1c3898
Signed-off-by: changwang <[email protected]>
Signed-off-by: Xin He <[email protected]>

* [SW-206693] Convert conv2d_fp8 params to list if necessary

It's needed for the new approach to dynamic shapes in PT2.5.

Change-Id: I8d5e620153970b210675459e3d6aecad8ca7cbde

* [SW-207411] Add catch for OSError in _validate_dump_path

Change-Id: I82bae184257f3da982877b3797f2ee8b40a573c8

* [SW-207328] remove accuracy check due to random issue

Change-Id: Ifbd985c31c3755b6ab353ef8fa45e911dd75d688
Signed-off-by: xinhe3 <[email protected]>

* [SW-207559] Folder layout refactoring and cleanup (phase 1)

Change-Id: Ic9bffd2b7477d4530b4e2a5e411760a731efb84b
Signed-off-by: Yi Liu <[email protected]>
Signed-off-by: Xin He <[email protected]>

* [SW-193262] INC multi device save/load CP design in fp8 (#5)

Signed-off-by: Xin <[email protected]>
Signed-off-by: Xin He <[email protected]>

* [SW-208521] one-step quantization got double memory usage (#3)

* [SW-208521] one-step quantization got double memory usage

Signed-off-by: Xin <[email protected]>

* [SW-208789] Support quantizing FP16 model to FP8 (#15)

Since layer-wise quantization uses memory mapping from disk, the model could be fp16 as saved on disk, for example, llama2-7b.

We need to add logic to support this case so that layer-wise quantization works correctly.

Signed-off-by: Xin He <[email protected]>
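A minimal sketch of the dtype handling this commit describes, assuming an amax-based scale and an illustrative fp8 max of 240.0; not the actual INC layer-wise implementation:

```python
import numpy as np

# Minimal sketch (assumed amax-based scale, illustrative fp8 max) of
# accepting fp16 weights in a layer-wise flow.
def compute_scale(weight, fp8_max=240.0):
    # Memory-mapped weights may arrive as fp16; upcast before the amax.
    w = weight.astype(np.float32) if weight.dtype == np.float16 else weight
    return float(np.abs(w).max() / fp8_max)

w_fp16 = np.array([[-1.5, 3.0], [0.25, -6.0]], dtype=np.float16)
scale = compute_scale(w_fp16)
```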

* [SW-205959] Update _load_state_dict_into_meta_model for model with bias (#7)

Signed-off-by: Xin <[email protected]>

* [SW-208700] release bf16 model memory on HPU in one-step quantization (#14)

Signed-off-by: Xin <[email protected]>
Signed-off-by: Xin He <[email protected]>

* [SW-197077] refactoring maxabs scales and adding arbitrary scales. (#12)

* [SW-197077] refactoring maxabs scales and adding arbitrary scales.

Change-Id: I2c35cf925b6b21983f1770db7d35e14f3d7d3e47

* [SW-197077] refactoring scale:
fix atol

Change-Id: I1c99ddd9ade679286988e7d8a96338b32c0ddc07

* [SW-197077]  adding arbitrary scales

* Skip autoround test for HPU (#19)

Change-Id: I6dc9724389c16a05252370b9e09a1db80bc8d696

Signed-off-by: Yi Liu <[email protected]>
Co-authored-by: Yi Liu <[email protected]>

* [SW-199728] [DeepSpeed] Buffers initialized by model are not correct … (#16)

* [SW-199728] [DeepSpeed] Buffers initialized by model are not correct after tensor parallel

---------

Signed-off-by: Xin <[email protected]>
Co-authored-by: Danny Semiat <[email protected]>
Signed-off-by: Xin He <[email protected]>

* [SW-208151] CD 1.19.0 - PT Docker - test_quantization No module named… (#33)

* [SW-209256] fix GPTQ oom issue on HPU (#2042) (#20)

* fix GPTQ oom issue on HPU (#2042)
---------

Signed-off-by: Xin <[email protected]>
Co-authored-by: xinhe3 <[email protected]>

* [SW-208151] CD 1.19.0 - PT Docker - test_quantization No module named 'safetensors'

Signed-off-by: Xin <[email protected]>

---------

Signed-off-by: Xin <[email protected]>
Co-authored-by: xinhe3 <[email protected]>
Co-authored-by: Danny Semiat <[email protected]>

* [SW-207748] Support Auto-round on HPU (#25)

Signed-off-by: Yi Liu <[email protected]>
Co-authored-by: Yi Liu <[email protected]>

* [SW-209878] Increase threshold to avoid random error in test_layer_wise.py (#36)

Signed-off-by: Xin He <[email protected]>
Co-authored-by: Xin He <[email protected]>

* [SW-207579] support load vLLM compatible FP8 model (#18)

Support loading vLLM-compatible FP8 models, on both G2 and G3, on a single card and on multiple cards.
---------

Signed-off-by: changwang <[email protected]>

* [SW-207451] Implement block-wise calibration for LLM (#41)

* [SW-207451] Implement block-wise calibration for LLM

---------

Signed-off-by: Xin <[email protected]>
Co-authored-by: Xin He <[email protected]>
Signed-off-by: Xin He <[email protected]>

* [SW-208986] fix save&load bug (#40)

* [SW-208986] fix save&load bug

---------

Signed-off-by: Xin He <[email protected]>
Co-authored-by: Xin He <[email protected]>

* [SW-207748] Add Auto-round Example (#42)

* add autoround hpu example

Change-Id: Ibd537f4667c7c077160427722a5eca2c721aa5cd
Signed-off-by: Yi Liu <[email protected]>

* add requirements

Change-Id: I77a95ec05e41247db9903e8622c31f05259ca365
Signed-off-by: Yi Liu <[email protected]>

---------

Signed-off-by: Yi Liu <[email protected]>
Co-authored-by: Yi Liu <[email protected]>
Co-authored-by: Uri Livne <[email protected]>
Signed-off-by: Xin He <[email protected]>

* [SW-197077] fix bug (#47)

* [SW-210541] loading for fused_sdpa requires additional amax scale (#51)

Signed-off-by: Xin He <[email protected]>
Co-authored-by: Xin He <[email protected]>

* fix PatchedLoRACompatibleLinear init (#65)

Signed-off-by: changwangss <[email protected]>

* align files with v1.19.0 in fp8_quant folder

Signed-off-by: Xin He <[email protected]>

* fix missing SaveLoadFormat

Signed-off-by: Xin He <[email protected]>

* align and fix config after cherry-pick

Signed-off-by: Xin He <[email protected]>

* Implicit relative imports are abandoned

Signed-off-by: Xin He <[email protected]>

* fix config issue blocking CI

Signed-off-by: Xin He <[email protected]>

* remove synchronize for `pack_unpack_tensor_with_numpy` (#2070)

* remove pack&unpack synchronize

---------

Signed-off-by: Kaihui-intel <[email protected]>

* stop auto-fix of pre-commit

Signed-off-by: Xin He <[email protected]>

* update autoround example for release test

Signed-off-by: xin3he <[email protected]>

* fix AWQ&TEQ loading due to input scale

Signed-off-by: xin3he <[email protected]>

* fix HQQ state_dict loading caused by [SW-195965]

Signed-off-by: xin3he <[email protected]>

* use per_channel as default config (#2091)

Signed-off-by: yiliu30 <[email protected]>

* workaround transformers issue in version 4.47.0 (#2092)

* workaround transformers issue in version 4.47.0

Signed-off-by: xin3he <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Refactor FP8 pytest script (#2089)

* Refactor FP8 pytest script

---------

Signed-off-by: Sun, Xuehao <[email protected]>

* update ci scan scope

Signed-off-by: chensuyue <[email protected]>

* [SW-210500] [Optimum-Habana] [Regression] [fp8] [INC] No generated text for llava models [llava-1.5-7b-hf] [llava-1.5-13b-hf ] (#54)

Signed-off-by: Xin He <[email protected]>
Co-authored-by: Xin He <[email protected]>

* [SW-213236] resolve CPU mem issue in CI (#76)

Signed-off-by: Xin He <[email protected]>
Co-authored-by: Xin He <[email protected]>

* recover pre-commit

Signed-off-by: Xin He <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix `is_sharded` setting for loading quant model (#2094)

Signed-off-by: Kaihui-intel <[email protected]>

* fix error message for different python version (#2099)

Signed-off-by: changwangss <[email protected]>

* fix UT of RTN on HPU (#2098)

Signed-off-by: xin3he <[email protected]>
Signed-off-by: Sun, Xuehao <[email protected]>

* fix device issue during calibration (#2100)

Signed-off-by: Xin He <[email protected]>

* fix woq example and update document for v1.19.0 (#2097)

Signed-off-by: xin3he <[email protected]>

* Refactor version import paths to common module (#2095)

Signed-off-by: Sun, Xuehao <[email protected]>

* update CI gaudi-docker to 1.19.0 (#2096)

Signed-off-by: Sun, Xuehao <[email protected]>

* fix device mapping issue of llama gptq (#2101)

Signed-off-by: Xin He <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* remove fix_measurements.py; it exists under a different name, postprocessing_vllm_measurements.py

* fix merge

* remove unused imported functions with wrong path

* change the env var's requested value from 1 to true

---------

Signed-off-by: Cheng, Zixuan <[email protected]>
Signed-off-by: zehao-intel <[email protected]>
Signed-off-by: xin3he <[email protected]>
Signed-off-by: Kaihui-intel <[email protected]>
Signed-off-by: Sun, Xuehao <[email protected]>
Signed-off-by: chensuyue <[email protected]>
Signed-off-by: yiliu30 <[email protected]>
Signed-off-by: He, Xin3 <[email protected]>
Signed-off-by: changwangss <[email protected]>
Signed-off-by: Huang, Tai <[email protected]>
Signed-off-by: violetch24 <[email protected]>
Signed-off-by: Mengni Wang <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Zhou Yuwen <[email protected]>
Signed-off-by: xinhe3 <[email protected]>
Signed-off-by: yuwenzho <[email protected]>
Signed-off-by: xinhe3 <[email protected]>
Signed-off-by: Zhang, Weiwei1 <[email protected]>
Signed-off-by: Yi Liu <[email protected]>
Signed-off-by: changwang <[email protected]>
Signed-off-by: fengding <[email protected]>
Signed-off-by: duansheng.liu <[email protected]>
Signed-off-by: Xin He <[email protected]>
Signed-off-by: Mengni Wang <[email protected]>
Signed-off-by: Xin <[email protected]>
Signed-off-by: changwangss <[email protected]>
Co-authored-by: Zixuan Cheng <[email protected]>
Co-authored-by: xinhe <[email protected]>
Co-authored-by: zehao-intel <[email protected]>
Co-authored-by: Kaihui-intel <[email protected]>
Co-authored-by: Sun, Xuehao <[email protected]>
Co-authored-by: chen, suyue <[email protected]>
Co-authored-by: Yi Liu <[email protected]>
Co-authored-by: Dina Suehiro Jones <[email protected]>
Co-authored-by: Wang, Chang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Huang, Tai <[email protected]>
Co-authored-by: violetch24 <[email protected]>
Co-authored-by: Wang, Mengni <[email protected]>
Co-authored-by: Neo Zhang Jianyu <[email protected]>
Co-authored-by: ZhangJianyu <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: yan tomsinsky <[email protected]>
Co-authored-by: Ron Ben Moshe <[email protected]>
Co-authored-by: Uri Livne <[email protected]>
Co-authored-by: Danny Semiat <[email protected]>
Co-authored-by: smarkovichgolan <[email protected]>
Co-authored-by: Dudi Lester <[email protected]>
Co-authored-by: Yi Liu <[email protected]>
Co-authored-by: WeiweiZhang1 <[email protected]>
Co-authored-by: Tomer Gafni <[email protected]>
Co-authored-by: Eran Geva <[email protected]>
Co-authored-by: Daniel Ohayon <[email protected]>
Co-authored-by: Roi Tiefenbrunn <[email protected]>
Co-authored-by: Kamil Felskowski <[email protected]>
Co-authored-by: xinhe3 <[email protected]>
Co-authored-by: Konrad Zawora <[email protected]>
Co-authored-by: feng-intel <[email protected]>
Co-authored-by: duanshengliu <[email protected]>
Co-authored-by: Mengni Wang <[email protected]>
Co-authored-by: Jimin Ha <[email protected]>
Co-authored-by: changwang <[email protected]>
Co-authored-by: Yi Liu <[email protected]>
Co-authored-by: Amadeusz Skrzypczak <[email protected]>
Co-authored-by: Linoy Buchnik <[email protected]>
Showing 154 changed files with 1,896 additions and 5,853 deletions.
1 change: 1 addition & 0 deletions .azure-pipelines/code-scan.yml
@@ -13,6 +13,7 @@ pr:
- requirements.txt
- .azure-pipelines/code-scan.yml
- .azure-pipelines/scripts/codeScan
- .azure-pipelines/template/docker-template.yml

pool:
vmImage: "ubuntu-latest"
53 changes: 0 additions & 53 deletions .azure-pipelines/docker/DockerfileWithNC.devel

This file was deleted.

3 changes: 2 additions & 1 deletion .azure-pipelines/model-test-3x.yml
@@ -15,6 +15,7 @@ pr:
- requirements_pt.txt
- .azure-pipelines/scripts/models
- .azure-pipelines/model-test-3x.yml
- .azure-pipelines/template/docker-template.yml

variables:
OUT_SCRIPT_PATH: $(Build.SourcesDirectory)/.azure-pipelines/scripts/models
@@ -30,7 +31,7 @@ parameters:
type: object
default:
- opt_125m_woq_gptq_int4
- opt_125m_woq_gptq_int4_dq_bnb
- opt_125m_woq_gptq_nf4_dq_bnb
- opt_125m_woq_gptq_int4_dq_ggml

stages:
4 changes: 4 additions & 0 deletions .azure-pipelines/model-test.yml
@@ -11,9 +11,13 @@ pr:
- neural_compressor
- setup.py
- requirements.txt
- .azure-pipelines/model-test.yml
- .azure-pipelines/template/docker-template.yml
- .azure-pipelines/scripts/models
- examples/tensorflow/oob_models/quantization/ptq
- .azure-pipelines/model-test.yml
- .azure-pipelines/scripts/fwk_version.sh
- .azure-pipelines/scripts/install_nc.sh
exclude:
- test
- neural_compressor/common
10 changes: 5 additions & 5 deletions .azure-pipelines/scripts/fwk_version.sh
@@ -2,9 +2,9 @@

echo "export FWs version..."
export tensorflow_version='2.15.0-official'
export pytorch_version='2.4.0+cpu'
export torchvision_version='0.19.0'
export ipex_version='2.4.0+cpu'
export onnx_version='1.16.0'
export onnxruntime_version='1.18.0'
export pytorch_version='2.5.1+cpu'
export torchvision_version='0.20.1'
export ipex_version='2.5.0+cpu'
export onnx_version='1.17.0'
export onnxruntime_version='1.20.0'
export mxnet_version='1.9.1'
6 changes: 4 additions & 2 deletions .azure-pipelines/scripts/install_nc.sh
@@ -1,6 +1,6 @@
#!/bin/bash

echo -e "\n Install Neural Compressor ... "
echo -e "##[group]Install Neural Compressor ... "
cd /neural-compressor
if [[ $1 = *"3x_pt"* ]]; then
python -m pip install --no-cache-dir -r requirements_pt.txt
@@ -9,7 +9,8 @@ if [[ $1 = *"3x_pt"* ]]; then
python setup.py pt bdist_wheel
else
echo -e "\n Install torch CPU ... "
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cpu
pip install torch==2.5.1 --index-url https://download.pytorch.org/whl/cpu
python -m pip install intel-extension-for-pytorch==2.5.0 oneccl_bind_pt==2.5.0 --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
python -m pip install --no-cache-dir -r requirements.txt
python setup.py bdist_wheel
fi
@@ -26,4 +27,5 @@ else
fi

echo -e "\n pip list after install Neural Compressor ... "
echo "##[endgroup]"
pip list
4 changes: 2 additions & 2 deletions .azure-pipelines/scripts/models/run_pytorch_models_trigger.sh
@@ -56,10 +56,10 @@ elif [ "${model}" == "opt_125m_woq_gptq_int4" ]; then
model_src_dir="nlp/huggingface_models/language-modeling/quantization/weight_only"
inc_new_api=3x_pt
tuning_cmd="bash run_quant.sh --topology=opt_125m_woq_gptq_int4"
elif [ "${model}" == "opt_125m_woq_gptq_int4_dq_bnb" ]; then
elif [ "${model}" == "opt_125m_woq_gptq_nf4_dq_bnb" ]; then
model_src_dir="nlp/huggingface_models/language-modeling/quantization/weight_only"
inc_new_api=3x_pt
tuning_cmd="bash run_quant.sh --topology=opt_125m_woq_gptq_int4_dq_bnb"
tuning_cmd="bash run_quant.sh --topology=opt_125m_woq_gptq_nf4_dq_bnb"
elif [ "${model}" == "opt_125m_woq_gptq_int4_dq_ggml" ]; then
model_src_dir="nlp/huggingface_models/language-modeling/quantization/weight_only"
inc_new_api=3x_pt
9 changes: 8 additions & 1 deletion .azure-pipelines/scripts/ut/3x/run_3x_pt.sh
@@ -3,12 +3,19 @@ python -c "import neural_compressor as nc"
test_case="run 3x Torch"
echo "${test_case}"

echo "##[section]Run import check"
set -e
python -c "import neural_compressor.torch"
python -c "import neural_compressor.common"
echo "##[section]import check pass"

# install requirements
echo "set up UT env..."
echo "##[group]set up UT env..."
export LD_LIBRARY_PATH=/usr/local/lib/:$LD_LIBRARY_PATH
pip install -r /neural-compressor/test/3x/torch/requirements.txt
pip install pytest-cov
pip install pytest-html
echo "##[endgroup]"
pip list

export COVERAGE_RCFILE=/neural-compressor/.azure-pipelines/scripts/ut/3x/coverage.3x_pt
23 changes: 22 additions & 1 deletion .azure-pipelines/scripts/ut/3x/run_3x_pt_fp8.sh
@@ -3,8 +3,14 @@ python -c "import neural_compressor as nc"
test_case="run 3x Torch Habana FP8"
echo "${test_case}"

echo "##[section]Run import check"
set -e
python -c "import neural_compressor.torch"
python -c "import neural_compressor.common"
echo "##[section]import check pass"

# install requirements
echo "set up UT env..."
echo "##[group]set up UT env..."
export LD_LIBRARY_PATH=/usr/local/lib/:$LD_LIBRARY_PATH
sed -i '/^intel_extension_for_pytorch/d' /neural-compressor/test/3x/torch/requirements.txt
sed -i '/^auto_round/d' /neural-compressor/test/3x/torch/requirements.txt
@@ -13,6 +19,7 @@ pip install -r /neural-compressor/test/3x/torch/requirements.txt
pip install pytest-cov
pip install pytest-html
pip install pytest-html-merger
echo "##[endgroup]"
pip list

export COVERAGE_RCFILE=/neural-compressor/.azure-pipelines/scripts/ut/3x/coverage.3x_pt_fp8
@@ -28,6 +35,18 @@ pytest --cov="${inc_path}" -vs --disable-warnings --html=report_2.html --self-co
pytest --cov="${inc_path}" -vs --disable-warnings --html=report_4.html --self-contained-html torch/quantization/fp8_quant 2>&1 | tee -a ${ut_log_name}
pytest --cov="${inc_path}" -vs --disable-warnings --html=report_5.html --self-contained-html torch/algorithms/fp8_quant 2>&1 | tee -a ${ut_log_name}

# Below folder contains some special configuration for pytest so we need to enter the path and run it separately
cd /neural-compressor/test/3x/torch/algorithms/fp8_quant
pytest --cov="${inc_path}" -vs --disable-warnings --html=report_4.html --self-contained-html . 2>&1 | tee -a ${ut_log_name}
cp .coverage ${LOG_DIR}/.coverage.algo_fp8
cd - && mv /neural-compressor/test/3x/torch/algorithms/fp8_quant/*.html .

# Below folder contains some special configuration for pytest so we need to enter the path and run it separately
cd /neural-compressor/test/3x/torch/quantization/fp8_quant
pytest --cov="${inc_path}" -vs --disable-warnings --html=report_5.html --self-contained-html . 2>&1 | tee -a ${ut_log_name}
cp .coverage ${LOG_DIR}/.coverage.quant_fp8
cd - && mv /neural-compressor/test/3x/torch/quantization/fp8_quant/*.html .

mkdir -p report && mv *.html report
pytest_html_merger -i ./report -o ./report.html
cp report.html ${LOG_DIR}/
@@ -40,5 +59,7 @@ fi

# if ut pass, collect the coverage file into artifacts
cp .coverage ${LOG_DIR}/.coverage
cd ${LOG_DIR}
coverage combine .coverage.*

echo "UT finished successfully! "
9 changes: 8 additions & 1 deletion .azure-pipelines/scripts/ut/3x/run_3x_tf.sh
@@ -3,12 +3,19 @@ python -c "import neural_compressor as nc"
test_case="run 3x TensorFlow"
echo "${test_case}"

echo "##[section]Run import check"
set -e
python -c "import neural_compressor.tensorflow"
python -c "import neural_compressor.common"
echo "##[section]import check pass"

# install requirements
echo "set up UT env..."
echo "##[group]set up UT env..."
pip install -r /neural-compressor/test/3x/tensorflow/requirements.txt
pip install pytest-cov
pip install pytest-html
pip install pytest-html-merger
echo "##[endgroup]"
pip list

export COVERAGE_RCFILE=/neural-compressor/.azure-pipelines/scripts/ut/3x/coverage.3x_tf
6 changes: 4 additions & 2 deletions .azure-pipelines/scripts/ut/collect_log.sh
@@ -7,7 +7,7 @@ coverage_log_base="/neural-compressor/log_dir/coverage_log_base"
coverage_compare="/neural-compressor/log_dir/coverage_compare.html"
cd /neural-compressor/log_dir

$BOLD_YELLOW && echo "collect coverage for PR branch" && $RESET
$BOLD_YELLOW && echo "##[group]collect coverage for PR branch" && $RESET
mkdir -p coverage_PR
cp ut_*_coverage/.coverage.* ./coverage_PR/

Expand All @@ -28,8 +28,9 @@ git checkout master
rm -rf build dist *egg-info
echo y | pip uninstall neural-compressor
cd /neural-compressor/.azure-pipelines-pr/scripts && bash install_nc.sh
echo "##[endgroup]"

$BOLD_YELLOW && echo "collect coverage for baseline" && $RESET
$BOLD_YELLOW && echo "##[group]collect coverage for baseline" && $RESET
coverage erase
cd /neural-compressor/log_dir
mkdir -p coverage_base
Expand All @@ -43,6 +44,7 @@ coverage report -m --rcfile=${COVERAGE_RCFILE} | tee ${coverage_log_base}
coverage html -d log_dir/coverage_base/htmlcov --rcfile=${COVERAGE_RCFILE}
coverage xml -o log_dir/coverage_base/coverage.xml --rcfile=${COVERAGE_RCFILE}
ls -l log_dir/coverage_base/htmlcov
echo "##[endgroup]"

get_coverage_data() {
# Input argument
5 changes: 3 additions & 2 deletions .azure-pipelines/scripts/ut/env_setup.sh
@@ -19,7 +19,7 @@ echo "onnxruntime version is $onnxruntime_version"
echo "mxnet version is $mxnet_version"

test_case=$1
echo "========= test case is ${test_case}"
echo -e "##[group]test case is ${test_case}"

if [[ "${tensorflow_version}" == *"-official" ]]; then
pip install tensorflow==${tensorflow_version%-official}
@@ -100,6 +100,8 @@ pip install coverage
pip install pytest
pip install pytest-html

echo "##[endgroup]"

pip list
echo "[DEBUG] list pipdeptree..."
pip install pipdeptree
Expand All @@ -112,4 +114,3 @@ if [[ $(echo "${test_case}" | grep -c "run basic api") != 0 ]] || [[ $(echo "${t
find . -name "test*.py" | xargs sed -i 's/import tensorflow.compat.v1 as tf/import torch; import tensorflow.compat.v1 as tf/g'
find . -name "test*.py" | xargs sed -i 's/from tensorflow import keras/import torch; from tensorflow import keras/g'
fi

1 change: 1 addition & 0 deletions .azure-pipelines/scripts/ut/run_basic_adaptor.sh
@@ -8,6 +8,7 @@ source /neural-compressor/.azure-pipelines/scripts/fwk_version.sh $1

echo "set up UT env..."
bash /neural-compressor/.azure-pipelines/scripts/ut/env_setup.sh "${test_case}"
export LD_LIBRARY_PATH=/usr/local/lib/:$LD_LIBRARY_PATH
export COVERAGE_RCFILE=/neural-compressor/.azure-pipelines/scripts/ut/coverage.file
lpot_path=$(python -c 'import neural_compressor; import os; print(os.path.dirname(neural_compressor.__file__))')
cd /neural-compressor/test || exit 1
5 changes: 3 additions & 2 deletions .azure-pipelines/template/docker-template.yml
@@ -74,7 +74,7 @@ steps:
- ${{ if eq(parameters.imageSource, 'pull') }}:
- script: |
docker pull vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
docker pull vault.habana.ai/gaudi-docker/1.19.0/ubuntu22.04/habanalabs/pytorch-installer-2.5.1:latest
displayName: "Pull habana docker image"
- script: |
@@ -95,7 +95,8 @@ steps:
else
docker run -dit --disable-content-trust --privileged --name=${{ parameters.containerName }} --shm-size="2g" \
--runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host \
-v ${BUILD_SOURCESDIRECTORY}:/neural-compressor vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
-v ${BUILD_SOURCESDIRECTORY}:/neural-compressor vault.habana.ai/gaudi-docker/1.19.0/ubuntu22.04/habanalabs/pytorch-installer-2.5.1:latest
docker exec ${{ parameters.containerName }} bash -c "ln -sf \$(which python3) /usr/bin/python"
fi
echo "Show the container list after docker run ... "
docker ps -a
10 changes: 9 additions & 1 deletion .azure-pipelines/ut-3x-pt-fp8.yml
@@ -9,11 +9,15 @@ pr:
paths:
include:
- .azure-pipelines/scripts/ut/3x/run_3x_pt_fp8.sh
- .azure-pipelines/scripts/install_nc.sh
- .azure-pipelines/ut-3x-pt-fp8.yml
- .azure-pipelines/template/docker-template.yml
- neural_compressor/common
- neural_compressor/torch
- test/3x/torch/algorithms/fp8_quant
- test/3x/torch/quantization/fp8_quant
- test/3x/torch/quantization/weight_only/test_rtn.py
- test/3x/torch/quantization/weight_only/test_load.py
- setup.py
- requirements_pt.txt

@@ -85,7 +89,7 @@ stages:

- script: |
echo "--- create container ---"
docker run -d -it --name="collectLogs" -v ${BUILD_SOURCESDIRECTORY}:/neural-compressor ${IMAGE_NAME}:${IMAGE_TAG} /bin/bash
docker run -d -it --name="collectLogs" -v ${BUILD_SOURCESDIRECTORY}:/neural-compressor ${IMAGE_NAME}:${IMAGE_TAG} /bin/bash
echo "--- docker ps ---"
docker ps
echo "--- collect logs ---"
@@ -94,6 +98,10 @@
&& bash ut/3x/collect_log_3x.sh 3x_pt_fp8"
displayName: "Collect UT Coverage"
- task: PublishCodeCoverageResults@2
inputs:
summaryFileLocation: $(Build.SourcesDirectory)/log_dir/coverage_PR/coverage.xml

- task: PublishPipelineArtifact@1
condition: succeededOrFailed()
inputs:
9 changes: 8 additions & 1 deletion .azure-pipelines/ut-3x-pt.yml
@@ -14,6 +14,9 @@ pr:
- test/3x/common
- setup.py
- requirements_pt.txt
- .azure-pipelines/ut-3x-pt.yml
- .azure-pipelines/template/docker-template.yml
- .azure-pipelines/scripts/install_nc.sh
- .azure-pipelines/scripts/ut/3x/run_3x_pt.sh

pool: ICX-16C
@@ -84,7 +87,7 @@ stages:

- script: |
echo "--- create container ---"
docker run -d -it --name="collectLogs" -v ${BUILD_SOURCESDIRECTORY}:/neural-compressor ${IMAGE_NAME}:${IMAGE_TAG} /bin/bash
docker run -d -it --name="collectLogs" -v ${BUILD_SOURCESDIRECTORY}:/neural-compressor ${IMAGE_NAME}:${IMAGE_TAG} /bin/bash
echo "--- docker ps ---"
docker ps
echo "--- collect logs ---"
@@ -93,6 +96,10 @@
&& bash ut/3x/collect_log_3x.sh 3x_pt"
displayName: "Collect UT Coverage"
- task: PublishCodeCoverageResults@2
inputs:
summaryFileLocation: $(Build.SourcesDirectory)/log_dir/coverage_PR/coverage.xml

- task: PublishPipelineArtifact@1
condition: succeededOrFailed()
inputs:

0 comments on commit be9adc2
