Typehint nits #826

Merged
merged 11 commits into from
Oct 16, 2024

Conversation

kylesayrs
Collaborator

@kylesayrs kylesayrs commented Oct 7, 2024

No description provided.


github-actions bot commented Oct 7, 2024

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

@kylesayrs kylesayrs changed the title Fix typehint import Typehint nits Oct 7, 2024
@kylesayrs kylesayrs marked this pull request as ready for review October 7, 2024 19:27
@kylesayrs kylesayrs self-assigned this Oct 7, 2024
  model = model_args.model
  # Load tokenizer
  # distill TODO: support for different tokenizer for teacher?
  tokenizer = model_args.tokenizer

  if isinstance(model, str) or isinstance(model, PosixPath):
-     (teacher, model_path, model) = initialize_model_from_path(
+     (teacher, _model_path, model) = initialize_model_from_path(
Collaborator

Why?

Collaborator Author

`model_path` is unused. The leading underscore in `_model_path` indicates that the variable is intentionally unused.

@mgoin mgoin merged commit 3eacbb3 into main Oct 16, 2024
5 of 6 checks passed
@mgoin mgoin deleted the kylesayrs/fix-typehint branch October 16, 2024 23:00
kylesayrs added a commit that referenced this pull request Oct 23, 2024
Signed-off-by: Kyle Sayers <[email protected]>
dsikka added a commit that referenced this pull request Oct 24, 2024
* rename files to remove colons

Signed-off-by: Kyle Sayers <[email protected]>

* [Bugfix] Workaround tied tensors bug (#659)

* load offload state dict

* add test

* remove merge duplication

* prepare to fix tie_word_embeddings

* add full tests

* patch second bug

* comment out failing tests, point to next pr

* link to issue

* accomodate offloaded models in test

* add back passing test

* WIP

* add error if not in expected list

* apply style

* update passing failing list

* add shared tensors tests

* clean up

* add comment with link

* make failing tests a todo

* Remove failing tests

* explicitly set safe_serialization

* separate out gpu tests, apply style

---------

Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* only untie word embeddings (#839)

Signed-off-by: Kyle Sayers <[email protected]>

* check for config hidden size (#840)

Signed-off-by: Kyle Sayers <[email protected]>

* Use float32 for Hessian dtype (#847)

* use float32 for hessian dtype

* explicitly set inp dtype as well

* float precision for obcq hessian

Signed-off-by: Kyle Sayers <[email protected]>

* GPTQ: Depreciate non-sequential update option (#762)

* remove from gptq, apply style

* remove instances of sequential_update argument in GPTQ tests

* update examples

* update example tests

* documentation, remove from example

* apply style

* revert back to auto type

* apply style

---------

Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Typehint nits (#826)

Signed-off-by: Kyle Sayers <[email protected]>

* [ DOC ] Remove version restrictions in W8A8 exmaple (#849)

The latest compressed-tensors release, 0.8.0, removed some APIs:
https://github.com/neuralmagic/compressed-tensors/pull/156/files
An older llmcompressor installed from pip then fails with an error like:
```
ImportError: cannot import name 'update_layer_weight_quant_params' from 'compressed_tensors.quantization'
```

Signed-off-by: Kyle Sayers <[email protected]>

* Fix inconsistence (#80)

Use group strategy with 128 group size instead of channel

Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* 2of4

Signed-off-by: Kyle Sayers <[email protected]>

* revert change to unrelated example

Signed-off-by: Kyle Sayers <[email protected]>

* rename test file

Signed-off-by: Kyle Sayers <[email protected]>

* fix fwd func call (#845)

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Co-authored-by: Jincheng Miao <[email protected]>
Co-authored-by: 黄石 <[email protected]>
kylesayrs added a commit that referenced this pull request Nov 7, 2024
dsikka added a commit that referenced this pull request Nov 7, 2024
* Implement iterative parameter updating

Signed-off-by: Kyle Sayers <[email protected]>

* [Bugfix] Use weight parameter of linear layer (#836)

* use weight parameter of linear layer

* add weight attribute check

Signed-off-by: Kyle Sayers <[email protected]>

* [Bugfix] Rename files to remove colons (#846)

* rename files to remove colons

Signed-off-by: Kyle Sayers <[email protected]>

* [Bugfix] Workaround tied tensors bug (#659)

* load offload state dict

* add test

* remove merge duplication

* prepare to fix tie_word_embeddings

* add full tests

* patch second bug

* comment out failing tests, point to next pr

* link to issue

* accomodate offloaded models in test

* add back passing test

* WIP

* add error if not in expected list

* apply style

* update passing failing list

* add shared tensors tests

* clean up

* add comment with link

* make failing tests a todo

* Remove failing tests

* explicitly set safe_serialization

* separate out gpu tests, apply style

---------

Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* only untie word embeddings (#839)

Signed-off-by: Kyle Sayers <[email protected]>

* check for config hidden size (#840)

Signed-off-by: Kyle Sayers <[email protected]>

* Use float32 for Hessian dtype (#847)

* use float32 for hessian dtype

* explicitly set inp dtype as well

* float precision for obcq hessian

Signed-off-by: Kyle Sayers <[email protected]>

* GPTQ: Depreciate non-sequential update option (#762)

* remove from gptq, apply style

* remove instances of sequential_update argument in GPTQ tests

* update examples

* update example tests

* documentation, remove from example

* apply style

* revert back to auto type

* apply style

---------

Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Typehint nits (#826)

Signed-off-by: Kyle Sayers <[email protected]>

* [ DOC ] Remove version restrictions in W8A8 exmaple (#849)

The latest compressed-tensors release, 0.8.0, removed some APIs:
https://github.com/neuralmagic/compressed-tensors/pull/156/files
An older llmcompressor installed from pip then fails with an error like:
```
ImportError: cannot import name 'update_layer_weight_quant_params' from 'compressed_tensors.quantization'
```

Signed-off-by: Kyle Sayers <[email protected]>

* Fix inconsistence (#80)

Use group strategy with 128 group size instead of channel

Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* 2of4

Signed-off-by: Kyle Sayers <[email protected]>

* revert change to unrelated example

Signed-off-by: Kyle Sayers <[email protected]>

* rename test file

Signed-off-by: Kyle Sayers <[email protected]>

* fix fwd func call (#845)

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Co-authored-by: Jincheng Miao <[email protected]>
Co-authored-by: 黄石 <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* cover all 3.9-3.12 in commit testing (#864)

Co-authored-by: dhuangnm <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Add marlin-24 recipe/configs for e2e testing (#866)

* add marlin-24 recipe/configs for e2e testing

* update

Signed-off-by: Kyle Sayers <[email protected]>

* [Bugfix] onload during sparsity calculation (#862)

* onload during sparsity calculation

* fix sparsity

---------

Co-authored-by: Dipika <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Fix HFTrainer overloads (#869)

* add missing arguments

Signed-off-by: Kyle Sayers <[email protected]>

* names

Signed-off-by: Kyle Sayers <[email protected]>

* style

Signed-off-by: Kyle Sayers <[email protected]>

* named args all around

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Support Model Offloading Tied Tensors Patch (#872)

* update parameter of offloaded modules

Signed-off-by: Kyle Sayers <[email protected]>

* in place function

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>

* add advice about dealing with non-invertable hessians (#875)

Signed-off-by: Kyle Sayers <[email protected]>

* seed commit workflow (#877)

* seed commit workflow

Signed-off-by: andy-neuma <[email protected]>

* tickle

Signed-off-by: andy-neuma <[email protected]>

* let's give it a try

Signed-off-by: andy-neuma <[email protected]>

* whitespace

Signed-off-by: andy-neuma <[email protected]>

* delete unneeded workflow

Signed-off-by: andy-neuma <[email protected]>

* adjust trigger

Signed-off-by: andy-neuma <[email protected]>

---------

Signed-off-by: andy-neuma <[email protected]>
Co-authored-by: andy-neuma <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* [Observer Restructure]: Add Observers; Add `calibration` and `frozen` steps to `QuantizationModifier` (#837)

* update functioon

* wip

* clean-up; fix imports

* clean-up

* more clean-up

* bug fix

* update for kvcache

* get kv_cache to work

* docstring

* fix comment

* fix condition for dynamic

* update

* update tests

* add observer tests

* add flake8 skip

* apply updated mse fixes

* fix import

* Update src/llmcompressor/modifiers/quantization/calibration.py

Co-authored-by: Kyle Sayers <[email protected]>

* Update src/llmcompressor/modifiers/quantization/calibration.py

Co-authored-by: Kyle Sayers <[email protected]>

* PR comments

* clean-up

* move hook check to observer call

* update

* separate out calibration step

---------

Co-authored-by: Kyle Sayers <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* WIP, observer

Signed-off-by: Kyle Sayers <[email protected]>

* use minmax observer

Signed-off-by: Kyle Sayers <[email protected]>

* Bugfix get observer from name (#883)

Signed-off-by: Rahul Tuli <[email protected]>

* BugFix: Fix Sparsity Reload Testing (#882)

* fix

* fix remaining test cases

* add comments

* fix

Signed-off-by: Kyle Sayers <[email protected]>

* Use custom unique test names for e2e tests (#892)

* Include `testconfig_path` in parsed config data

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use custom unique names for e2e tests

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Revert "Use custom unique test names for e2e tests (#892)" (#893)

This reverts commit 10facf2.

Signed-off-by: Kyle Sayers <[email protected]>

* Move config["testconfig_path"] assignment (#895)

* Use custom unique test names for e2e tests (#892)

* Include `testconfig_path` in parsed config data

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use custom unique names for e2e tests

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Revert "Use custom unique test names for e2e tests (#892)" (#893)

This reverts commit 10facf2.

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Move config["testconfig_path"] assignment

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use a function name generator for e2e test names

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* cap accelerate version to avoid bug (#897)

Signed-off-by: Kyle Sayers <[email protected]>

* Fix observing offloaded weight (#896)

* load weight within onloading

Signed-off-by: Kyle Sayers <[email protected]>

* remove moving activation to execution device, since this is already done since activation calibration always happens within forward pass

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Update image in README.md (#861)

Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* use user-specified observer

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Signed-off-by: andy-neuma <[email protected]>
Signed-off-by: Rahul Tuli <[email protected]>
Signed-off-by: Domenic Barbuzzi <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Co-authored-by: Jincheng Miao <[email protected]>
Co-authored-by: 黄石 <[email protected]>
Co-authored-by: dhuangnm <[email protected]>
Co-authored-by: dhuangnm <[email protected]>
Co-authored-by: Andy Linfoot <[email protected]>
Co-authored-by: andy-neuma <[email protected]>
Co-authored-by: Rahul Tuli <[email protected]>
Co-authored-by: Domenic Barbuzzi <[email protected]>
Co-authored-by: Michael Goin <[email protected]>
kylesayrs added a commit that referenced this pull request Nov 19, 2024
kylesayrs added a commit that referenced this pull request Nov 19, 2024
Signed-off-by: Kyle Sayers <[email protected]>
mgoin added a commit that referenced this pull request Nov 19, 2024
* set targets default earlier, remove QuantizationScheme.default_scheme

Signed-off-by: Kyle Sayers <[email protected]>

* clearer warning

Signed-off-by: Kyle Sayers <[email protected]>

* fix typo

Signed-off-by: Kyle Sayers <[email protected]>

* Use custom unique test names for e2e tests (#892)

* Include `testconfig_path` in parsed config data

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use custom unique names for e2e tests

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Revert "Use custom unique test names for e2e tests (#892)" (#893)

This reverts commit 10facf2.

Signed-off-by: Kyle Sayers <[email protected]>

* Move config["testconfig_path"] assignment (#895)

* Use custom unique test names for e2e tests (#892)

* Include `testconfig_path` in parsed config data

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use custom unique names for e2e tests

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Revert "Use custom unique test names for e2e tests (#892)" (#893)

This reverts commit 10facf2.

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Move config["testconfig_path"] assignment

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use a function name generator for e2e test names

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* update docstring, use default factory for mutable default

Signed-off-by: Kyle Sayers <[email protected]>

* use Linear default

Signed-off-by: Kyle Sayers <[email protected]>

* Use custom unique test names for e2e tests (#892)

* Include `testconfig_path` in parsed config data

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use custom unique names for e2e tests

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Revert "Use custom unique test names for e2e tests (#892)" (#893)

This reverts commit 10facf2.

Signed-off-by: Kyle Sayers <[email protected]>

* Move config["testconfig_path"] assignment (#895)

* Use custom unique test names for e2e tests (#892)

* Include `testconfig_path` in parsed config data

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use custom unique names for e2e tests

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Revert "Use custom unique test names for e2e tests (#892)" (#893)

This reverts commit 10facf2.

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Move config["testconfig_path"] assignment

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use a function name generator for e2e test names

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* cap accelerate version to avoid bug (#897)

Signed-off-by: Kyle Sayers <[email protected]>

* Fix observing offloaded weight (#896)

* load weight within onloading

Signed-off-by: Kyle Sayers <[email protected]>

* remove moving activation to execution device, since this is already done since activation calibration always happens within forward pass

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Update image in README.md (#861)

Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* update accelerate version (#899)

Signed-off-by: Kyle Sayers <[email protected]>

* [GPTQ] Iterative Parameter Updating (#863)

* Implement iterative parameter updating

Signed-off-by: Kyle Sayers <[email protected]>

* [Bugfix] Use weight parameter of linear layer (#836)

* use weight parameter of linear layer

* add weight attribute check

Signed-off-by: Kyle Sayers <[email protected]>

* [Bugfix] Rename files to remove colons (#846)

* rename files to remove colons

Signed-off-by: Kyle Sayers <[email protected]>

* [Bugfix] Workaround tied tensors bug (#659)

* load offload state dict

* add test

* remove merge duplication

* prepare to fix tie_word_embeddings

* add full tests

* patch second bug

* comment out failing tests, point to next pr

* link to issue

* accomodate offloaded models in test

* add back passing test

* WIP

* add error if not in expected list

* apply style

* update passing failing list

* add shared tensors tests

* clean up

* add comment with link

* make failing tests a todo

* Remove failing tests

* explicitly set safe_serialization

* separate out gpu tests, apply style

---------

Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* only untie word embeddings (#839)

Signed-off-by: Kyle Sayers <[email protected]>

* check for config hidden size (#840)

Signed-off-by: Kyle Sayers <[email protected]>

* Use float32 for Hessian dtype (#847)

* use float32 for hessian dtype

* explicitly set inp dtype as well

* float precision for obcq hessian

Signed-off-by: Kyle Sayers <[email protected]>

* GPTQ: Deprecate non-sequential update option (#762)

* remove from gptq, apply style

* remove instances of sequential_update argument in GPTQ tests

* update examples

* update example tests

* documentation, remove from example

* apply style

* revert back to auto type

* apply style

---------

Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Typehint nits (#826)

Signed-off-by: Kyle Sayers <[email protected]>

* [ DOC ] Remove version restrictions in W8A8 example (#849)

The latest compressed-tensors release, 0.8.0, removed some APIs:
https://github.com/neuralmagic/compressed-tensors/pull/156/files
If an older llmcompressor is installed from pip, it throws an
error like:
```
ImportError: cannot import name 'update_layer_weight_quant_params' from 'compressed_tensors.quantization'
```

Signed-off-by: Kyle Sayers <[email protected]>

* Fix inconsistency (#80)

Use group strategy with 128 group size instead of channel

Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* 2of4

Signed-off-by: Kyle Sayers <[email protected]>

* revert change to unrelated example

Signed-off-by: Kyle Sayers <[email protected]>

* rename test file

Signed-off-by: Kyle Sayers <[email protected]>

* fix fwd func call (#845)

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Co-authored-by: Jincheng Miao <[email protected]>
Co-authored-by: 黄石 <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* cover all 3.9-3.12 in commit testing (#864)

Co-authored-by: dhuangnm <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Add marlin-24 recipe/configs for e2e testing (#866)

* add marlin-24 recipe/configs for e2e testing

* update

Signed-off-by: Kyle Sayers <[email protected]>

* [Bugfix] onload during sparsity calculation (#862)

* onload during sparsity calculation

* fix sparsity

---------

Co-authored-by: Dipika <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Fix HFTrainer overloads (#869)

* add missing arguments

Signed-off-by: Kyle Sayers <[email protected]>

* names

Signed-off-by: Kyle Sayers <[email protected]>

* style

Signed-off-by: Kyle Sayers <[email protected]>

* named args all around

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Support Model Offloading Tied Tensors Patch (#872)

* update parameter of offloaded modules

Signed-off-by: Kyle Sayers <[email protected]>

* in place function

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>

* add advice about dealing with non-invertible hessians (#875)

Signed-off-by: Kyle Sayers <[email protected]>

* seed commit workflow (#877)

* seed commit workflow

Signed-off-by: andy-neuma <[email protected]>

* tickle

Signed-off-by: andy-neuma <[email protected]>

* let's give it a try

Signed-off-by: andy-neuma <[email protected]>

* whitespace

Signed-off-by: andy-neuma <[email protected]>

* delete unneeded workflow

Signed-off-by: andy-neuma <[email protected]>

* adjust trigger

Signed-off-by: andy-neuma <[email protected]>

---------

Signed-off-by: andy-neuma <[email protected]>
Co-authored-by: andy-neuma <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* [Observer Restructure]: Add Observers; Add `calibration` and `frozen` steps to `QuantizationModifier` (#837)

* update function

* wip

* clean-up; fix imports

* clean-up

* more clean-up

* bug fix

* update for kvcache

* get kv_cache to work

* docstring

* fix comment

* fix condition for dynamic

* update

* update tests

* add observer tests

* add flake8 skip

* apply updated mse fixes

* fix import

* Update src/llmcompressor/modifiers/quantization/calibration.py

Co-authored-by: Kyle Sayers <[email protected]>

* Update src/llmcompressor/modifiers/quantization/calibration.py

Co-authored-by: Kyle Sayers <[email protected]>

* PR comments

* clean-up

* move hook check to observer call

* update

* separate out calibration step

---------

Co-authored-by: Kyle Sayers <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* WIP, observer

Signed-off-by: Kyle Sayers <[email protected]>

* use minmax observer

Signed-off-by: Kyle Sayers <[email protected]>

* Bugfix get observer from name (#883)

Signed-off-by: Rahul Tuli <[email protected]>

* BugFix: Fix Sparsity Reload Testing (#882)

* fix

* fix remaining test cases

* add comments

* fix

Signed-off-by: Kyle Sayers <[email protected]>

* Use custom unique test names for e2e tests (#892)

* Include `testconfig_path` in parsed config data

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use custom unique names for e2e tests

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Revert "Use custom unique test names for e2e tests (#892)" (#893)

This reverts commit 10facf2.

Signed-off-by: Kyle Sayers <[email protected]>

* Move config["testconfig_path"] assignment (#895)

* Use custom unique test names for e2e tests (#892)

* Include `testconfig_path` in parsed config data

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use custom unique names for e2e tests

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Revert "Use custom unique test names for e2e tests (#892)" (#893)

This reverts commit 10facf2.

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Move config["testconfig_path"] assignment

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use a function name generator for e2e test names

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* cap accelerate version to avoid bug (#897)

Signed-off-by: Kyle Sayers <[email protected]>

* Fix observing offloaded weight (#896)

* load weight within onloading

Signed-off-by: Kyle Sayers <[email protected]>

* remove moving activation to execution device; this is already done because activation calibration always happens within the forward pass

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Update image in README.md (#861)

Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* use user-specified observer

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Signed-off-by: andy-neuma <[email protected]>
Signed-off-by: Rahul Tuli <[email protected]>
Signed-off-by: Domenic Barbuzzi <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Co-authored-by: Jincheng Miao <[email protected]>
Co-authored-by: 黄石 <[email protected]>
Co-authored-by: dhuangnm <[email protected]>
Co-authored-by: dhuangnm <[email protected]>
Co-authored-by: Andy Linfoot <[email protected]>
Co-authored-by: andy-neuma <[email protected]>
Co-authored-by: Rahul Tuli <[email protected]>
Co-authored-by: Domenic Barbuzzi <[email protected]>
Co-authored-by: Michael Goin <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Small fixes for release (#901)

* fix device map

* expose one gpu for finetune; update to use a better model and show generation for completeness

* more fixes

* typo fix

* don't just run unit tests

Signed-off-by: Kyle Sayers <[email protected]>

* use smaller portion of dataset (#902)

Signed-off-by: Kyle Sayers <[email protected]>

* Update example to not fail hessian inversion (#904)

* update

Signed-off-by: Dipika <[email protected]>

* quality

---------

Signed-off-by: Dipika <[email protected]>
Co-authored-by: Rahul Tuli <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* bump version (#907)

Signed-off-by: Dipika <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* add default mappings (#906)

Signed-off-by: Kyle Sayers <[email protected]>

* [SparseAutoModelForCausalLM Deprecation] Feature change (#881)

* src and tests updates

* save model if output_dir is provided

* save model if provided as a string

* typo

* save if model was provided as a string or custom output_dir was set

* comments

* also save tokenizer if model was passed as a string or a custom output_dir was provided

* revert to True

* merge main

* merge main

* fix transformers tests

* Update tests/llmcompressor/transformers/obcq/test_consecutive_runs.py

Co-authored-by: Kyle Sayers <[email protected]>

* lint

* fix bug

* fix bug

* comments

* comments

* fix saving bug on example script and comments

* fix test failure

* comments

* comments

* comments

* lint

* fix test_quantization.py

* fix bugs

* revert to default

* revert to default

* draft

* fix test

* logging output fix

---------

Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* correct typo (#888)

Signed-off-by: Kyle Sayers <[email protected]>

* use default factory, since default does not trigger field validator

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Signed-off-by: Domenic Barbuzzi <[email protected]>
Signed-off-by: andy-neuma <[email protected]>
Signed-off-by: Rahul Tuli <[email protected]>
Signed-off-by: Dipika <[email protected]>
Co-authored-by: Domenic Barbuzzi <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Co-authored-by: Michael Goin <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Jincheng Miao <[email protected]>
Co-authored-by: 黄石 <[email protected]>
Co-authored-by: dhuangnm <[email protected]>
Co-authored-by: dhuangnm <[email protected]>
Co-authored-by: Andy Linfoot <[email protected]>
Co-authored-by: andy-neuma <[email protected]>
Co-authored-by: Rahul Tuli <[email protected]>
Co-authored-by: George <[email protected]>
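
One of the later entries in the commit message above switches a field to a default factory. The snippet below is only a minimal sketch of the general default-factory pattern. It uses the standard library's `dataclasses` rather than whatever model class the commit actually touches, and `SchemeSketch`, its `targets` field, and the `["Linear"]` value are hypothetical names chosen for illustration. It does not reproduce the field-validator behavior the commit mentions; it only shows why a factory is preferred for mutable defaults.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SchemeSketch:
    # Hypothetical stand-in for a config object; not the project's real class.
    # default_factory builds a fresh list for every instance, so instances
    # never share a single mutable default.
    targets: List[str] = field(default_factory=lambda: ["Linear"])


def defaults_are_independent() -> bool:
    """Mutate one instance's default and check that another is unaffected."""
    first = SchemeSketch()
    second = SchemeSketch()
    first.targets.append("Embedding")
    return second.targets == ["Linear"]
```

Writing `targets: List[str] = ["Linear"]` directly would be rejected by `dataclasses` with a `ValueError`, which is the library's way of steering users toward `default_factory` for list defaults.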
kylesayrs added a commit that referenced this pull request Nov 21, 2024
* rename files to remove colons

Signed-off-by: Kyle Sayers <[email protected]>

* [Bugfix] Workaround tied tensors bug (#659)

* load offload state dict

* add test

* remove merge duplication

* prepare to fix tie_word_embeddings

* add full tests

* patch second bug

* comment out failing tests, point to next pr

* link to issue

* accommodate offloaded models in test

* add back passing test

* WIP

* add error if not in expected list

* apply style

* update passing failing list

* add shared tensors tests

* clean up

* add comment with link

* make failing tests a todo

* Remove failing tests

* explicitly set safe_serialization

* separate out gpu tests, apply style

---------

Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* only untie word embeddings (#839)

Signed-off-by: Kyle Sayers <[email protected]>

* check for config hidden size (#840)

Signed-off-by: Kyle Sayers <[email protected]>

* Use float32 for Hessian dtype (#847)

* use float32 for hessian dtype

* explicitly set inp dtype as well

* float precision for obcq hessian

Signed-off-by: Kyle Sayers <[email protected]>

* GPTQ: Deprecate non-sequential update option (#762)

* remove from gptq, apply style

* remove instances of sequential_update argument in GPTQ tests

* update examples

* update example tests

* documentation, remove from example

* apply style

* revert back to auto type

* apply style

---------

Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Typehint nits (#826)

Signed-off-by: Kyle Sayers <[email protected]>

* [ DOC ] Remove version restrictions in W8A8 example (#849)

The latest compressed-tensors release, 0.8.0, removed some APIs:
https://github.com/neuralmagic/compressed-tensors/pull/156/files
If an older llmcompressor is installed from pip, it throws an
error like:
```
ImportError: cannot import name 'update_layer_weight_quant_params' from 'compressed_tensors.quantization'
```

Signed-off-by: Kyle Sayers <[email protected]>

* Fix inconsistency (#80)

Use group strategy with 128 group size instead of channel

Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* 2of4

Signed-off-by: Kyle Sayers <[email protected]>

* revert change to unrelated example

Signed-off-by: Kyle Sayers <[email protected]>

* rename test file

Signed-off-by: Kyle Sayers <[email protected]>

* fix fwd func call (#845)

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Co-authored-by: Jincheng Miao <[email protected]>
Co-authored-by: 黄石 <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>
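
The #849 entry in the commit message above describes an `ImportError` caused by a mismatch between the installed llmcompressor and compressed-tensors releases. As a hedged sketch, one way to see which versions are installed before debugging such an error is the standard library's `importlib.metadata`. The helper names `installed_version` and `report` are hypothetical, and the distribution names `llmcompressor` and `compressed-tensors` are taken from the text above on the assumption that they match the pip package names.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Sequence


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def report(
    packages: Sequence[str] = ("llmcompressor", "compressed-tensors"),
) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version (None if missing)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    for name, found in report().items():
        print(f"{name}: {found or 'not installed'}")
```

This only reports versions; it does not decide which combinations are compatible, which depends on the releases involved.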
kylesayrs added a commit that referenced this pull request Nov 21, 2024
* Implement iterative parameter updating

Signed-off-by: Kyle Sayers <[email protected]>

* [Bugfix] Use weight parameter of linear layer (#836)

* use weight parameter of linear layer

* add weight attribute check

Signed-off-by: Kyle Sayers <[email protected]>

* [Bugfix] Rename files to remove colons (#846)

* rename files to remove colons

Signed-off-by: Kyle Sayers <[email protected]>

* [Bugfix] Workaround tied tensors bug (#659)

* load offload state dict

* add test

* remove merge duplication

* prepare to fix tie_word_embeddings

* add full tests

* patch second bug

* comment out failing tests, point to next pr

* link to issue

* accommodate offloaded models in test

* add back passing test

* WIP

* add error if not in expected list

* apply style

* update passing failing list

* add shared tensors tests

* clean up

* add comment with link

* make failing tests a todo

* Remove failing tests

* explicitly set safe_serialization

* separate out gpu tests, apply style

---------

Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* only untie word embeddings (#839)

Signed-off-by: Kyle Sayers <[email protected]>

* check for config hidden size (#840)

Signed-off-by: Kyle Sayers <[email protected]>

* Use float32 for Hessian dtype (#847)

* use float32 for hessian dtype

* explicitly set inp dtype as well

* float precision for obcq hessian

Signed-off-by: Kyle Sayers <[email protected]>

* GPTQ: Deprecate non-sequential update option (#762)

* remove from gptq, apply style

* remove instances of sequential_update argument in GPTQ tests

* update examples

* update example tests

* documentation, remove from example

* apply style

* revert back to auto type

* apply style

---------

Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Typehint nits (#826)

Signed-off-by: Kyle Sayers <[email protected]>

* [ DOC ] Remove version restrictions in W8A8 example (#849)

The latest compressed-tensors release, 0.8.0, removed some APIs:
https://github.com/neuralmagic/compressed-tensors/pull/156/files
If an older llmcompressor is installed from pip, it throws an
error like:
```
ImportError: cannot import name 'update_layer_weight_quant_params' from 'compressed_tensors.quantization'
```

Signed-off-by: Kyle Sayers <[email protected]>

* Fix inconsistency (#80)

Use group strategy with 128 group size instead of channel

Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* 2of4

Signed-off-by: Kyle Sayers <[email protected]>

* revert change to unrelated example

Signed-off-by: Kyle Sayers <[email protected]>

* rename test file

Signed-off-by: Kyle Sayers <[email protected]>

* fix fwd func call (#845)

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Co-authored-by: Jincheng Miao <[email protected]>
Co-authored-by: 黄石 <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* cover all 3.9-3.12 in commit testing (#864)

Co-authored-by: dhuangnm <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Add marlin-24 recipe/configs for e2e testing (#866)

* add marlin-24 recipe/configs for e2e testing

* update

Signed-off-by: Kyle Sayers <[email protected]>

* [Bugfix] onload during sparsity calculation (#862)

* onload during sparsity calculation

* fix sparsity

---------

Co-authored-by: Dipika <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Fix HFTrainer overloads (#869)

* add missing arguments

Signed-off-by: Kyle Sayers <[email protected]>

* names

Signed-off-by: Kyle Sayers <[email protected]>

* style

Signed-off-by: Kyle Sayers <[email protected]>

* named args all around

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Support Model Offloading Tied Tensors Patch (#872)

* update parameter of offloaded modules

Signed-off-by: Kyle Sayers <[email protected]>

* in place function

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>

* add advice about dealing with non-invertible hessians (#875)

Signed-off-by: Kyle Sayers <[email protected]>

* seed commit workflow (#877)

* seed commit workflow

Signed-off-by: andy-neuma <[email protected]>

* tickle

Signed-off-by: andy-neuma <[email protected]>

* let's give it a try

Signed-off-by: andy-neuma <[email protected]>

* whitespace

Signed-off-by: andy-neuma <[email protected]>

* delete unneeded workflow

Signed-off-by: andy-neuma <[email protected]>

* adjust trigger

Signed-off-by: andy-neuma <[email protected]>

---------

Signed-off-by: andy-neuma <[email protected]>
Co-authored-by: andy-neuma <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* [Observer Restructure]: Add Observers; Add `calibration` and `frozen` steps to `QuantizationModifier` (#837)

* update function

* wip

* clean-up; fix imports

* clean-up

* more clean-up

* bug fix

* update for kvcache

* get kv_cache to work

* docstring

* fix comment

* fix condition for dynamic

* update

* update tests

* add observer tests

* add flake8 skip

* apply updated mse fixes

* fix import

* Update src/llmcompressor/modifiers/quantization/calibration.py

Co-authored-by: Kyle Sayers <[email protected]>

* Update src/llmcompressor/modifiers/quantization/calibration.py

Co-authored-by: Kyle Sayers <[email protected]>

* PR comments

* clean-up

* move hook check to observer call

* update

* separate out calibration step

---------

Co-authored-by: Kyle Sayers <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* WIP, observer

Signed-off-by: Kyle Sayers <[email protected]>

* use minmax observer

Signed-off-by: Kyle Sayers <[email protected]>

* Bugfix get observer from name (#883)

Signed-off-by: Rahul Tuli <[email protected]>

* BugFix: Fix Sparsity Reload Testing (#882)

* fix

* fix remaining test cases

* add comments

* fix

Signed-off-by: Kyle Sayers <[email protected]>

* Use custom unique test names for e2e tests (#892)

* Include `testconfig_path` in parsed config data

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use custom unique names for e2e tests

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Revert "Use custom unique test names for e2e tests (#892)" (#893)

This reverts commit 10facf2.

Signed-off-by: Kyle Sayers <[email protected]>

* Move config["testconfig_path"] assignment (#895)

* Use custom unique test names for e2e tests (#892)

* Include `testconfig_path` in parsed config data

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use custom unique names for e2e tests

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Revert "Use custom unique test names for e2e tests (#892)" (#893)

This reverts commit 10facf2.

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Move config["testconfig_path"] assignment

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use a function name generator for e2e test names

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* cap accelerate version to avoid bug (#897)

Signed-off-by: Kyle Sayers <[email protected]>

* Fix observing offloaded weight (#896)

* load weight within onloading

Signed-off-by: Kyle Sayers <[email protected]>

* remove moving activation to execution device; this is already done because activation calibration always happens within the forward pass

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Update image in README.md (#861)

Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* use user-specified observer

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Signed-off-by: andy-neuma <[email protected]>
Signed-off-by: Rahul Tuli <[email protected]>
Signed-off-by: Domenic Barbuzzi <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Co-authored-by: Jincheng Miao <[email protected]>
Co-authored-by: 黄石 <[email protected]>
Co-authored-by: dhuangnm <[email protected]>
Co-authored-by: dhuangnm <[email protected]>
Co-authored-by: Andy Linfoot <[email protected]>
Co-authored-by: andy-neuma <[email protected]>
Co-authored-by: Rahul Tuli <[email protected]>
Co-authored-by: Domenic Barbuzzi <[email protected]>
Co-authored-by: Michael Goin <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>
kylesayrs added a commit that referenced this pull request Nov 21, 2024
* set targets default earlier, remove QuantizationScheme.default_scheme

Signed-off-by: Kyle Sayers <[email protected]>

* clearer warning

Signed-off-by: Kyle Sayers <[email protected]>

* fix typo

Signed-off-by: Kyle Sayers <[email protected]>

* Use custom unique test names for e2e tests (#892)

* Include `testconfig_path` in parsed config data

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use custom unique names for e2e tests

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Revert "Use custom unique test names for e2e tests (#892)" (#893)

This reverts commit 10facf2.

Signed-off-by: Kyle Sayers <[email protected]>

* Move config["testconfig_path"] assignment (#895)

* Use custom unique test names for e2e tests (#892)

* Include `testconfig_path` in parsed config data

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use custom unique names for e2e tests

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Revert "Use custom unique test names for e2e tests (#892)" (#893)

This reverts commit 10facf2.

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Move config["testconfig_path"] assignment

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use a function name generator for e2e test names

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* update docstring, use default factory for mutable default

Signed-off-by: Kyle Sayers <[email protected]>

* use Linear default

Signed-off-by: Kyle Sayers <[email protected]>

* Use custom unique test names for e2e tests (#892)

* Include `testconfig_path` in parsed config data

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use custom unique names for e2e tests

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Revert "Use custom unique test names for e2e tests (#892)" (#893)

This reverts commit 10facf2.

Signed-off-by: Kyle Sayers <[email protected]>

* Move config["testconfig_path"] assignment (#895)

* Use custom unique test names for e2e tests (#892)

* Include `testconfig_path` in parsed config data

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use custom unique names for e2e tests

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Revert "Use custom unique test names for e2e tests (#892)" (#893)

This reverts commit 10facf2.

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Move config["testconfig_path"] assignment

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use a function name generator for e2e test names

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* cap accelerate version to avoid bug (#897)

Signed-off-by: Kyle Sayers <[email protected]>

* Fix observing offloaded weight (#896)

* load weight within onloading

Signed-off-by: Kyle Sayers <[email protected]>

* remove moving activation to execution device; this is already done because activation calibration always happens within the forward pass

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Update image in README.md (#861)

Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* update accelerate version (#899)

Signed-off-by: Kyle Sayers <[email protected]>

* [GPTQ] Iterative Parameter Updating (#863)

* Implement iterative parameter updating

Signed-off-by: Kyle Sayers <[email protected]>

* [Bugfix] Use weight parameter of linear layer (#836)

* use weight parameter of linear layer

* add weight attribute check

Signed-off-by: Kyle Sayers <[email protected]>

* [Bugfix] Rename files to remove colons (#846)

* rename files to remove colons

Signed-off-by: Kyle Sayers <[email protected]>

* [Bugfix] Workaround tied tensors bug (#659)

* load offload state dict

* add test

* remove merge duplication

* prepare to fix tie_word_embeddings

* add full tests

* patch second bug

* comment out failing tests, point to next pr

* link to issue

* accommodate offloaded models in test

* add back passing test

* WIP

* add error if not in expected list

* apply style

* update passing failing list

* add shared tensors tests

* clean up

* add comment with link

* make failing tests a todo

* Remove failing tests

* explicitly set safe_serialization

* separate out gpu tests, apply style

---------

Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* only untie word embeddings (#839)

Signed-off-by: Kyle Sayers <[email protected]>

* check for config hidden size (#840)

Signed-off-by: Kyle Sayers <[email protected]>

* Use float32 for Hessian dtype (#847)

* use float32 for hessian dtype

* explicitly set inp dtype as well

* float precision for obcq hessian

Signed-off-by: Kyle Sayers <[email protected]>

* GPTQ: Deprecate non-sequential update option (#762)

* remove from gptq, apply style

* remove instances of sequential_update argument in GPTQ tests

* update examples

* update example tests

* documentation, remove from example

* apply style

* revert back to auto type

* apply style

---------

Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Typehint nits (#826)

Signed-off-by: Kyle Sayers <[email protected]>

* [ DOC ] Remove version restrictions in W8A8 example (#849)

The latest compressed-tensors release, 0.8.0, removed some APIs:
https://github.com/neuralmagic/compressed-tensors/pull/156/files
If an older llmcompressor is installed from pip, it throws an
error like:
```
ImportError: cannot import name 'update_layer_weight_quant_params' from 'compressed_tensors.quantization'
```

Signed-off-by: Kyle Sayers <[email protected]>

* Fix inconsistency (#80)

Use group strategy with 128 group size instead of channel

Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* 2of4

Signed-off-by: Kyle Sayers <[email protected]>

* revert change to unrelated example

Signed-off-by: Kyle Sayers <[email protected]>

* rename test file

Signed-off-by: Kyle Sayers <[email protected]>

* fix fwd func call (#845)

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Co-authored-by: Jincheng Miao <[email protected]>
Co-authored-by: 黄石 <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* cover all 3.9-3.12 in commit testing (#864)

Co-authored-by: dhuangnm <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Add marlin-24 recipe/configs for e2e testing (#866)

* add marlin-24 recipe/configs for e2e testing

* update

Signed-off-by: Kyle Sayers <[email protected]>

* [Bugfix] onload during sparsity calculation (#862)

* onload during sparsity calculation

* fix sparsity

---------

Co-authored-by: Dipika <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Fix HFTrainer overloads (#869)

* add missing arguments

Signed-off-by: Kyle Sayers <[email protected]>

* names

Signed-off-by: Kyle Sayers <[email protected]>

* style

Signed-off-by: Kyle Sayers <[email protected]>

* named args all around

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Support Model Offloading Tied Tensors Patch (#872)

* update parameter of offloaded modules

Signed-off-by: Kyle Sayers <[email protected]>

* in place function

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>

* add advice about dealing with non-invertible hessians (#875)

Signed-off-by: Kyle Sayers <[email protected]>

* seed commit workflow (#877)

* seed commit workflow

Signed-off-by: andy-neuma <[email protected]>

* tickle

Signed-off-by: andy-neuma <[email protected]>

* let's give it a try

Signed-off-by: andy-neuma <[email protected]>

* whitespace

Signed-off-by: andy-neuma <[email protected]>

* delete unneeded workflow

Signed-off-by: andy-neuma <[email protected]>

* adjust trigger

Signed-off-by: andy-neuma <[email protected]>

---------

Signed-off-by: andy-neuma <[email protected]>
Co-authored-by: andy-neuma <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* [Observer Restructure]: Add Observers; Add `calibration` and `frozen` steps to `QuantizationModifier` (#837)

* update function

* wip

* clean-up; fix imports

* clean-up

* more clean-up

* bug fix

* update for kvcache

* get kv_cache to work

* docstring

* fix comment

* fix condition for dynamic

* update

* update tests

* add observer tests

* add flake8 skip

* apply updated mse fixes

* fix import

* Update src/llmcompressor/modifiers/quantization/calibration.py

Co-authored-by: Kyle Sayers <[email protected]>

* Update src/llmcompressor/modifiers/quantization/calibration.py

Co-authored-by: Kyle Sayers <[email protected]>

* PR comments

* clean-up

* move hook check to observer call

* update

* separate out calibration step

---------

Co-authored-by: Kyle Sayers <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* WIP, observer

Signed-off-by: Kyle Sayers <[email protected]>

* use minmax observer

Signed-off-by: Kyle Sayers <[email protected]>

* Bugfix get observer from name (#883)

Signed-off-by: Rahul Tuli <[email protected]>

* BugFix: Fix Sparsity Reload Testing (#882)

* fix

* fix remaining test cases

* add comments

* fix

Signed-off-by: Kyle Sayers <[email protected]>

* Use custom unique test names for e2e tests (#892)

* Include `testconfig_path` in parsed config data

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use custom unique names for e2e tests

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Revert "Use custom unique test names for e2e tests (#892)" (#893)

This reverts commit 10facf2.

Signed-off-by: Kyle Sayers <[email protected]>

* Move config["testconfig_path"] assignment (#895)

* Use custom unique test names for e2e tests (#892)

* Include `testconfig_path` in parsed config data

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use custom unique names for e2e tests

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Revert "Use custom unique test names for e2e tests (#892)" (#893)

This reverts commit 10facf2.

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Move config["testconfig_path"] assignment

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use a function name generator for e2e test names

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* cap accelerate version to avoid bug (#897)

Signed-off-by: Kyle Sayers <[email protected]>

* Fix observing offloaded weight (#896)

* load weight within onloading

Signed-off-by: Kyle Sayers <[email protected]>

* stop moving activations to the execution device; this already happens because activation calibration always runs within the forward pass

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Update image in README.md (#861)

Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* use user-specified observer

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Signed-off-by: andy-neuma <[email protected]>
Signed-off-by: Rahul Tuli <[email protected]>
Signed-off-by: Domenic Barbuzzi <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Co-authored-by: Jincheng Miao <[email protected]>
Co-authored-by: 黄石 <[email protected]>
Co-authored-by: dhuangnm <[email protected]>
Co-authored-by: dhuangnm <[email protected]>
Co-authored-by: Andy Linfoot <[email protected]>
Co-authored-by: andy-neuma <[email protected]>
Co-authored-by: Rahul Tuli <[email protected]>
Co-authored-by: Domenic Barbuzzi <[email protected]>
Co-authored-by: Michael Goin <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Small fixes for release (#901)

* fix device map

* expose one GPU for finetune; update to use a better model and show generation for completeness

* more fixes

* typo fix

* don't just run unit tests

Signed-off-by: Kyle Sayers <[email protected]>

* use smaller portion of dataset (#902)

Signed-off-by: Kyle Sayers <[email protected]>

* Update example to not fail hessian inversion (#904)

* update

Signed-off-by: Dipika <[email protected]>

* quality

---------

Signed-off-by: Dipika <[email protected]>
Co-authored-by: Rahul Tuli <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* bump version (#907)

Signed-off-by: Dipika <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* add default mappings (#906)

Signed-off-by: Kyle Sayers <[email protected]>

* [SparseAutoModelForCausalLM Deprecation] Feature change (#881)

* src and tests updates

* save model if output_dir is provided

* save model if provided as a string

* typo

* save if model was provided as a string or custom output_dir was set

* comments

* also save tokenizer if model was passed as a string or a custom output_dir was provided

* revert to True

* merge main

* merge main

* fix transformers tests

* Update tests/llmcompressor/transformers/obcq/test_consecutive_runs.py

Co-authored-by: Kyle Sayers <[email protected]>

* lint

* fix bug

* fix bug

* comments

* comments

* fix saving bug on example script and comments

* fix test failure

* comments

* comments

* comments

* lint

* fix test_quantization.py

* fix bugs

* revert to default

* revert to default

* draft

* fix test

* logging output fix

---------

Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* correct typo (#888)

Signed-off-by: Kyle Sayers <[email protected]>

* use default factory, since default does not trigger field validator

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Signed-off-by: Domenic Barbuzzi <[email protected]>
Signed-off-by: andy-neuma <[email protected]>
Signed-off-by: Rahul Tuli <[email protected]>
Signed-off-by: Dipika <[email protected]>
Co-authored-by: Domenic Barbuzzi <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Co-authored-by: Michael Goin <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Jincheng Miao <[email protected]>
Co-authored-by: 黄石 <[email protected]>
Co-authored-by: dhuangnm <[email protected]>
Co-authored-by: dhuangnm <[email protected]>
Co-authored-by: Andy Linfoot <[email protected]>
Co-authored-by: andy-neuma <[email protected]>
Co-authored-by: Rahul Tuli <[email protected]>
Co-authored-by: George <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>
kylesayrs added a commit that referenced this pull request Nov 21, 2024
Signed-off-by: Kyle Sayers <[email protected]>
kylesayrs added a commit that referenced this pull request Nov 21, 2024
* rename files to remove colons

Signed-off-by: Kyle Sayers <[email protected]>

* [Bugfix] Workaround tied tensors bug (#659)

* load offload state dict

* add test

* remove merge duplication

* prepare to fix tie_word_embeddings

* add full tests

* patch second bug

* comment out failing tests, point to next pr

* link to issue

* accommodate offloaded models in test

* add back passing test

* WIP

* add error if not in expected list

* apply style

* update passing failing list

* add shared tensors tests

* clean up

* add comment with link

* make failing tests a todo

* Remove failing tests

* explicitly set safe_serialization

* separate out gpu tests, apply style

---------

Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* only untie word embeddings (#839)

Signed-off-by: Kyle Sayers <[email protected]>

* check for config hidden size (#840)

Signed-off-by: Kyle Sayers <[email protected]>

* Use float32 for Hessian dtype (#847)

* use float32 for hessian dtype

* explicitly set inp dtype as well

* float precision for obcq hessian

Signed-off-by: Kyle Sayers <[email protected]>

* GPTQ: Deprecate non-sequential update option (#762)

* remove from gptq, apply style

* remove instances of sequential_update argument in GPTQ tests

* update examples

* update example tests

* documentation, remove from example

* apply style

* revert back to auto type

* apply style

---------

Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Typehint nits (#826)

Signed-off-by: Kyle Sayers <[email protected]>

* [ DOC ] Remove version restrictions in W8A8 example (#849)

The latest compressed-tensors release, 0.8.0, removed some APIs:
https://github.com/neuralmagic/compressed-tensors/pull/156/files
With an older llmcompressor installed from pip, importing fails with
an error like:
```
ImportError: cannot import name 'update_layer_weight_quant_params' from 'compressed_tensors.quantization'
```

Signed-off-by: Kyle Sayers <[email protected]>

* Fix inconsistency (#80)

Use group strategy with 128 group size instead of channel

Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* 2of4

Signed-off-by: Kyle Sayers <[email protected]>

* revert change to unrelated example

Signed-off-by: Kyle Sayers <[email protected]>

* rename test file

Signed-off-by: Kyle Sayers <[email protected]>

* fix fwd func call (#845)

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Co-authored-by: Jincheng Miao <[email protected]>
Co-authored-by: 黄石 <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>
kylesayrs added a commit that referenced this pull request Nov 21, 2024
* Implement iterative parameter updating

Signed-off-by: Kyle Sayers <[email protected]>

* [Bugfix] Use weight parameter of linear layer (#836)

* use weight parameter of linear layer

* add weight attribute check

Signed-off-by: Kyle Sayers <[email protected]>

* [Bugfix] Rename files to remove colons (#846)

* rename files to remove colons

Signed-off-by: Kyle Sayers <[email protected]>

* [Bugfix] Workaround tied tensors bug (#659)

* load offload state dict

* add test

* remove merge duplication

* prepare to fix tie_word_embeddings

* add full tests

* patch second bug

* comment out failing tests, point to next pr

* link to issue

* accommodate offloaded models in test

* add back passing test

* WIP

* add error if not in expected list

* apply style

* update passing failing list

* add shared tensors tests

* clean up

* add comment with link

* make failing tests a todo

* Remove failing tests

* explicitly set safe_serialization

* separate out gpu tests, apply style

---------

Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* only untie word embeddings (#839)

Signed-off-by: Kyle Sayers <[email protected]>

* check for config hidden size (#840)

Signed-off-by: Kyle Sayers <[email protected]>

* Use float32 for Hessian dtype (#847)

* use float32 for hessian dtype

* explicitly set inp dtype as well

* float precision for obcq hessian

Signed-off-by: Kyle Sayers <[email protected]>

* GPTQ: Deprecate non-sequential update option (#762)

* remove from gptq, apply style

* remove instances of sequential_update argument in GPTQ tests

* update examples

* update example tests

* documentation, remove from example

* apply style

* revert back to auto type

* apply style

---------

Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Typehint nits (#826)

Signed-off-by: Kyle Sayers <[email protected]>

* [ DOC ] Remove version restrictions in W8A8 example (#849)

The latest compressed-tensors release, 0.8.0, removed some APIs:
https://github.com/neuralmagic/compressed-tensors/pull/156/files
With an older llmcompressor installed from pip, importing fails with
an error like:
```
ImportError: cannot import name 'update_layer_weight_quant_params' from 'compressed_tensors.quantization'
```

Signed-off-by: Kyle Sayers <[email protected]>

* Fix inconsistency (#80)

Use group strategy with 128 group size instead of channel

Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* 2of4

Signed-off-by: Kyle Sayers <[email protected]>

* revert change to unrelated example

Signed-off-by: Kyle Sayers <[email protected]>

* rename test file

Signed-off-by: Kyle Sayers <[email protected]>

* fix fwd func call (#845)

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Co-authored-by: Jincheng Miao <[email protected]>
Co-authored-by: 黄石 <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* cover all 3.9-3.12 in commit testing (#864)

Co-authored-by: dhuangnm <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Add marlin-24 recipe/configs for e2e testing (#866)

* add marlin-24 recipe/configs for e2e testing

* update

Signed-off-by: Kyle Sayers <[email protected]>

* [Bugfix] onload during sparsity calculation (#862)

* onload during sparsity calculation

* fix sparsity

---------

Co-authored-by: Dipika <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Fix HFTrainer overloads (#869)

* add missing arguments

Signed-off-by: Kyle Sayers <[email protected]>

* names

Signed-off-by: Kyle Sayers <[email protected]>

* style

Signed-off-by: Kyle Sayers <[email protected]>

* named args all around

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Support Model Offloading Tied Tensors Patch (#872)

* update parameter of offloaded modules

Signed-off-by: Kyle Sayers <[email protected]>

* in place function

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>

* add advice about dealing with non-invertible hessians (#875)

Signed-off-by: Kyle Sayers <[email protected]>

* seed commit workflow (#877)

* seed commit workflow

Signed-off-by: andy-neuma <[email protected]>

* tickle

Signed-off-by: andy-neuma <[email protected]>

* let's give it a try

Signed-off-by: andy-neuma <[email protected]>

* whitespace

Signed-off-by: andy-neuma <[email protected]>

* delete unneeded workflow

Signed-off-by: andy-neuma <[email protected]>

* adjust trigger

Signed-off-by: andy-neuma <[email protected]>

---------

Signed-off-by: andy-neuma <[email protected]>
Co-authored-by: andy-neuma <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* [Observer Restructure]: Add Observers; Add `calibration` and `frozen` steps to `QuantizationModifier` (#837)

* update function

* wip

* clean-up; fix imports

* clean-up

* more clean-up

* bug fix

* update for kvcache

* get kv_cache to work

* docstring

* fix comment

* fix condition for dynamic

* update

* update tests

* add observer tests

* add flake8 skip

* apply updated mse fixes

* fix import

* Update src/llmcompressor/modifiers/quantization/calibration.py

Co-authored-by: Kyle Sayers <[email protected]>

* Update src/llmcompressor/modifiers/quantization/calibration.py

Co-authored-by: Kyle Sayers <[email protected]>

* PR comments

* clean-up

* move hook check to observer call

* update

* separate out calibration step

---------

Co-authored-by: Kyle Sayers <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* WIP, observer

Signed-off-by: Kyle Sayers <[email protected]>

* use minmax observer

Signed-off-by: Kyle Sayers <[email protected]>

* Bugfix get observer from name (#883)

Signed-off-by: Rahul Tuli <[email protected]>

* BugFix: Fix Sparsity Reload Testing (#882)

* fix

* fix remaining test cases

* add comments

* fix

Signed-off-by: Kyle Sayers <[email protected]>

* Use custom unique test names for e2e tests (#892)

* Include `testconfig_path` in parsed config data

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use custom unique names for e2e tests

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Revert "Use custom unique test names for e2e tests (#892)" (#893)

This reverts commit 10facf2.

Signed-off-by: Kyle Sayers <[email protected]>

* Move config["testconfig_path"] assignment (#895)

* Use custom unique test names for e2e tests (#892)

* Include `testconfig_path` in parsed config data

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use custom unique names for e2e tests

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Revert "Use custom unique test names for e2e tests (#892)" (#893)

This reverts commit 10facf2.

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Move config["testconfig_path"] assignment

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use a function name generator for e2e test names

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* cap accelerate version to avoid bug (#897)

Signed-off-by: Kyle Sayers <[email protected]>

* Fix observing offloaded weight (#896)

* load weight within onloading

Signed-off-by: Kyle Sayers <[email protected]>

* stop moving activations to the execution device; this already happens because activation calibration always runs within the forward pass

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Update image in README.md (#861)

Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* use user-specified observer

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Signed-off-by: andy-neuma <[email protected]>
Signed-off-by: Rahul Tuli <[email protected]>
Signed-off-by: Domenic Barbuzzi <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Co-authored-by: Jincheng Miao <[email protected]>
Co-authored-by: 黄石 <[email protected]>
Co-authored-by: dhuangnm <[email protected]>
Co-authored-by: dhuangnm <[email protected]>
Co-authored-by: Andy Linfoot <[email protected]>
Co-authored-by: andy-neuma <[email protected]>
Co-authored-by: Rahul Tuli <[email protected]>
Co-authored-by: Domenic Barbuzzi <[email protected]>
Co-authored-by: Michael Goin <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>
dsikka added a commit that referenced this pull request Nov 25, 2024
* [Bugfix] Workaround tied tensors bug (#659)

* load offload state dict

* add test

* remove merge duplication

* prepare to fix tie_word_embeddings

* add full tests

* patch second bug

* comment out failing tests, point to next pr

* link to issue

* accommodate offloaded models in test

* add back passing test

* WIP

* add error if not in expected list

* apply style

* update passing failing list

* add shared tensors tests

* clean up

* add comment with link

* make failing tests a todo

* Remove failing tests

* explicitly set safe_serialization

* separate out gpu tests, apply style

---------

Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* no cache context

Signed-off-by: Kyle Sayers <[email protected]>

* support mllamaconfig

Signed-off-by: Kyle Sayers <[email protected]>

* fix typo

Signed-off-by: Kyle Sayers <[email protected]>

* only untie word embeddings (#839)

Signed-off-by: Kyle Sayers <[email protected]>

* check for config hidden size (#840)

Signed-off-by: Kyle Sayers <[email protected]>

* Use float32 for Hessian dtype (#847)

* use float32 for hessian dtype

* explicitly set inp dtype as well

* float precision for obcq hessian

Signed-off-by: Kyle Sayers <[email protected]>

* GPTQ: Deprecate non-sequential update option (#762)

* remove from gptq, apply style

* remove instances of sequential_update argument in GPTQ tests

* update examples

* update example tests

* documentation, remove from example

* apply style

* revert back to auto type

* apply style

---------

Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Typehint nits (#826)

Signed-off-by: Kyle Sayers <[email protected]>

* [ DOC ] Remove version restrictions in W8A8 example (#849)

The latest compressed-tensors release, 0.8.0, removed some APIs:
https://github.com/neuralmagic/compressed-tensors/pull/156/files
With an older llmcompressor installed from pip, importing fails with
an error like:
```
ImportError: cannot import name 'update_layer_weight_quant_params' from 'compressed_tensors.quantization'
```

Signed-off-by: Kyle Sayers <[email protected]>

* add docstring

Signed-off-by: Kyle Sayers <[email protected]>

* make docstring runnable

Signed-off-by: Kyle Sayers <[email protected]>

* Fix inconsistency (#80)

Use group strategy with 128 group size instead of channel

Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* fix fwd func call (#845)

Signed-off-by: Kyle Sayers <[email protected]>

* [Bugfix] Workaround tied tensors bug (#659)

* load offload state dict

* add test

* remove merge duplication

* prepare to fix tie_word_embeddings

* add full tests

* patch second bug

* comment out failing tests, point to next pr

* link to issue

* accommodate offloaded models in test

* add back passing test

* WIP

* add error if not in expected list

* apply style

* update passing failing list

* add shared tensors tests

* clean up

* add comment with link

* make failing tests a todo

* Remove failing tests

* explicitly set safe_serialization

* separate out gpu tests, apply style

---------

Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>

* only untie word embeddings (#839)

Signed-off-by: Kyle Sayers <[email protected]>

* Fix inconsistency (#80)

Use group strategy with 128 group size instead of channel

Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* [Bugfix] Use weight parameter of linear layer (#836)

* use weight parameter of linear layer

* add weight attribute check

Signed-off-by: Kyle Sayers <[email protected]>

* [Bugfix] Rename files to remove colons (#846)

* rename files to remove colons

Signed-off-by: Kyle Sayers <[email protected]>

* [Bugfix] Workaround tied tensors bug (#659)

* load offload state dict

* add test

* remove merge duplication

* prepare to fix tie_word_embeddings

* add full tests

* patch second bug

* comment out failing tests, point to next pr

* link to issue

* accommodate offloaded models in test

* add back passing test

* WIP

* add error if not in expected list

* apply style

* update passing failing list

* add shared tensors tests

* clean up

* add comment with link

* make failing tests a todo

* Remove failing tests

* explicitly set safe_serialization

* separate out gpu tests, apply style

---------

Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* only untie word embeddings (#839)

Signed-off-by: Kyle Sayers <[email protected]>

* check for config hidden size (#840)

Signed-off-by: Kyle Sayers <[email protected]>

* Use float32 for Hessian dtype (#847)

* use float32 for hessian dtype

* explicitly set inp dtype as well

* float precision for obcq hessian

Signed-off-by: Kyle Sayers <[email protected]>

* GPTQ: Deprecate non-sequential update option (#762)

* remove from gptq, apply style

* remove instances of sequential_update argument in GPTQ tests

* update examples

* update example tests

* documentation, remove from example

* apply style

* revert back to auto type

* apply style

---------

Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Typehint nits (#826)

Signed-off-by: Kyle Sayers <[email protected]>

* [ DOC ] Remove version restrictions in W8A8 example (#849)

The latest compressed-tensors release, 0.8.0, removed some APIs:
https://github.com/neuralmagic/compressed-tensors/pull/156/files
With an older llmcompressor installed from pip, importing fails with
an error like:
```
ImportError: cannot import name 'update_layer_weight_quant_params' from 'compressed_tensors.quantization'
```

Signed-off-by: Kyle Sayers <[email protected]>

* Fix inconsistency (#80)

Use group strategy with 128 group size instead of channel

Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* 2of4

Signed-off-by: Kyle Sayers <[email protected]>

* revert change to unrelated example

Signed-off-by: Kyle Sayers <[email protected]>

* rename test file

Signed-off-by: Kyle Sayers <[email protected]>

* fix fwd func call (#845)

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Co-authored-by: Jincheng Miao <[email protected]>
Co-authored-by: 黄石 <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* cover all 3.9-3.12 in commit testing (#864)

Co-authored-by: dhuangnm <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Add marlin-24 recipe/configs for e2e testing (#866)

* add marlin-24 recipe/configs for e2e testing

* update

Signed-off-by: Kyle Sayers <[email protected]>

* [Bugfix] onload during sparsity calculation (#862)

* onload during sparsity calculation

* fix sparsity

---------

Co-authored-by: Dipika <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Fix HFTrainer overloads (#869)

* add missing arguments

Signed-off-by: Kyle Sayers <[email protected]>

* names

Signed-off-by: Kyle Sayers <[email protected]>

* style

Signed-off-by: Kyle Sayers <[email protected]>

* named args all around

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Support Model Offloading Tied Tensors Patch (#872)

* update parameter of offloaded modules

Signed-off-by: Kyle Sayers <[email protected]>

* in place function

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>

* add advice about dealing with non-invertible hessians (#875)

Signed-off-by: Kyle Sayers <[email protected]>

* seed commit workflow (#877)

* seed commit workflow

Signed-off-by: andy-neuma <[email protected]>

* tickle

Signed-off-by: andy-neuma <[email protected]>

* let's give it a try

Signed-off-by: andy-neuma <[email protected]>

* whitespace

Signed-off-by: andy-neuma <[email protected]>

* delete unneeded workflow

Signed-off-by: andy-neuma <[email protected]>

* adjust trigger

Signed-off-by: andy-neuma <[email protected]>

---------

Signed-off-by: andy-neuma <[email protected]>
Co-authored-by: andy-neuma <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* [Observer Restructure]: Add Observers; Add `calibration` and `frozen` steps to `QuantizationModifier` (#837)

* update function

* wip

* clean-up; fix imports

* clean-up

* more clean-up

* bug fix

* update for kvcache

* get kv_cache to work

* docstring

* fix comment

* fix condition for dynamic

* update

* update tests

* add observer tests

* add flake8 skip

* apply updated mse fixes

* fix import

* Update src/llmcompressor/modifiers/quantization/calibration.py

Co-authored-by: Kyle Sayers <[email protected]>

* Update src/llmcompressor/modifiers/quantization/calibration.py

Co-authored-by: Kyle Sayers <[email protected]>

* PR comments

* clean-up

* move hook check to observer call

* update

* separate out calibration step

---------

Co-authored-by: Kyle Sayers <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Bugfix get observer from name (#883)

Signed-off-by: Rahul Tuli <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* BugFix: Fix Sparsity Reload Testing (#882)

* fix

* fix remaining test cases

* add comments

* fix

Signed-off-by: Kyle Sayers <[email protected]>

* Use custom unique test names for e2e tests (#892)

* Include `testconfig_path` in parsed config data

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use custom unique names for e2e tests

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Revert "Use custom unique test names for e2e tests (#892)" (#893)

This reverts commit 10facf2.

Signed-off-by: Kyle Sayers <[email protected]>

* Move config["testconfig_path"] assignment (#895)

* Use custom unique test names for e2e tests (#892)

* Include `testconfig_path` in parsed config data

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use custom unique names for e2e tests

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Revert "Use custom unique test names for e2e tests (#892)" (#893)

This reverts commit 10facf2.

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Move config["testconfig_path"] assignment

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use a function name generator for e2e test names

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* cap accelerate version to avoid bug (#897)

Signed-off-by: Kyle Sayers <[email protected]>

* Fix observing offloaded weight (#896)

* load weight within onloading

Signed-off-by: Kyle Sayers <[email protected]>

* stop moving activations to the execution device; this already happens because activation calibration always runs within the forward pass

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Update image in README.md (#861)

Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* update accelerate version (#899)

Signed-off-by: Kyle Sayers <[email protected]>

* [GPTQ] Iterative Parameter Updating (#863)

* Implement iterative parameter updating

Signed-off-by: Kyle Sayers <[email protected]>

* [Bugfix] Use weight parameter of linear layer (#836)

* use weight parameter of linear layer

* add weight attribute check

Signed-off-by: Kyle Sayers <[email protected]>

* [Bugfix] Rename files to remove colons (#846)

* rename files to remove colons

Signed-off-by: Kyle Sayers <[email protected]>

* [Bugfix] Workaround tied tensors bug (#659)

* load offload state dict

* add test

* remove merge duplication

* prepare to fix tie_word_embeddings

* add full tests

* patch second bug

* comment out failing tests, point to next pr

* link to issue

* accommodate offloaded models in test

* add back passing test

* WIP

* add error if not in expected list

* apply style

* update passing failing list

* add shared tensors tests

* clean up

* add comment with link

* make failing tests a todo

* Remove failing tests

* explicitly set safe_serialization

* separate out gpu tests, apply style

---------

Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* only untie word embeddings (#839)

Signed-off-by: Kyle Sayers <[email protected]>

* check for config hidden size (#840)

Signed-off-by: Kyle Sayers <[email protected]>

* Use float32 for Hessian dtype (#847)

* use float32 for hessian dtype

* explicitly set inp dtype as well

* float precision for obcq hessian

Signed-off-by: Kyle Sayers <[email protected]>

* GPTQ: Deprecate non-sequential update option (#762)

* remove from gptq, apply style

* remove instances of sequential_update argument in GPTQ tests

* update examples

* update example tests

* documentation, remove from example

* apply style

* revert back to auto type

* apply style

---------

Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Typehint nits (#826)

Signed-off-by: Kyle Sayers <[email protected]>

* [ DOC ] Remove version restrictions in W8A8 example (#849)

The latest compressed-tensors release, 0.8.0, removed some APIs:
https://github.com/neuralmagic/compressed-tensors/pull/156/files
With an older llmcompressor installed from pip, this raises an
error like:
```
ImportError: cannot import name 'update_layer_weight_quant_params' from 'compressed_tensors.quantization'
```

Signed-off-by: Kyle Sayers <[email protected]>

* Fix inconsistency (#80)

Use group strategy with 128 group size instead of channel

Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* 2of4

Signed-off-by: Kyle Sayers <[email protected]>

* revert change to unrelated example

Signed-off-by: Kyle Sayers <[email protected]>

* rename test file

Signed-off-by: Kyle Sayers <[email protected]>

* fix fwd func call (#845)

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Co-authored-by: Jincheng Miao <[email protected]>
Co-authored-by: 黄石 <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* cover all 3.9-3.12 in commit testing (#864)

Co-authored-by: dhuangnm <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Add marlin-24 recipe/configs for e2e testing (#866)

* add marlin-24 recipe/configs for e2e testing

* update

Signed-off-by: Kyle Sayers <[email protected]>

* [Bugfix] onload during sparsity calculation (#862)

* onload during sparsity calculation

* fix sparsity

---------

Co-authored-by: Dipika <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Fix HFTrainer overloads (#869)

* add missing arguments

Signed-off-by: Kyle Sayers <[email protected]>

* names

Signed-off-by: Kyle Sayers <[email protected]>

* style

Signed-off-by: Kyle Sayers <[email protected]>

* named args all around

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Support Model Offloading Tied Tensors Patch (#872)

* update parameter of offloaded modules

Signed-off-by: Kyle Sayers <[email protected]>

* in place function

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>

* add advice about dealing with non-invertible Hessians (#875)

Signed-off-by: Kyle Sayers <[email protected]>

* seed commit workflow (#877)

* seed commit workflow

Signed-off-by: andy-neuma <[email protected]>

* tickle

Signed-off-by: andy-neuma <[email protected]>

* let's give it a try

Signed-off-by: andy-neuma <[email protected]>

* whitespace

Signed-off-by: andy-neuma <[email protected]>

* delete unneeded workflow

Signed-off-by: andy-neuma <[email protected]>

* adjust trigger

Signed-off-by: andy-neuma <[email protected]>

---------

Signed-off-by: andy-neuma <[email protected]>
Co-authored-by: andy-neuma <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* [Observer Restructure]: Add Observers; Add `calibration` and `frozen` steps to `QuantizationModifier` (#837)

* update function

* wip

* clean-up; fix imports

* clean-up

* more clean-up

* bug fix

* update for kvcache

* get kv_cache to work

* docstring

* fix comment

* fix condition for dynamic

* update

* update tests

* add observer tests

* add flake8 skip

* apply updated mse fixes

* fix import

* Update src/llmcompressor/modifiers/quantization/calibration.py

Co-authored-by: Kyle Sayers <[email protected]>

* Update src/llmcompressor/modifiers/quantization/calibration.py

Co-authored-by: Kyle Sayers <[email protected]>

* PR comments

* clean-up

* move hook check to observer call

* update

* separate out calibration step

---------

Co-authored-by: Kyle Sayers <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* WIP, observer

Signed-off-by: Kyle Sayers <[email protected]>

* use minmax observer

Signed-off-by: Kyle Sayers <[email protected]>

* Bugfix get observer from name (#883)

Signed-off-by: Rahul Tuli <[email protected]>

* BugFix: Fix Sparsity Reload Testing (#882)

* fix

* fix remaining test cases

* add comments

* fix

Signed-off-by: Kyle Sayers <[email protected]>

* Use custom unique test names for e2e tests (#892)

* Include `testconfig_path` in parsed config data

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use custom unique names for e2e tests

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Revert "Use custom unique test names for e2e tests (#892)" (#893)

This reverts commit 10facf2.

Signed-off-by: Kyle Sayers <[email protected]>

* Move config["testconfig_path"] assignment (#895)

* Use custom unique test names for e2e tests (#892)

* Include `testconfig_path` in parsed config data

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use custom unique names for e2e tests

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Revert "Use custom unique test names for e2e tests (#892)" (#893)

This reverts commit 10facf2.

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Move config["testconfig_path"] assignment

Signed-off-by: Domenic Barbuzzi <[email protected]>

* Use a function name generator for e2e test names

Signed-off-by: Domenic Barbuzzi <[email protected]>

---------

Signed-off-by: Domenic Barbuzzi <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* cap accelerate version to avoid bug (#897)

Signed-off-by: Kyle Sayers <[email protected]>

* Fix observing offloaded weight (#896)

* load weight within onloading

Signed-off-by: Kyle Sayers <[email protected]>

* stop moving the activation to the execution device; this is already done, because activation calibration always happens within the forward pass

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Update image in README.md (#861)

Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* use user-specified observer

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Signed-off-by: andy-neuma <[email protected]>
Signed-off-by: Rahul Tuli <[email protected]>
Signed-off-by: Domenic Barbuzzi <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Co-authored-by: Jincheng Miao <[email protected]>
Co-authored-by: 黄石 <[email protected]>
Co-authored-by: dhuangnm <[email protected]>
Co-authored-by: dhuangnm <[email protected]>
Co-authored-by: Andy Linfoot <[email protected]>
Co-authored-by: andy-neuma <[email protected]>
Co-authored-by: Rahul Tuli <[email protected]>
Co-authored-by: Domenic Barbuzzi <[email protected]>
Co-authored-by: Michael Goin <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Small fixes for release (#901)

* fix device map

* expose one GPU for finetune; update to use a better model and show generation for completeness

* more fixes

* typo fix

* don't just run unit tests

Signed-off-by: Kyle Sayers <[email protected]>

* use smaller portion of dataset (#902)

Signed-off-by: Kyle Sayers <[email protected]>

* Update example to not fail hessian inversion (#904)

* update

Signed-off-by: Dipika <[email protected]>

* quality

---------

Signed-off-by: Dipika <[email protected]>
Co-authored-by: Rahul Tuli <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* bump version (#907)

Signed-off-by: Dipika <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* add default mappings (#906)

Signed-off-by: Kyle Sayers <[email protected]>

* [SparseAutoModelForCausalLM Deprecation] Feature change (#881)

* src and tests updates

* save model if output_dir is provided

* save model if provided as a string

* typo

* save if model was provided as a string or custom output_dir was set

* comments

* save tokenizer also if model passed as a string or custom outputdir provided

* revert to True

* merge main

* merge main

* fix transformers tests

* Update tests/llmcompressor/transformers/obcq/test_consecutive_runs.py

Co-authored-by: Kyle Sayers <[email protected]>

* lint:

* fix bug

* fix bug

* comments

* comments

* fix saving bug on example script and comments

* fix test failure

* comments

* comments

* comments

* lint

* fix test_quantization.py

* fix bugs

* revert to default

* revert to default

* draft

* fix test

* logging output fix

---------

Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* correct typo (#888)

Signed-off-by: Kyle Sayers <[email protected]>

* print config for better debugging

Signed-off-by: Kyle Sayers <[email protected]>

---------

Signed-off-by: Kyle Sayers <[email protected]>
Signed-off-by: andy-neuma <[email protected]>
Signed-off-by: Rahul Tuli <[email protected]>
Signed-off-by: Domenic Barbuzzi <[email protected]>
Signed-off-by: Dipika <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Co-authored-by: Jincheng Miao <[email protected]>
Co-authored-by: 黄石 <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: dhuangnm <[email protected]>
Co-authored-by: dhuangnm <[email protected]>
Co-authored-by: Andy Linfoot <[email protected]>
Co-authored-by: andy-neuma <[email protected]>
Co-authored-by: Rahul Tuli <[email protected]>
Co-authored-by: Domenic Barbuzzi <[email protected]>
Co-authored-by: Michael Goin <[email protected]>
Co-authored-by: George <[email protected]>
3 participants