Typehint nits #826

kylesayrs · 2024-10-07T17:58:49Z

No description provided.

github-actions · 2024-10-07T17:59:02Z

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

src/llmcompressor/transformers/finetune/session_mixin.py

rahul-tuli · 2024-10-16T22:32:41Z

src/llmcompressor/transformers/finetune/text_generation.py

    model = model_args.model
    # Load tokenizer
    # distill TODO: support for different tokenizer for teacher?
    tokenizer = model_args.tokenizer

    if isinstance(model, str) or isinstance(model, PosixPath):
-        (teacher, model_path, model) = initialize_model_from_path(
+        (teacher, _model_path, model) = initialize_model_from_path(


model_path is unused. Renaming it to _model_path marks the variable as intentionally unused.

Signed-off-by: Kyle Sayers <[email protected]>

* rename files to remove colons
  Signed-off-by: Kyle Sayers <[email protected]>
* [Bugfix] Workaround tied tensors bug (#659)
  * load offload state dict
  * add test
  * remove merge duplication
  * prepare to fix tie_word_embeddings
  * add full tests
  * patch second bug
  * comment out failing tests, point to next pr
  * link to issue
  * accomodate offloaded models in test
  * add back passing test
  * WIP
  * add error if not in expected list
  * apply style
  * update passing failing list
  * add shared tensors tests
  * clean up
  * add comment with link
  * make failing tests a todo
  * Remove failing tests
  * explicitly set safe_serialization
  * separate out gpu tests, apply style
  ---------
  Co-authored-by: Kyle Sayers <[email protected]>
  Co-authored-by: Dipika Sikka <[email protected]>
  Signed-off-by: Kyle Sayers <[email protected]>
* only untie word embeddings (#839)
  Signed-off-by: Kyle Sayers <[email protected]>
* check for config hidden size (#840)
  Signed-off-by: Kyle Sayers <[email protected]>
* Use float32 for Hessian dtype (#847)
  * use float32 for hessian dtype
  * explicitly set inp dtype as well
  * float precision for obcq hessian
  Signed-off-by: Kyle Sayers <[email protected]>
* GPTQ: Depreciate non-sequential update option (#762)
  * remove from gptq, apply style
  * remove instances of sequential_update argument in GPTQ tests
  * update examples
  * update example tests
  * documentation, remove from example
  * apply style
  * revert back to auto type
  * apply style
  ---------
  Co-authored-by: Dipika Sikka <[email protected]>
  Signed-off-by: Kyle Sayers <[email protected]>
* Typehint nits (#826)
  Signed-off-by: Kyle Sayers <[email protected]>
* [ DOC ] Remove version restrictions in W8A8 exmaple (#849)
  The latest compressored-tensor 0.8.0 removed some API, https://github.com/neuralmagic/compressed-tensors/pull/156/files
  If installed the older llmcompressor from pip, it would throw the error like:
  ```
  ImportError: cannot import name 'update_layer_weight_quant_params' from 'compressed_tensors.quantization'
  ```
  Signed-off-by: Kyle Sayers <[email protected]>
* Fix inconsistence (#80)
  Use group strategy with 128 group size instead of channel
  Co-authored-by: Dipika Sikka <[email protected]>
  Signed-off-by: Kyle Sayers <[email protected]>
* 2of4
  Signed-off-by: Kyle Sayers <[email protected]>
* revert change to unrelated example
  Signed-off-by: Kyle Sayers <[email protected]>
* rename test file
  Signed-off-by: Kyle Sayers <[email protected]>
* fix fwd func call (#845)
  Signed-off-by: Kyle Sayers <[email protected]>

---------
Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Co-authored-by: Jincheng Miao <[email protected]>
Co-authored-by: 黄石 <[email protected]>


* Implement iterative parameter updating Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] Use weight parameter of linear layer (#836) * use weight parameter of linear layer * add weight attribute check Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] Rename files to remove colons (#846) * rename files to remove colons Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] Workaround tied tensors bug (#659) * load offload state dict * add test * remove merge duplication * prepare to fix tie_word_embeddings * add full tests * patch second bug * comment out failing tests, point to next pr * link to issue * accomodate offloaded models in test * add back passing test * WIP * add error if not in expected list * apply style * update passing failing list * add shared tensors tests * clean up * add comment with link * make failing tests a todo * Remove failing tests * explicitly set safe_serialization * separate out gpu tests, apply style --------- Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * only untie word embeddings (#839) Signed-off-by: Kyle Sayers <[email protected]> * check for config hidden size (#840) Signed-off-by: Kyle Sayers <[email protected]> * Use float32 for Hessian dtype (#847) * use float32 for hessian dtype * explicitly set inp dtype as well * float precision for obcq hessian Signed-off-by: Kyle Sayers <[email protected]> * GPTQ: Depreciate non-sequential update option (#762) * remove from gptq, apply style * remove instances of sequential_update argument in GPTQ tests * update examples * update example tests * documentation, remove from example * apply style * revert back to auto type * apply style --------- Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Typehint nits (#826) Signed-off-by: Kyle Sayers <[email protected]> * [ DOC ] Remove version restrictions in W8A8 exmaple (#849) The 
latest compressored-tensor 0.8.0 removed some API, https://github.com/neuralmagic/compressed-tensors/pull/156/files If installed the older llmcompressor from pip, it would throw the error like: ``` ImportError: cannot import name 'update_layer_weight_quant_params' from 'compressed_tensors.quantization' ``` Signed-off-by: Kyle Sayers <[email protected]> * Fix inconsistence (#80) Use group strategy with 128 group size instead of channel Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * 2of4 Signed-off-by: Kyle Sayers <[email protected]> * revert change to unrelated example Signed-off-by: Kyle Sayers <[email protected]> * rename test file Signed-off-by: Kyle Sayers <[email protected]> * fix fwd func call (#845) Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Co-authored-by: Jincheng Miao <[email protected]> Co-authored-by: 黄石 <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * cover all 3.9-3.12 in commit testing (#864) Co-authored-by: dhuangnm <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Add marlin-24 recipe/configs for e2e testing (#866) * add marlin-24 recipe/configs for e2e testing * update Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] onload during sparsity calculation (#862) * onload during sparsity calculation * fix sparsity --------- Co-authored-by: Dipika <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Fix HFTrainer overloads (#869) * add missing arguments Signed-off-by: Kyle Sayers <[email protected]> * names Signed-off-by: Kyle Sayers <[email protected]> * style Signed-off-by: Kyle Sayers <[email protected]> * named args all around Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> 
Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Support Model Offloading Tied Tensors Patch (#872) * update parameter of offloaded modules Signed-off-by: Kyle Sayers <[email protected]> * in place function Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> * add advice about dealing with non-invertable hessians (#875) Signed-off-by: Kyle Sayers <[email protected]> * seed commit workflow (#877) * seed commit workflow Signed-off-by: andy-neuma <[email protected]> * tickle Signed-off-by: andy-neuma <[email protected]> * let's give it a try Signed-off-by: andy-neuma <[email protected]> * whitespace Signed-off-by: andy-neuma <[email protected]> * delete unneeded workflow Signed-off-by: andy-neuma <[email protected]> * adjust trigger Signed-off-by: andy-neuma <[email protected]> --------- Signed-off-by: andy-neuma <[email protected]> Co-authored-by: andy-neuma <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * [Observer Restructure]: Add Observers; Add `calibration` and `frozen` steps to `QuantizationModifier` (#837) * update functioon * wip * clean-up; fix imports * clean-up * more clean-up * bug fix * update for kvcache * get kv_cache to work * docstring * fix comment * fix condition for dynamic * update * update tests * add observer tests * add flake8 skip * apply updated mse fixes * fix import * Update src/llmcompressor/modifiers/quantization/calibration.py Co-authored-by: Kyle Sayers <[email protected]> * Update src/llmcompressor/modifiers/quantization/calibration.py Co-authored-by: Kyle Sayers <[email protected]> * PR comments * clean-up * move hook check to observer call * update * separate out calibration step --------- Co-authored-by: Kyle Sayers <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * WIP, observer Signed-off-by: Kyle Sayers <[email protected]> * use minmax observer Signed-off-by: Kyle Sayers <[email protected]> * 
Bugfix get observer from name (#883) Signed-off-by: Rahul Tuli <[email protected]> * BugFix: Fix Sparsity Reload Testing (#882) * fix * fix remaining test cases * add comments * fix Signed-off-by: Kyle Sayers <[email protected]> * Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <[email protected]> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. Signed-off-by: Kyle Sayers <[email protected]> * Move config["testconfig_path"] assignment (#895) * Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <[email protected]> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. 
Signed-off-by: Domenic Barbuzzi <[email protected]> * Move config["testconfig_path"] assignment Signed-off-by: Domenic Barbuzzi <[email protected]> * Use a function name generator for e2e test names Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * cap accelerate version to avoid bug (#897) Signed-off-by: Kyle Sayers <[email protected]> * Fix observing offloaded weight (#896) * load weight within onloading Signed-off-by: Kyle Sayers <[email protected]> * remove moving activation to execution device, since this is already done since activation calibration always happens within forward pass Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Update image in README.md (#861) Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * use user-specified observer Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Signed-off-by: andy-neuma <[email protected]> Signed-off-by: Rahul Tuli <[email protected]> Signed-off-by: Domenic Barbuzzi <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Co-authored-by: Jincheng Miao <[email protected]> Co-authored-by: 黄石 <[email protected]> Co-authored-by: dhuangnm <[email protected]> Co-authored-by: dhuangnm <[email protected]> Co-authored-by: Andy Linfoot <[email protected]> Co-authored-by: andy-neuma <[email protected]> Co-authored-by: Rahul Tuli <[email protected]> Co-authored-by: Domenic Barbuzzi <[email protected]> Co-authored-by: Michael Goin <[email protected]>



* set targets default earlier, remove QuantizationScheme.default_scheme Signed-off-by: Kyle Sayers <[email protected]> * clearer warning Signed-off-by: Kyle Sayers <[email protected]> * fix typo Signed-off-by: Kyle Sayers <[email protected]> * Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <[email protected]> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. Signed-off-by: Kyle Sayers <[email protected]> * Move config["testconfig_path"] assignment (#895) * Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <[email protected]> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. 
Signed-off-by: Domenic Barbuzzi <[email protected]> * Move config["testconfig_path"] assignment Signed-off-by: Domenic Barbuzzi <[email protected]> * Use a function name generator for e2e test names Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * update docstring, use default factory for mutable default Signed-off-by: Kyle Sayers <[email protected]> * use Linear default Signed-off-by: Kyle Sayers <[email protected]> * Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <[email protected]> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. Signed-off-by: Kyle Sayers <[email protected]> * Move config["testconfig_path"] assignment (#895) * Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <[email protected]> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. 
Signed-off-by: Domenic Barbuzzi <[email protected]> * Move config["testconfig_path"] assignment Signed-off-by: Domenic Barbuzzi <[email protected]> * Use a function name generator for e2e test names Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * cap accelerate version to avoid bug (#897) Signed-off-by: Kyle Sayers <[email protected]> * Fix observing offloaded weight (#896) * load weight within onloading Signed-off-by: Kyle Sayers <[email protected]> * remove moving activation to execution device, since this is already done since activation calibration always happens within forward pass Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Update image in README.md (#861) Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * update accelerate version (#899) Signed-off-by: Kyle Sayers <[email protected]> * [GPTQ] Iterative Parameter Updating (#863) * Implement iterative parameter updating Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] Use weight parameter of linear layer (#836) * use weight parameter of linear layer * add weight attribute check Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] Rename files to remove colons (#846) * rename files to remove colons Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] Workaround tied tensors bug (#659) * load offload state dict * add test * remove merge duplication * prepare to fix tie_word_embeddings * add full tests * patch second bug * comment out failing tests, point to next pr * link to issue * accomodate offloaded models in test * add back passing test * WIP * add error if not in expected list * apply style * update passing failing list * add 
shared tensors tests * clean up * add comment with link * make failing tests a todo * Remove failing tests * explicitly set safe_serialization * separate out gpu tests, apply style --------- Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * only untie word embeddings (#839) Signed-off-by: Kyle Sayers <[email protected]> * check for config hidden size (#840) Signed-off-by: Kyle Sayers <[email protected]> * Use float32 for Hessian dtype (#847) * use float32 for hessian dtype * explicitly set inp dtype as well * float precision for obcq hessian Signed-off-by: Kyle Sayers <[email protected]> * GPTQ: Depreciate non-sequential update option (#762) * remove from gptq, apply style * remove instances of sequential_update argument in GPTQ tests * update examples * update example tests * documentation, remove from example * apply style * revert back to auto type * apply style --------- Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Typehint nits (#826) Signed-off-by: Kyle Sayers <[email protected]> * [ DOC ] Remove version restrictions in W8A8 exmaple (#849) The latest compressored-tensor 0.8.0 removed some API, https://github.com/neuralmagic/compressed-tensors/pull/156/files If installed the older llmcompressor from pip, it would throw the error like: ``` ImportError: cannot import name 'update_layer_weight_quant_params' from 'compressed_tensors.quantization' ``` Signed-off-by: Kyle Sayers <[email protected]> * Fix inconsistence (#80) Use group strategy with 128 group size instead of channel Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * 2of4 Signed-off-by: Kyle Sayers <[email protected]> * revert change to unrelated example Signed-off-by: Kyle Sayers <[email protected]> * rename test file Signed-off-by: Kyle Sayers <[email protected]> * fix fwd func call (#845) Signed-off-by: 
Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Co-authored-by: Jincheng Miao <[email protected]> Co-authored-by: 黄石 <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * cover all 3.9-3.12 in commit testing (#864) Co-authored-by: dhuangnm <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Add marlin-24 recipe/configs for e2e testing (#866) * add marlin-24 recipe/configs for e2e testing * update Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] onload during sparsity calculation (#862) * onload during sparsity calculation * fix sparsity --------- Co-authored-by: Dipika <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Fix HFTrainer overloads (#869) * add missing arguments Signed-off-by: Kyle Sayers <[email protected]> * names Signed-off-by: Kyle Sayers <[email protected]> * style Signed-off-by: Kyle Sayers <[email protected]> * named args all around Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Support Model Offloading Tied Tensors Patch (#872) * update parameter of offloaded modules Signed-off-by: Kyle Sayers <[email protected]> * in place function Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> * add advice about dealing with non-invertable hessians (#875) Signed-off-by: Kyle Sayers <[email protected]> * seed commit workflow (#877) * seed commit workflow Signed-off-by: andy-neuma <[email protected]> * tickle Signed-off-by: andy-neuma <[email protected]> * let's give it a try Signed-off-by: andy-neuma <[email protected]> * whitespace Signed-off-by: andy-neuma <[email protected]> * delete unneeded workflow 
Signed-off-by: andy-neuma <[email protected]> * adjust trigger Signed-off-by: andy-neuma <[email protected]> --------- Signed-off-by: andy-neuma <[email protected]> Co-authored-by: andy-neuma <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * [Observer Restructure]: Add Observers; Add `calibration` and `frozen` steps to `QuantizationModifier` (#837) * update functioon * wip * clean-up; fix imports * clean-up * more clean-up * bug fix * update for kvcache * get kv_cache to work * docstring * fix comment * fix condition for dynamic * update * update tests * add observer tests * add flake8 skip * apply updated mse fixes * fix import * Update src/llmcompressor/modifiers/quantization/calibration.py Co-authored-by: Kyle Sayers <[email protected]> * Update src/llmcompressor/modifiers/quantization/calibration.py Co-authored-by: Kyle Sayers <[email protected]> * PR comments * clean-up * move hook check to observer call * update * separate out calibration step --------- Co-authored-by: Kyle Sayers <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * WIP, observer Signed-off-by: Kyle Sayers <[email protected]> * use minmax observer Signed-off-by: Kyle Sayers <[email protected]> * Bugfix get observer from name (#883) Signed-off-by: Rahul Tuli <[email protected]> * BugFix: Fix Sparsity Reload Testing (#882) * fix * fix remaining test cases * add comments * fix Signed-off-by: Kyle Sayers <[email protected]> * Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <[email protected]> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. 
Signed-off-by: Kyle Sayers <[email protected]> * Move config["testconfig_path"] assignment (#895) * Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <[email protected]> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. Signed-off-by: Domenic Barbuzzi <[email protected]> * Move config["testconfig_path"] assignment Signed-off-by: Domenic Barbuzzi <[email protected]> * Use a function name generator for e2e test names Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * cap accelerate version to avoid bug (#897) Signed-off-by: Kyle Sayers <[email protected]> * Fix observing offloaded weight (#896) * load weight within onloading Signed-off-by: Kyle Sayers <[email protected]> * remove moving activation to execution device, since this is already done since activation calibration always happens within forward pass Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Update image in README.md (#861) Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * use user-specified observer Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Signed-off-by: andy-neuma <[email protected]> Signed-off-by: Rahul Tuli <[email protected]> Signed-off-by: Domenic Barbuzzi <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka 
<[email protected]> Co-authored-by: Jincheng Miao <[email protected]> Co-authored-by: 黄石 <[email protected]> Co-authored-by: dhuangnm <[email protected]> Co-authored-by: dhuangnm <[email protected]> Co-authored-by: Andy Linfoot <[email protected]> Co-authored-by: andy-neuma <[email protected]> Co-authored-by: Rahul Tuli <[email protected]> Co-authored-by: Domenic Barbuzzi <[email protected]> Co-authored-by: Michael Goin <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Small fixes for release (#901) * fix device map * expose one gpu for finetune; update to use a better moodel and show generation for completeness * more fixes * typo fix * dont just run unit tests Signed-off-by: Kyle Sayers <[email protected]> * use smaller portion of dataset (#902) Signed-off-by: Kyle Sayers <[email protected]> * Update example to not fail hessian inversion (#904) * update Signed-off-by: Dipika <[email protected]> * quality --------- Signed-off-by: Dipika <[email protected]> Co-authored-by: Rahul Tuli <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * bump version (#907) Signed-off-by: Dipika <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * add default mappings (#906) Signed-off-by: Kyle Sayers <[email protected]> * [SparseAutoModelForCausalLM Deprecation] Feature change (#881) * src and tests updates * save model if output_dir is provided * save model if provided as a string * typo * save if model was provided as a string or custom output_dir was set * comments * save tokenizer also if model passed as a string or custom outputdir provided * revert to True * merge main * merge main * fix transformers tests * Update tests/llmcompressor/transformers/obcq/test_consecutive_runs.py Co-authored-by: Kyle Sayers <[email protected]> * lint: * fix bug * fix bug * comments * comments * fix saving bug on example script and comments * fix test failure * comments * comments * comments * lint * fix test_quantization.py * fix bugs * revert to 
default * revert to default * draft * fix test * logging output fix --------- Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * correct typo (#888) Signed-off-by: Kyle Sayers <[email protected]> * use default factory, since default does not trigger field validator Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Signed-off-by: Domenic Barbuzzi <[email protected]> Signed-off-by: andy-neuma <[email protected]> Signed-off-by: Rahul Tuli <[email protected]> Signed-off-by: Dipika <[email protected]> Co-authored-by: Domenic Barbuzzi <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Co-authored-by: Michael Goin <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Jincheng Miao <[email protected]> Co-authored-by: 黄石 <[email protected]> Co-authored-by: dhuangnm <[email protected]> Co-authored-by: dhuangnm <[email protected]> Co-authored-by: Andy Linfoot <[email protected]> Co-authored-by: andy-neuma <[email protected]> Co-authored-by: Rahul Tuli <[email protected]> Co-authored-by: George <[email protected]>

* Implement iterative parameter updating Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] Use weight parameter of linear layer (#836) * use weight parameter of linear layer * add weight attribute check Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] Rename files to remove colons (#846) * rename files to remove colons Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] Workaround tied tensors bug (#659) * load offload state dict * add test * remove merge duplication * prepare to fix tie_word_embeddings * add full tests * patch second bug * comment out failing tests, point to next pr * link to issue * accomodate offloaded models in test * add back passing test * WIP * add error if not in expected list * apply style * update passing failing list * add shared tensors tests * clean up * add comment with link * make failing tests a todo * Remove failing tests * explicitly set safe_serialization * separate out gpu tests, apply style --------- Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * only untie word embeddings (#839) Signed-off-by: Kyle Sayers <[email protected]> * check for config hidden size (#840) Signed-off-by: Kyle Sayers <[email protected]> * Use float32 for Hessian dtype (#847) * use float32 for hessian dtype * explicitly set inp dtype as well * float precision for obcq hessian Signed-off-by: Kyle Sayers <[email protected]> * GPTQ: Depreciate non-sequential update option (#762) * remove from gptq, apply style * remove instances of sequential_update argument in GPTQ tests * update examples * update example tests * documentation, remove from example * apply style * revert back to auto type * apply style --------- Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Typehint nits (#826) Signed-off-by: Kyle Sayers <[email protected]> * [ DOC ] Remove version restrictions in W8A8 exmaple (#849) The 
latest compressored-tensor 0.8.0 removed some API, https://github.com/neuralmagic/compressed-tensors/pull/156/files If installed the older llmcompressor from pip, it would throw the error like: ``` ImportError: cannot import name 'update_layer_weight_quant_params' from 'compressed_tensors.quantization' ``` Signed-off-by: Kyle Sayers <[email protected]> * Fix inconsistence (#80) Use group strategy with 128 group size instead of channel Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * 2of4 Signed-off-by: Kyle Sayers <[email protected]> * revert change to unrelated example Signed-off-by: Kyle Sayers <[email protected]> * rename test file Signed-off-by: Kyle Sayers <[email protected]> * fix fwd func call (#845) Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Co-authored-by: Jincheng Miao <[email protected]> Co-authored-by: 黄石 <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * cover all 3.9-3.12 in commit testing (#864) Co-authored-by: dhuangnm <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Add marlin-24 recipe/configs for e2e testing (#866) * add marlin-24 recipe/configs for e2e testing * update Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] onload during sparsity calculation (#862) * onload during sparsity calculation * fix sparsity --------- Co-authored-by: Dipika <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Fix HFTrainer overloads (#869) * add missing arguments Signed-off-by: Kyle Sayers <[email protected]> * names Signed-off-by: Kyle Sayers <[email protected]> * style Signed-off-by: Kyle Sayers <[email protected]> * named args all around Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> 
Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Support Model Offloading Tied Tensors Patch (#872) * update parameter of offloaded modules Signed-off-by: Kyle Sayers <[email protected]> * in place function Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> * add advice about dealing with non-invertable hessians (#875) Signed-off-by: Kyle Sayers <[email protected]> * seed commit workflow (#877) * seed commit workflow Signed-off-by: andy-neuma <[email protected]> * tickle Signed-off-by: andy-neuma <[email protected]> * let's give it a try Signed-off-by: andy-neuma <[email protected]> * whitespace Signed-off-by: andy-neuma <[email protected]> * delete unneeded workflow Signed-off-by: andy-neuma <[email protected]> * adjust trigger Signed-off-by: andy-neuma <[email protected]> --------- Signed-off-by: andy-neuma <[email protected]> Co-authored-by: andy-neuma <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * [Observer Restructure]: Add Observers; Add `calibration` and `frozen` steps to `QuantizationModifier` (#837) * update functioon * wip * clean-up; fix imports * clean-up * more clean-up * bug fix * update for kvcache * get kv_cache to work * docstring * fix comment * fix condition for dynamic * update * update tests * add observer tests * add flake8 skip * apply updated mse fixes * fix import * Update src/llmcompressor/modifiers/quantization/calibration.py Co-authored-by: Kyle Sayers <[email protected]> * Update src/llmcompressor/modifiers/quantization/calibration.py Co-authored-by: Kyle Sayers <[email protected]> * PR comments * clean-up * move hook check to observer call * update * separate out calibration step --------- Co-authored-by: Kyle Sayers <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * WIP, observer Signed-off-by: Kyle Sayers <[email protected]> * use minmax observer Signed-off-by: Kyle Sayers <[email protected]> * 
Bugfix get observer from name (#883) Signed-off-by: Rahul Tuli <[email protected]> * BugFix: Fix Sparsity Reload Testing (#882) * fix * fix remaining test cases * add comments * fix Signed-off-by: Kyle Sayers <[email protected]> * Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <[email protected]> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. Signed-off-by: Kyle Sayers <[email protected]> * Move config["testconfig_path"] assignment (#895) * Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <[email protected]> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. 
Signed-off-by: Domenic Barbuzzi <[email protected]> * Move config["testconfig_path"] assignment Signed-off-by: Domenic Barbuzzi <[email protected]> * Use a function name generator for e2e test names Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * cap accelerate version to avoid bug (#897) Signed-off-by: Kyle Sayers <[email protected]> * Fix observing offloaded weight (#896) * load weight within onloading Signed-off-by: Kyle Sayers <[email protected]> * remove moving activation to execution device, since this is already done since activation calibration always happens within forward pass Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Update image in README.md (#861) Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * use user-specified observer Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Signed-off-by: andy-neuma <[email protected]> Signed-off-by: Rahul Tuli <[email protected]> Signed-off-by: Domenic Barbuzzi <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Co-authored-by: Jincheng Miao <[email protected]> Co-authored-by: 黄石 <[email protected]> Co-authored-by: dhuangnm <[email protected]> Co-authored-by: dhuangnm <[email protected]> Co-authored-by: Andy Linfoot <[email protected]> Co-authored-by: andy-neuma <[email protected]> Co-authored-by: Rahul Tuli <[email protected]> Co-authored-by: Domenic Barbuzzi <[email protected]> Co-authored-by: Michael Goin <[email protected]> Signed-off-by: Kyle Sayers <[email protected]>

* set targets default earlier, remove QuantizationScheme.default_scheme Signed-off-by: Kyle Sayers <[email protected]> * clearer warning Signed-off-by: Kyle Sayers <[email protected]> * fix typo Signed-off-by: Kyle Sayers <[email protected]> * Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <[email protected]> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. Signed-off-by: Kyle Sayers <[email protected]> * Move config["testconfig_path"] assignment (#895) * Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <[email protected]> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. 
Signed-off-by: Domenic Barbuzzi <[email protected]> * Move config["testconfig_path"] assignment Signed-off-by: Domenic Barbuzzi <[email protected]> * Use a function name generator for e2e test names Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * update docstring, use default factory for mutable default Signed-off-by: Kyle Sayers <[email protected]> * use Linear default Signed-off-by: Kyle Sayers <[email protected]> * Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <[email protected]> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. Signed-off-by: Kyle Sayers <[email protected]> * Move config["testconfig_path"] assignment (#895) * Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <[email protected]> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. 
Signed-off-by: Domenic Barbuzzi <[email protected]> * Move config["testconfig_path"] assignment Signed-off-by: Domenic Barbuzzi <[email protected]> * Use a function name generator for e2e test names Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * cap accelerate version to avoid bug (#897) Signed-off-by: Kyle Sayers <[email protected]> * Fix observing offloaded weight (#896) * load weight within onloading Signed-off-by: Kyle Sayers <[email protected]> * remove moving activation to execution device, since this is already done since activation calibration always happens within forward pass Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Update image in README.md (#861) Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * update accelerate version (#899) Signed-off-by: Kyle Sayers <[email protected]> * [GPTQ] Iterative Parameter Updating (#863) * Implement iterative parameter updating Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] Use weight parameter of linear layer (#836) * use weight parameter of linear layer * add weight attribute check Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] Rename files to remove colons (#846) * rename files to remove colons Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] Workaround tied tensors bug (#659) * load offload state dict * add test * remove merge duplication * prepare to fix tie_word_embeddings * add full tests * patch second bug * comment out failing tests, point to next pr * link to issue * accomodate offloaded models in test * add back passing test * WIP * add error if not in expected list * apply style * update passing failing list * add 
shared tensors tests * clean up * add comment with link * make failing tests a todo * Remove failing tests * explicitly set safe_serialization * separate out gpu tests, apply style --------- Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * only untie word embeddings (#839) Signed-off-by: Kyle Sayers <[email protected]> * check for config hidden size (#840) Signed-off-by: Kyle Sayers <[email protected]> * Use float32 for Hessian dtype (#847) * use float32 for hessian dtype * explicitly set inp dtype as well * float precision for obcq hessian Signed-off-by: Kyle Sayers <[email protected]> * GPTQ: Depreciate non-sequential update option (#762) * remove from gptq, apply style * remove instances of sequential_update argument in GPTQ tests * update examples * update example tests * documentation, remove from example * apply style * revert back to auto type * apply style --------- Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Typehint nits (#826) Signed-off-by: Kyle Sayers <[email protected]> * [ DOC ] Remove version restrictions in W8A8 exmaple (#849) The latest compressored-tensor 0.8.0 removed some API, https://github.com/neuralmagic/compressed-tensors/pull/156/files If installed the older llmcompressor from pip, it would throw the error like: ``` ImportError: cannot import name 'update_layer_weight_quant_params' from 'compressed_tensors.quantization' ``` Signed-off-by: Kyle Sayers <[email protected]> * Fix inconsistence (#80) Use group strategy with 128 group size instead of channel Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * 2of4 Signed-off-by: Kyle Sayers <[email protected]> * revert change to unrelated example Signed-off-by: Kyle Sayers <[email protected]> * rename test file Signed-off-by: Kyle Sayers <[email protected]> * fix fwd func call (#845) Signed-off-by: 
Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Co-authored-by: Jincheng Miao <[email protected]> Co-authored-by: 黄石 <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * cover all 3.9-3.12 in commit testing (#864) Co-authored-by: dhuangnm <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Add marlin-24 recipe/configs for e2e testing (#866) * add marlin-24 recipe/configs for e2e testing * update Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] onload during sparsity calculation (#862) * onload during sparsity calculation * fix sparsity --------- Co-authored-by: Dipika <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Fix HFTrainer overloads (#869) * add missing arguments Signed-off-by: Kyle Sayers <[email protected]> * names Signed-off-by: Kyle Sayers <[email protected]> * style Signed-off-by: Kyle Sayers <[email protected]> * named args all around Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Support Model Offloading Tied Tensors Patch (#872) * update parameter of offloaded modules Signed-off-by: Kyle Sayers <[email protected]> * in place function Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> * add advice about dealing with non-invertable hessians (#875) Signed-off-by: Kyle Sayers <[email protected]> * seed commit workflow (#877) * seed commit workflow Signed-off-by: andy-neuma <[email protected]> * tickle Signed-off-by: andy-neuma <[email protected]> * let's give it a try Signed-off-by: andy-neuma <[email protected]> * whitespace Signed-off-by: andy-neuma <[email protected]> * delete unneeded workflow 
Signed-off-by: andy-neuma <[email protected]> * adjust trigger Signed-off-by: andy-neuma <[email protected]> --------- Signed-off-by: andy-neuma <[email protected]> Co-authored-by: andy-neuma <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * [Observer Restructure]: Add Observers; Add `calibration` and `frozen` steps to `QuantizationModifier` (#837) * update functioon * wip * clean-up; fix imports * clean-up * more clean-up * bug fix * update for kvcache * get kv_cache to work * docstring * fix comment * fix condition for dynamic * update * update tests * add observer tests * add flake8 skip * apply updated mse fixes * fix import * Update src/llmcompressor/modifiers/quantization/calibration.py Co-authored-by: Kyle Sayers <[email protected]> * Update src/llmcompressor/modifiers/quantization/calibration.py Co-authored-by: Kyle Sayers <[email protected]> * PR comments * clean-up * move hook check to observer call * update * separate out calibration step --------- Co-authored-by: Kyle Sayers <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * WIP, observer Signed-off-by: Kyle Sayers <[email protected]> * use minmax observer Signed-off-by: Kyle Sayers <[email protected]> * Bugfix get observer from name (#883) Signed-off-by: Rahul Tuli <[email protected]> * BugFix: Fix Sparsity Reload Testing (#882) * fix * fix remaining test cases * add comments * fix Signed-off-by: Kyle Sayers <[email protected]> * Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <[email protected]> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. 
Signed-off-by: Kyle Sayers <[email protected]> * Move config["testconfig_path"] assignment (#895) * Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <[email protected]> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. Signed-off-by: Domenic Barbuzzi <[email protected]> * Move config["testconfig_path"] assignment Signed-off-by: Domenic Barbuzzi <[email protected]> * Use a function name generator for e2e test names Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * cap accelerate version to avoid bug (#897) Signed-off-by: Kyle Sayers <[email protected]> * Fix observing offloaded weight (#896) * load weight within onloading Signed-off-by: Kyle Sayers <[email protected]> * remove moving activation to execution device, since this is already done since activation calibration always happens within forward pass Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Update image in README.md (#861) Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * use user-specified observer Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Signed-off-by: andy-neuma <[email protected]> Signed-off-by: Rahul Tuli <[email protected]> Signed-off-by: Domenic Barbuzzi <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka 
<[email protected]> Co-authored-by: Jincheng Miao <[email protected]> Co-authored-by: 黄石 <[email protected]> Co-authored-by: dhuangnm <[email protected]> Co-authored-by: dhuangnm <[email protected]> Co-authored-by: Andy Linfoot <[email protected]> Co-authored-by: andy-neuma <[email protected]> Co-authored-by: Rahul Tuli <[email protected]> Co-authored-by: Domenic Barbuzzi <[email protected]> Co-authored-by: Michael Goin <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Small fixes for release (#901) * fix device map * expose one gpu for finetune; update to use a better moodel and show generation for completeness * more fixes * typo fix * dont just run unit tests Signed-off-by: Kyle Sayers <[email protected]> * use smaller portion of dataset (#902) Signed-off-by: Kyle Sayers <[email protected]> * Update example to not fail hessian inversion (#904) * update Signed-off-by: Dipika <[email protected]> * quality --------- Signed-off-by: Dipika <[email protected]> Co-authored-by: Rahul Tuli <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * bump version (#907) Signed-off-by: Dipika <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * add default mappings (#906) Signed-off-by: Kyle Sayers <[email protected]> * [SparseAutoModelForCausalLM Deprecation] Feature change (#881) * src and tests updates * save model if output_dir is provided * save model if provided as a string * typo * save if model was provided as a string or custom output_dir was set * comments * save tokenizer also if model passed as a string or custom outputdir provided * revert to True * merge main * merge main * fix transformers tests * Update tests/llmcompressor/transformers/obcq/test_consecutive_runs.py Co-authored-by: Kyle Sayers <[email protected]> * lint: * fix bug * fix bug * comments * comments * fix saving bug on example script and comments * fix test failure * comments * comments * comments * lint * fix test_quantization.py * fix bugs * revert to 
default * revert to default * draft * fix test * logging output fix --------- Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * correct typo (#888) Signed-off-by: Kyle Sayers <[email protected]> * use default factory, since default does not trigger field validator Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Signed-off-by: Domenic Barbuzzi <[email protected]> Signed-off-by: andy-neuma <[email protected]> Signed-off-by: Rahul Tuli <[email protected]> Signed-off-by: Dipika <[email protected]> Co-authored-by: Domenic Barbuzzi <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Co-authored-by: Michael Goin <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Jincheng Miao <[email protected]> Co-authored-by: 黄石 <[email protected]> Co-authored-by: dhuangnm <[email protected]> Co-authored-by: dhuangnm <[email protected]> Co-authored-by: Andy Linfoot <[email protected]> Co-authored-by: andy-neuma <[email protected]> Co-authored-by: Rahul Tuli <[email protected]> Co-authored-by: George <[email protected]> Signed-off-by: Kyle Sayers <[email protected]>

* rename files to remove colons
Signed-off-by: Kyle Sayers <[email protected]>

* [Bugfix] Workaround tied tensors bug (#659)
* load offload state dict
* add test
* remove merge duplication
* prepare to fix tie_word_embeddings
* add full tests
* patch second bug
* comment out failing tests, point to next pr
* link to issue
* accomodate offloaded models in test
* add back passing test
* WIP
* add error if not in expected list
* apply style
* update passing failing list
* add shared tensors tests
* clean up
* add comment with link
* make failing tests a todo
* Remove failing tests
* explicitly set safe_serialization
* separate out gpu tests, apply style
---------
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* only untie word embeddings (#839)
Signed-off-by: Kyle Sayers <[email protected]>

* check for config hidden size (#840)
Signed-off-by: Kyle Sayers <[email protected]>

* Use float32 for Hessian dtype (#847)
* use float32 for hessian dtype
* explicitly set inp dtype as well
* float precision for obcq hessian
Signed-off-by: Kyle Sayers <[email protected]>

* GPTQ: Depreciate non-sequential update option (#762)
* remove from gptq, apply style
* remove instances of sequential_update argument in GPTQ tests
* update examples
* update example tests
* documentation, remove from example
* apply style
* revert back to auto type
* apply style
---------
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Typehint nits (#826)
Signed-off-by: Kyle Sayers <[email protected]>

* [ DOC ] Remove version restrictions in W8A8 exmaple (#849)
The latest compressored-tensor 0.8.0 removed some API, https://github.com/neuralmagic/compressed-tensors/pull/156/files
If installed the older llmcompressor from pip, it would throw the error like:
```
ImportError: cannot import name 'update_layer_weight_quant_params' from 'compressed_tensors.quantization'
```
Signed-off-by: Kyle Sayers <[email protected]>

* Fix inconsistence (#80)
Use group strategy with 128 group size instead of channel
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* 2of4
Signed-off-by: Kyle Sayers <[email protected]>

* revert change to unrelated example
Signed-off-by: Kyle Sayers <[email protected]>

* rename test file
Signed-off-by: Kyle Sayers <[email protected]>

* fix fwd func call (#845)
Signed-off-by: Kyle Sayers <[email protected]>
---------
Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Co-authored-by: Jincheng Miao <[email protected]>
Co-authored-by: 黄石 <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>

* Implement iterative parameter updating Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] Use weight parameter of linear layer (#836) * use weight parameter of linear layer * add weight attribute check Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] Rename files to remove colons (#846) * rename files to remove colons Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] Workaround tied tensors bug (#659) * load offload state dict * add test * remove merge duplication * prepare to fix tie_word_embeddings * add full tests * patch second bug * comment out failing tests, point to next pr * link to issue * accomodate offloaded models in test * add back passing test * WIP * add error if not in expected list * apply style * update passing failing list * add shared tensors tests * clean up * add comment with link * make failing tests a todo * Remove failing tests * explicitly set safe_serialization * separate out gpu tests, apply style --------- Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * only untie word embeddings (#839) Signed-off-by: Kyle Sayers <[email protected]> * check for config hidden size (#840) Signed-off-by: Kyle Sayers <[email protected]> * Use float32 for Hessian dtype (#847) * use float32 for hessian dtype * explicitly set inp dtype as well * float precision for obcq hessian Signed-off-by: Kyle Sayers <[email protected]> * GPTQ: Depreciate non-sequential update option (#762) * remove from gptq, apply style * remove instances of sequential_update argument in GPTQ tests * update examples * update example tests * documentation, remove from example * apply style * revert back to auto type * apply style --------- Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Typehint nits (#826) Signed-off-by: Kyle Sayers <[email protected]> * [ DOC ] Remove version restrictions in W8A8 exmaple (#849) The 
latest compressored-tensor 0.8.0 removed some API, https://github.com/neuralmagic/compressed-tensors/pull/156/files If installed the older llmcompressor from pip, it would throw the error like: ``` ImportError: cannot import name 'update_layer_weight_quant_params' from 'compressed_tensors.quantization' ``` Signed-off-by: Kyle Sayers <[email protected]> * Fix inconsistence (#80) Use group strategy with 128 group size instead of channel Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * 2of4 Signed-off-by: Kyle Sayers <[email protected]> * revert change to unrelated example Signed-off-by: Kyle Sayers <[email protected]> * rename test file Signed-off-by: Kyle Sayers <[email protected]> * fix fwd func call (#845) Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Co-authored-by: Jincheng Miao <[email protected]> Co-authored-by: 黄石 <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * cover all 3.9-3.12 in commit testing (#864) Co-authored-by: dhuangnm <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Add marlin-24 recipe/configs for e2e testing (#866) * add marlin-24 recipe/configs for e2e testing * update Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] onload during sparsity calculation (#862) * onload during sparsity calculation * fix sparsity --------- Co-authored-by: Dipika <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Fix HFTrainer overloads (#869) * add missing arguments Signed-off-by: Kyle Sayers <[email protected]> * names Signed-off-by: Kyle Sayers <[email protected]> * style Signed-off-by: Kyle Sayers <[email protected]> * named args all around Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> 
Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Support Model Offloading Tied Tensors Patch (#872) * update parameter of offloaded modules Signed-off-by: Kyle Sayers <[email protected]> * in place function Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> * add advice about dealing with non-invertable hessians (#875) Signed-off-by: Kyle Sayers <[email protected]> * seed commit workflow (#877) * seed commit workflow Signed-off-by: andy-neuma <[email protected]> * tickle Signed-off-by: andy-neuma <[email protected]> * let's give it a try Signed-off-by: andy-neuma <[email protected]> * whitespace Signed-off-by: andy-neuma <[email protected]> * delete unneeded workflow Signed-off-by: andy-neuma <[email protected]> * adjust trigger Signed-off-by: andy-neuma <[email protected]> --------- Signed-off-by: andy-neuma <[email protected]> Co-authored-by: andy-neuma <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * [Observer Restructure]: Add Observers; Add `calibration` and `frozen` steps to `QuantizationModifier` (#837) * update functioon * wip * clean-up; fix imports * clean-up * more clean-up * bug fix * update for kvcache * get kv_cache to work * docstring * fix comment * fix condition for dynamic * update * update tests * add observer tests * add flake8 skip * apply updated mse fixes * fix import * Update src/llmcompressor/modifiers/quantization/calibration.py Co-authored-by: Kyle Sayers <[email protected]> * Update src/llmcompressor/modifiers/quantization/calibration.py Co-authored-by: Kyle Sayers <[email protected]> * PR comments * clean-up * move hook check to observer call * update * separate out calibration step --------- Co-authored-by: Kyle Sayers <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * WIP, observer Signed-off-by: Kyle Sayers <[email protected]> * use minmax observer Signed-off-by: Kyle Sayers <[email protected]> * 
Bugfix get observer from name (#883) Signed-off-by: Rahul Tuli <[email protected]> * BugFix: Fix Sparsity Reload Testing (#882) * fix * fix remaining test cases * add comments * fix Signed-off-by: Kyle Sayers <[email protected]> * Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <[email protected]> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. Signed-off-by: Kyle Sayers <[email protected]> * Move config["testconfig_path"] assignment (#895) * Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <[email protected]> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. 
Signed-off-by: Domenic Barbuzzi <[email protected]> * Move config["testconfig_path"] assignment Signed-off-by: Domenic Barbuzzi <[email protected]> * Use a function name generator for e2e test names Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * cap accelerate version to avoid bug (#897) Signed-off-by: Kyle Sayers <[email protected]> * Fix observing offloaded weight (#896) * load weight within onloading Signed-off-by: Kyle Sayers <[email protected]> * remove moving activation to execution device, since this is already done since activation calibration always happens within forward pass Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Update image in README.md (#861) Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * use user-specified observer Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Signed-off-by: andy-neuma <[email protected]> Signed-off-by: Rahul Tuli <[email protected]> Signed-off-by: Domenic Barbuzzi <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Co-authored-by: Jincheng Miao <[email protected]> Co-authored-by: 黄石 <[email protected]> Co-authored-by: dhuangnm <[email protected]> Co-authored-by: dhuangnm <[email protected]> Co-authored-by: Andy Linfoot <[email protected]> Co-authored-by: andy-neuma <[email protected]> Co-authored-by: Rahul Tuli <[email protected]> Co-authored-by: Domenic Barbuzzi <[email protected]> Co-authored-by: Michael Goin <[email protected]> Signed-off-by: Kyle Sayers <[email protected]>

* [Bugfix] Workaround tied tensors bug (#659) * load offload state dict * add test * remove merge duplication * prepare to fix tie_word_embeddings * add full tests * patch second bug * comment out failing tests, point to next pr * link to issue * accomodate offloaded models in test * add back passing test * WIP * add error if not in expected list * apply style * update passing failing list * add shared tensors tests * clean up * add comment with link * make failing tests a todo * Remove failing tests * explicitly set safe_serialization * separate out gpu tests, apply style --------- Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * no cache context Signed-off-by: Kyle Sayers <[email protected]> * support mllamaconfig Signed-off-by: Kyle Sayers <[email protected]> * fix typo Signed-off-by: Kyle Sayers <[email protected]> * only untie word embeddings (#839) Signed-off-by: Kyle Sayers <[email protected]> * check for config hidden size (#840) Signed-off-by: Kyle Sayers <[email protected]> * Use float32 for Hessian dtype (#847) * use float32 for hessian dtype * explicitly set inp dtype as well * float precision for obcq hessian Signed-off-by: Kyle Sayers <[email protected]> * GPTQ: Depreciate non-sequential update option (#762) * remove from gptq, apply style * remove instances of sequential_update argument in GPTQ tests * update examples * update example tests * documentation, remove from example * apply style * revert back to auto type * apply style --------- Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Typehint nits (#826) Signed-off-by: Kyle Sayers <[email protected]> * [ DOC ] Remove version restrictions in W8A8 exmaple (#849) The latest compressored-tensor 0.8.0 removed some API, https://github.com/neuralmagic/compressed-tensors/pull/156/files If installed the older llmcompressor from pip, it would throw the error 
like: ``` ImportError: cannot import name 'update_layer_weight_quant_params' from 'compressed_tensors.quantization' ``` Signed-off-by: Kyle Sayers <[email protected]> * add docstring Signed-off-by: Kyle Sayers <[email protected]> * make docstring runnable Signed-off-by: Kyle Sayers <[email protected]> * Fix inconsistence (#80) Use group strategy with 128 group size instead of channel Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * fix fwd func call (#845) Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] Workaround tied tensors bug (#659) * load offload state dict * add test * remove merge duplication * prepare to fix tie_word_embeddings * add full tests * patch second bug * comment out failing tests, point to next pr * link to issue * accomodate offloaded models in test * add back passing test * WIP * add error if not in expected list * apply style * update passing failing list * add shared tensors tests * clean up * add comment with link * make failing tests a todo * Remove failing tests * explicitly set safe_serialization * separate out gpu tests, apply style --------- Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> * only untie word embeddings (#839) Signed-off-by: Kyle Sayers <[email protected]> * Fix inconsistence (#80) Use group strategy with 128 group size instead of channel Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] Use weight parameter of linear layer (#836) * use weight parameter of linear layer * add weight attribute check Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] Rename files to remove colons (#846) * rename files to remove colons Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] Workaround tied tensors bug (#659) * load offload state dict * add test * remove merge duplication * prepare to fix tie_word_embeddings * add full tests * patch second bug * comment 
out failing tests, point to next pr * link to issue * accomodate offloaded models in test * add back passing test * WIP * add error if not in expected list * apply style * update passing failing list * add shared tensors tests * clean up * add comment with link * make failing tests a todo * Remove failing tests * explicitly set safe_serialization * separate out gpu tests, apply style --------- Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * only untie word embeddings (#839) Signed-off-by: Kyle Sayers <[email protected]> * check for config hidden size (#840) Signed-off-by: Kyle Sayers <[email protected]> * Use float32 for Hessian dtype (#847) * use float32 for hessian dtype * explicitly set inp dtype as well * float precision for obcq hessian Signed-off-by: Kyle Sayers <[email protected]> * GPTQ: Depreciate non-sequential update option (#762) * remove from gptq, apply style * remove instances of sequential_update argument in GPTQ tests * update examples * update example tests * documentation, remove from example * apply style * revert back to auto type * apply style --------- Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Typehint nits (#826) Signed-off-by: Kyle Sayers <[email protected]> * [ DOC ] Remove version restrictions in W8A8 exmaple (#849) The latest compressored-tensor 0.8.0 removed some API, https://github.com/neuralmagic/compressed-tensors/pull/156/files If installed the older llmcompressor from pip, it would throw the error like: ``` ImportError: cannot import name 'update_layer_weight_quant_params' from 'compressed_tensors.quantization' ``` Signed-off-by: Kyle Sayers <[email protected]> * Fix inconsistence (#80) Use group strategy with 128 group size instead of channel Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * 2of4 Signed-off-by: Kyle Sayers <[email 
protected]> * revert change to unrelated example Signed-off-by: Kyle Sayers <[email protected]> * rename test file Signed-off-by: Kyle Sayers <[email protected]> * fix fwd func call (#845) Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Co-authored-by: Jincheng Miao <[email protected]> Co-authored-by: 黄石 <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * cover all 3.9-3.12 in commit testing (#864) Co-authored-by: dhuangnm <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Add marlin-24 recipe/configs for e2e testing (#866) * add marlin-24 recipe/configs for e2e testing * update Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] onload during sparsity calculation (#862) * onload during sparsity calculation * fix sparsity --------- Co-authored-by: Dipika <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Fix HFTrainer overloads (#869) * add missing arguments Signed-off-by: Kyle Sayers <[email protected]> * names Signed-off-by: Kyle Sayers <[email protected]> * style Signed-off-by: Kyle Sayers <[email protected]> * named args all around Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Support Model Offloading Tied Tensors Patch (#872) * update parameter of offloaded modules Signed-off-by: Kyle Sayers <[email protected]> * in place function Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> * add advice about dealing with non-invertable hessians (#875) Signed-off-by: Kyle Sayers <[email protected]> * seed commit workflow (#877) * seed commit workflow Signed-off-by: andy-neuma <[email protected]> * tickle 
Signed-off-by: andy-neuma <[email protected]> * let's give it a try Signed-off-by: andy-neuma <[email protected]> * whitespace Signed-off-by: andy-neuma <[email protected]> * delete unneeded workflow Signed-off-by: andy-neuma <[email protected]> * adjust trigger Signed-off-by: andy-neuma <[email protected]> --------- Signed-off-by: andy-neuma <[email protected]> Co-authored-by: andy-neuma <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * [Observer Restructure]: Add Observers; Add `calibration` and `frozen` steps to `QuantizationModifier` (#837) * update functioon * wip * clean-up; fix imports * clean-up * more clean-up * bug fix * update for kvcache * get kv_cache to work * docstring * fix comment * fix condition for dynamic * update * update tests * add observer tests * add flake8 skip * apply updated mse fixes * fix import * Update src/llmcompressor/modifiers/quantization/calibration.py Co-authored-by: Kyle Sayers <[email protected]> * Update src/llmcompressor/modifiers/quantization/calibration.py Co-authored-by: Kyle Sayers <[email protected]> * PR comments * clean-up * move hook check to observer call * update * separate out calibration step --------- Co-authored-by: Kyle Sayers <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Bugfix get observer from name (#883) Signed-off-by: Rahul Tuli <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * BugFix: Fix Sparsity Reload Testing (#882) * fix * fix remaining test cases * add comments * fix Signed-off-by: Kyle Sayers <[email protected]> * Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <[email protected]> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This 
reverts commit 10facf2. Signed-off-by: Kyle Sayers <[email protected]> * Move config["testconfig_path"] assignment (#895) * Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <[email protected]> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. Signed-off-by: Domenic Barbuzzi <[email protected]> * Move config["testconfig_path"] assignment Signed-off-by: Domenic Barbuzzi <[email protected]> * Use a function name generator for e2e test names Signed-off-by: Domenic Barbuzzi <[email protected]> --------- Signed-off-by: Domenic Barbuzzi <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * cap accelerate version to avoid bug (#897) Signed-off-by: Kyle Sayers <[email protected]> * Fix observing offloaded weight (#896) * load weight within onloading Signed-off-by: Kyle Sayers <[email protected]> * remove moving activation to execution device, since this is already done since activation calibration always happens within forward pass Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Update image in README.md (#861) Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * update accelerate version (#899) Signed-off-by: Kyle Sayers <[email protected]> * [GPTQ] Iterative Parameter Updating (#863) * Implement iterative parameter updating Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] Use weight parameter of linear layer (#836) * use weight parameter of linear layer * add weight attribute check Signed-off-by: Kyle Sayers <[email 
protected]> * [Bugfix] Rename files to remove colons (#846) * rename files to remove colons Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] Workaround tied tensors bug (#659) * load offload state dict * add test * remove merge duplication * prepare to fix tie_word_embeddings * add full tests * patch second bug * comment out failing tests, point to next pr * link to issue * accomodate offloaded models in test * add back passing test * WIP * add error if not in expected list * apply style * update passing failing list * add shared tensors tests * clean up * add comment with link * make failing tests a todo * Remove failing tests * explicitly set safe_serialization * separate out gpu tests, apply style --------- Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * only untie word embeddings (#839) Signed-off-by: Kyle Sayers <[email protected]> * check for config hidden size (#840) Signed-off-by: Kyle Sayers <[email protected]> * Use float32 for Hessian dtype (#847) * use float32 for hessian dtype * explicitly set inp dtype as well * float precision for obcq hessian Signed-off-by: Kyle Sayers <[email protected]> * GPTQ: Depreciate non-sequential update option (#762) * remove from gptq, apply style * remove instances of sequential_update argument in GPTQ tests * update examples * update example tests * documentation, remove from example * apply style * revert back to auto type * apply style --------- Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Typehint nits (#826) Signed-off-by: Kyle Sayers <[email protected]> * [ DOC ] Remove version restrictions in W8A8 exmaple (#849) The latest compressored-tensor 0.8.0 removed some API, https://github.com/neuralmagic/compressed-tensors/pull/156/files If installed the older llmcompressor from pip, it would throw the error like: ``` ImportError: cannot import name 
'update_layer_weight_quant_params' from 'compressed_tensors.quantization' ``` Signed-off-by: Kyle Sayers <[email protected]> * Fix inconsistence (#80) Use group strategy with 128 group size instead of channel Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * 2of4 Signed-off-by: Kyle Sayers <[email protected]> * revert change to unrelated example Signed-off-by: Kyle Sayers <[email protected]> * rename test file Signed-off-by: Kyle Sayers <[email protected]> * fix fwd func call (#845) Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Co-authored-by: Jincheng Miao <[email protected]> Co-authored-by: 黄石 <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * cover all 3.9-3.12 in commit testing (#864) Co-authored-by: dhuangnm <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Add marlin-24 recipe/configs for e2e testing (#866) * add marlin-24 recipe/configs for e2e testing * update Signed-off-by: Kyle Sayers <[email protected]> * [Bugfix] onload during sparsity calculation (#862) * onload during sparsity calculation * fix sparsity --------- Co-authored-by: Dipika <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Fix HFTrainer overloads (#869) * add missing arguments Signed-off-by: Kyle Sayers <[email protected]> * names Signed-off-by: Kyle Sayers <[email protected]> * style Signed-off-by: Kyle Sayers <[email protected]> * named args all around Signed-off-by: Kyle Sayers <[email protected]> --------- Signed-off-by: Kyle Sayers <[email protected]> Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]> * Support Model Offloading Tied Tensors Patch (#872) * update parameter of offloaded modules Signed-off-by: Kyle Sayers <[email protected]> 
* in place function Signed-off-by: Kyle Sayers <[email protected]>
---------
Signed-off-by: Kyle Sayers <[email protected]>
* add advice about dealing with non-invertable hessians (#875) Signed-off-by: Kyle Sayers <[email protected]>
* seed commit workflow (#877)
* seed commit workflow Signed-off-by: andy-neuma <[email protected]>
* tickle Signed-off-by: andy-neuma <[email protected]>
* let's give it a try Signed-off-by: andy-neuma <[email protected]>
* whitespace Signed-off-by: andy-neuma <[email protected]>
* delete unneeded workflow Signed-off-by: andy-neuma <[email protected]>
* adjust trigger Signed-off-by: andy-neuma <[email protected]>
---------
Signed-off-by: andy-neuma <[email protected]>
Co-authored-by: andy-neuma <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>
* [Observer Restructure]: Add Observers; Add `calibration` and `frozen` steps to `QuantizationModifier` (#837)
* update functioon
* wip
* clean-up; fix imports
* clean-up
* more clean-up
* bug fix
* update for kvcache
* get kv_cache to work
* docstring
* fix comment
* fix condition for dynamic
* update
* update tests
* add observer tests
* add flake8 skip
* apply updated mse fixes
* fix import
* Update src/llmcompressor/modifiers/quantization/calibration.py Co-authored-by: Kyle Sayers <[email protected]>
* Update src/llmcompressor/modifiers/quantization/calibration.py Co-authored-by: Kyle Sayers <[email protected]>
* PR comments
* clean-up
* move hook check to observer call
* update
* separate out calibration step
---------
Co-authored-by: Kyle Sayers <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>
* WIP, observer Signed-off-by: Kyle Sayers <[email protected]>
* use minmax observer Signed-off-by: Kyle Sayers <[email protected]>
* Bugfix get observer from name (#883) Signed-off-by: Rahul Tuli <[email protected]>
* BugFix: Fix Sparsity Reload Testing (#882)
* fix
* fix remaining test cases
* add comments
* fix Signed-off-by: Kyle Sayers <[email protected]>
* Use custom unique test names for e2e tests (#892)
* Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <[email protected]>
* Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <[email protected]>
---------
Signed-off-by: Domenic Barbuzzi <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>
* Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. Signed-off-by: Kyle Sayers <[email protected]>
* Move config["testconfig_path"] assignment (#895)
* Use custom unique test names for e2e tests (#892)
* Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <[email protected]>
* Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <[email protected]>
---------
Signed-off-by: Domenic Barbuzzi <[email protected]>
* Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. Signed-off-by: Domenic Barbuzzi <[email protected]>
* Move config["testconfig_path"] assignment Signed-off-by: Domenic Barbuzzi <[email protected]>
* Use a function name generator for e2e test names Signed-off-by: Domenic Barbuzzi <[email protected]>
---------
Signed-off-by: Domenic Barbuzzi <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>
* cap accelerate version to avoid bug (#897) Signed-off-by: Kyle Sayers <[email protected]>
* Fix observing offloaded weight (#896)
* load weight within onloading Signed-off-by: Kyle Sayers <[email protected]>
* remove moving activation to execution device, since this is already done since activation calibration always happens within forward pass Signed-off-by: Kyle Sayers <[email protected]>
---------
Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>
* Update image in README.md (#861) Co-authored-by: Dipika Sikka <[email protected]> Signed-off-by: Kyle Sayers <[email protected]>
* use user-specified observer Signed-off-by: Kyle Sayers <[email protected]>
---------
Signed-off-by: Kyle Sayers <[email protected]>
Signed-off-by: andy-neuma <[email protected]>
Signed-off-by: Rahul Tuli <[email protected]>
Signed-off-by: Domenic Barbuzzi <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Co-authored-by: Jincheng Miao <[email protected]>
Co-authored-by: 黄石 <[email protected]>
Co-authored-by: dhuangnm <[email protected]>
Co-authored-by: dhuangnm <[email protected]>
Co-authored-by: Andy Linfoot <[email protected]>
Co-authored-by: andy-neuma <[email protected]>
Co-authored-by: Rahul Tuli <[email protected]>
Co-authored-by: Domenic Barbuzzi <[email protected]>
Co-authored-by: Michael Goin <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>
* Small fixes for release (#901)
* fix device map
* expose one gpu for finetune; update to use a better moodel and show generation for completeness
* more fixes
* typo fix
* dont just run unit tests Signed-off-by: Kyle Sayers <[email protected]>
* use smaller portion of dataset (#902) Signed-off-by: Kyle Sayers <[email protected]>
* Update example to not fail hessian inversion (#904)
* update Signed-off-by: Dipika <[email protected]>
* quality
---------
Signed-off-by: Dipika <[email protected]>
Co-authored-by: Rahul Tuli <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>
* bump version (#907) Signed-off-by: Dipika <[email protected]> Signed-off-by: Kyle Sayers <[email protected]>
* add default mappings (#906) Signed-off-by: Kyle Sayers <[email protected]>
* [SparseAutoModelForCausalLM Deprecation] Feature change (#881)
* src and tests updates
* save model if output_dir is provided
* save model if provided as a string
* typo
* save if model was provided as a string or custom output_dir was set
* comments
* save tokenizer also if model passed as a string or custom outputdir provided
* revert to True
* merge main
* merge main
* fix transformers tests
* Update tests/llmcompressor/transformers/obcq/test_consecutive_runs.py Co-authored-by: Kyle Sayers <[email protected]>
* lint:
* fix bug
* fix bug
* comments
* comments
* fix saving bug on example script and comments
* fix test failure
* comments
* comments
* comments
* lint
* fix test_quantization.py
* fix bugs
* revert to default
* revert to default
* draft
* fix test
* logging output fix
---------
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>
* correct typo (#888) Signed-off-by: Kyle Sayers <[email protected]>
* print config for better debugging Signed-off-by: Kyle Sayers <[email protected]>
---------
Signed-off-by: Kyle Sayers <[email protected]>
Signed-off-by: andy-neuma <[email protected]>
Signed-off-by: Rahul Tuli <[email protected]>
Signed-off-by: Domenic Barbuzzi <[email protected]>
Signed-off-by: Dipika <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Co-authored-by: Jincheng Miao <[email protected]>
Co-authored-by: 黄石 <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
Co-authored-by: dhuangnm <[email protected]>
Co-authored-by: dhuangnm <[email protected]>
Co-authored-by: Andy Linfoot <[email protected]>
Co-authored-by: andy-neuma <[email protected]>
Co-authored-by: Rahul Tuli <[email protected]>
Co-authored-by: Domenic Barbuzzi <[email protected]>
Co-authored-by: Michael Goin <[email protected]>
Co-authored-by: George <[email protected]>

fix typehint import

89929c2

kylesayrs added 3 commits October 7, 2024 18:10

typo

291faaa

fix typehint

2ca7073

type checking

5d26a75

kylesayrs changed the title ~~Fix typehint import~~ Typehint nits Oct 7, 2024

kylesayrs marked this pull request as ready for review October 7, 2024 19:27

kylesayrs self-assigned this Oct 7, 2024

kylesayrs added 7 commits October 8, 2024 01:31

fix typehint

da3939b

fix typo

a1195ca

unused variable

2ee1ab3

Merge branch 'main' into kylesayrs/fix-typehint

4e7f00c

Merge branch 'main' into kylesayrs/fix-typehint

2dc4b61

Merge branch 'main' into kylesayrs/fix-typehint

85eb9c3

Merge branch 'main' into kylesayrs/fix-typehint

d09c90d

mgoin approved these changes Oct 16, 2024

rahul-tuli approved these changes Oct 16, 2024

mgoin merged commit 3eacbb3 into main Oct 16, 2024
5 of 6 checks passed

mgoin deleted the kylesayrs/fix-typehint branch October 16, 2024 23:00

kylesayrs added a commit that referenced this pull request Oct 23, 2024

Typehint nits (#826)

28ded56

Signed-off-by: Kyle Sayers <[email protected]>

kylesayrs added a commit that referenced this pull request Nov 19, 2024

Typehint nits (#826)

b9bff49

Signed-off-by: Kyle Sayers <[email protected]>

kylesayrs added a commit that referenced this pull request Nov 21, 2024

Typehint nits (#826)

b715b05

Signed-off-by: Kyle Sayers <[email protected]>
