
Layers not skipped with ignore=[ "re:.*"] #91

Open
horheynm opened this issue Aug 15, 2024 · 2 comments
Assignees
Labels
bug Something isn't working

Comments

@horheynm
Collaborator

Describe the bug
Cosmetic issue.

Running the code prints the following to stdout:

===== Compressing layer 23/40  =====
2024-08-15T15:22:59.526464+0000 | compress_module | INFO - Compressing model.layers.22.model.layers.22.self_attn.o_proj...
2024-08-15T15:23:00.110515+0000 | compress | INFO - time 0.51
2024-08-15T15:23:00.110713+0000 | compress | INFO - error 0.00

Expected behavior
The compress() function should exit early for ignored layers. GPTQ will still run over every layer, since we do need all the layers in the pipeline for the data to flow properly.
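
For illustration, a minimal sketch of what that early exit could look like (the compress_module signature and the matches_ignore helper are hypothetical, not the actual llm-compressor implementation; the "re:" prefix convention for regex entries follows the ignore list semantics above):

import re

def matches_ignore(name, ignore):
    # entries prefixed with "re:" are treated as regex patterns; everything else as exact module names
    for pattern in ignore:
        if pattern.startswith("re:"):
            if re.match(pattern[len("re:"):], name):
                return True
        elif name == pattern:
            return True
    return False

def compress_module(name, module, ignore):
    # hypothetical early exit: skip the GPTQ solve (and its "Compressing ..." log line)
    # for ignored layers, while leaving the module in place so calibration data still
    # flows through it for the remaining layers
    if matches_ignore(name, ignore):
        return
    # ... existing GPTQ compression logic ...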

Environment
Include all relevant environment information:

  1. OS [e.g. Ubuntu 20.04]:
  2. Python version [e.g. 3.7]:
  3. LLM Compressor version or commit hash [e.g. 0.1.0, f7245c8]:
  4. ML framework version(s) [e.g. torch 2.3.1]:
  5. Other Python package versions [e.g. vLLM, compressed-tensors, numpy, ONNX]:
  6. Other relevant environment information [e.g. hardware, CUDA version]:

To Reproduce

from llmcompressor.modifiers.quantization import GPTQModifier

recipe = [
    GPTQModifier(targets="Linear", scheme="W8A8", ignore=["lm_head", "re:.*"]),
]

using examples/big_models_with_accelerate/multi_gpu_int8.py.
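
For context, the recipe plugs into that example roughly as follows (a sketch based on the public examples rather than the exact script; the model ID, dataset, and calibration settings are placeholders):

from transformers import AutoModelForCausalLM
from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

MODEL_ID = "meta-llama/Meta-Llama-3-70B-Instruct"  # placeholder; the example targets a large multi-GPU model
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto", torch_dtype="auto")

recipe = [
    GPTQModifier(targets="Linear", scheme="W8A8", ignore=["lm_head", "re:.*"]),
]

# with ignore matching every module, nothing should actually be compressed,
# yet the per-layer "Compressing ..." log lines above are still printed
oneshot(
    model=model,
    dataset="open_platypus",
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)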

Errors
If applicable, add a full print-out of any errors or exceptions that are raised or include screenshots to help explain your problem.

Additional context
Add any other context about the problem here. Also include any relevant files.

@horheynm horheynm added the bug Something isn't working label Aug 15, 2024
@fengyang95

Maybe you can pass the layers you want to quantize into the sequential_targets.
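
If that route helps, the recipe would look roughly like this (a sketch assuming GPTQModifier exposes a sequential_targets argument that takes decoder block class names, as in the Llama example recipes):

recipe = [
    GPTQModifier(
        targets="Linear",
        scheme="W8A8",
        ignore=["lm_head"],
        # hypothetical: restrict the sequential GPTQ pass to the decoder blocks you care about
        sequential_targets=["LlamaDecoderLayer"],
    ),
]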

@markurtz
Collaborator

@kylesayrs I believe this should be fixed with the work you're doing, can you confirm?

@kylesayrs kylesayrs self-assigned this Oct 21, 2024
markmc pushed a commit to markmc/llm-compressor that referenced this issue Nov 13, 2024