
[INTEGRATION] Add GPTQModel support into transformers + optimum + peft #729

Open
6 tasks done
jiqing-feng opened this issue Dec 3, 2024 · 12 comments
Labels: bug (Something isn't working)

Comments


jiqing-feng commented Dec 3, 2024

Function Upstreaming

optimum

#2064 <-- MERGED

transformers

#35012 <-- PENDING

peft

#2247 <-- PENDING

Tests

optimum

test_quantization
RUN_SLOW=1 pytest tests/gptq/test_quantization.py

  • cpu tests
  • cuda tests

transformers

test_gptq
RUN_SLOW=1 pytest tests/quantization/gptq/test_gptq.py

  • cpu tests
  • cuda tests

peft

PeftGPTQGPUTests
pytest tests/test_gpu_examples.py::PeftGPTQTests and pytest tests/test_common_gpu.py::PeftCommonTests::test_lora_gptq_quantization_from_pretrained_safetensors

  • cpu tests
  • cuda tests

I suppose we don't need new unit tests for GPTQ in HF; we just need to pass all of the existing GPTQ tests with the gptqmodel lib. Please help confirm.
cc @Qubitium @SunMarc
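
For reference, the user-facing path these tests exercise is the standard transformers GPTQ quantization API. A rough sketch (the model id and calibration dataset are purely illustrative; once the PRs land, the gptqmodel backend should be picked up when it is installed):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "facebook/opt-125m"  # small model, used only for illustration

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Quantization is delegated to optimum's GPTQ path; with the PRs tracked in
# this issue merged, the GPTQModel backend is expected to be used when installed.
quant_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=quant_config,
)
```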


SunMarc commented Dec 3, 2024

Sounds good to me @jiqing-feng! Also, after discussing a bit with other members at HF, I think it is better for now not to put in any mention of deprecating AutoGPTQ. If both libraries are installed, we can use the GPTQModel library and emit a clear warning. Thanks for working on this!
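
A minimal sketch of what that selection could look like (the function name is an assumption for illustration, not the actual optimum/transformers helper):

```python
import importlib.util
import logging

logger = logging.getLogger(__name__)


def select_gptq_backend() -> str:
    """Prefer gptqmodel when both backends are installed, warning about the overlap."""
    has_gptqmodel = importlib.util.find_spec("gptqmodel") is not None
    has_autogptq = importlib.util.find_spec("auto_gptq") is not None

    if has_gptqmodel:
        if has_autogptq:
            logger.warning(
                "Both gptqmodel and auto-gptq are installed; defaulting to gptqmodel."
            )
        return "gptqmodel"
    if has_autogptq:
        return "auto-gptq"
    raise ImportError("Neither gptqmodel nor auto-gptq is installed.")
```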


Qubitium commented Dec 3, 2024

@jiqing-feng The internal GPTQModel code refactor to support hf/optimum is passing internal tests.

The Transformers/Optimum PRs need the following merges:
https://github.com/jiqing-feng/transformers/pull/2/files
https://github.com/jiqing-feng/optimum/pull/2/files

@SunMarc We will do more testing tomorrow and let you know when everything is kosher so you can review code that is passing all tests.

Update: both of the above PRs (#2 for transformers and optimum) have been merged.


Qubitium commented Dec 4, 2024

Status update. We have started testing the above Transformer/Optimum/Peft tests under Nvidia/cuda GPU using gptqmodel[main] + the following PRs:

https://github.com/jiqing-feng/transformers/pull/3/files
https://github.com/jiqing-feng/optimum/pull/3/files


Qubitium commented Dec 4, 2024

@jiqing-feng @SunMarc One issue we found is that all the GPTQ tests in optimum are super flaky, not only across transformers/torch versions but also between gptqmodel and auto-gptq. In gptqmodel we moved away from string comparison and instead run eval-harness benchmarks directly, checking the scores against a regression floor on a fixed benchmark. This is not going to be addressed in this PR but needs to be addressed in future PRs; see the sketch below.

https://github.com/jiqing-feng/transformers/pull/3/files#diff-5f4148e9e983fe6fb9bd7a8eb1e3e8fe65971cabb1cf2d4ce9b18885c07b7d44
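
For illustration, a regression-floor check could look roughly like this (the benchmark name and floor value are assumptions, not the actual GPTQModel thresholds):

```python
# Instead of asserting an exact generated string (flaky across torch/transformers
# versions, CPUs, and between auto-gptq and gptqmodel kernels), compare a fixed
# benchmark score against an agreed regression floor.

ARC_CHALLENGE_FLOOR = 0.28  # illustrative floor, not the real threshold


def check_regression(score: float, floor: float = ARC_CHALLENGE_FLOOR) -> None:
    """Fail if the benchmark score for the quantized model drops below the floor."""
    if score < floor:
        raise AssertionError(
            f"Benchmark score {score:.3f} is below the regression floor {floor:.3f}"
        )


if __name__ == "__main__":
    # 0.31 stands in for a score produced by an eval-harness run on the quantized model.
    check_regression(0.31)
```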


Qubitium commented Dec 4, 2024

Update: all 3 pending (not-yet-ready) PRs we are testing for @jiqing-feng's PRs:

Transformers: https://github.com/jiqing-feng/transformers/pull/3/files
Optimum: https://github.com/jiqing-feng/optimum/pull/3/files
Peft: https://github.com/jiqing-feng/peft/pull/1/files


Qubitium commented Dec 4, 2024

Final update for today. We are trying to find the least painful way to fix the peft/gptqmodel non-IPEX paths. There are solutions, but none are great, since only three kernels (torch/cuda/triton) actually support an accelerated + trainable forward.

But selecting the right kernel at transformers load time, then optimum quantize, followed finally by peft/training mode, are very disjointed steps in transformers: there appears to be no auto-hook to register so that gptqmodel internals can switch a non-trainable kernel into a trainable one.

Can an nn.Module actually register a hook to know it is going into training mode, beyond a boolean self.trainable state that is only checked in forward()?

We will test the best available method tomorrow. Ran out of time today.
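
One possible answer (a sketch only, not necessarily the approach GPTQModel ends up shipping): nn.Module.train() is itself called recursively by model.train()/model.eval(), so a quantized linear layer can override it and swap kernels when the mode flips instead of checking a flag on every forward():

```python
import torch
from torch import nn


class QuantLinearStub(nn.Module):
    """Illustrative stand-in for a GPTQ quantized linear layer."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.xavier_uniform_(self.weight)
        self.kernel = "exllama"  # hypothetical fast but non-trainable kernel

    def train(self, mode: bool = True):
        # model.train()/model.eval() call this on every submodule, so the kernel
        # switch happens once per mode change rather than inside forward().
        self.kernel = "torch" if mode else "exllama"
        return super().train(mode)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Dispatch on the kernel chosen at train()/eval() time.
        return x @ self.weight.t()


layer = QuantLinearStub(8, 4)
layer.train()  # kernel -> "torch" (trainable path)
layer.eval()   # kernel -> "exllama" (inference path)
```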

Qubitium changed the title from "Integrate gptqmodel into HF" to "[INTEGRATION] Add GPTQModel support into transformers + optimum + peft" on Dec 5, 2024

Qubitium commented Dec 5, 2024

> But selecting the right kernel at transformers load time, then optimum quantize, followed finally by peft/training mode, are very disjointed steps in transformers: there appears to be no auto-hook to register so that gptqmodel internals can switch a non-trainable kernel into a trainable one.
>
> Can an nn.Module actually register a hook to know it is going into training mode, beyond a boolean self.trainable state that is only checked in forward()?

A best-fit solution has been found. We are refactoring the gptqmodel code at the moment so that peft compat can be added cleanly.


Qubitium commented Dec 5, 2024

Almost there. All code is 99.8% ready for all 3 PRs. Internal testing is starting. @jiqing-feng Once all our tests pass, I will let you know so you can merge into your 3 PRs before we start the final round of testing.

Tracking:

Transformers: https://github.com/jiqing-feng/transformers/pull/3/files
Optimum: https://github.com/jiqing-feng/optimum/pull/3/files
Peft: https://github.com/jiqing-feng/peft/pull/1/files


Qubitium commented Dec 5, 2024

> Almost there. All code is 99.8% ready for all 3 PRs. Internal testing is starting. @jiqing-feng Once all our tests pass, I will let you know so you can merge into your 3 PRs before we start the final round of testing.
>
> Tracking:
>
> Transformers: https://github.com/jiqing-feng/transformers/pull/3/files
> Optimum: https://github.com/jiqing-feng/optimum/pull/3/files
> Peft: https://github.com/jiqing-feng/peft/pull/1/files

@jiqing-feng All 3 PRs are passing all CPU/GPU tests internally here. There are some string output mismatches, but that is random variability across different CPUs. CPU tests may fail with tensors landing on CUDA when they are run in an environment where both CPU and CUDA devices are exposed, even though device_map is set to CPU only. I think this is an HF bug, but we need to check; a workaround sketch follows.
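
As a stopgap while that is investigated, one way to keep CPU-only runs deterministic is to hide the CUDA devices from the test process before torch is imported (an illustrative workaround, not part of the official test setup):

```python
import os

# Hide CUDA from this process so that device_map="cpu" tests cannot accidentally
# place tensors on a GPU; this must happen before torch initializes CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

import torch  # noqa: E402

assert not torch.cuda.is_available()
```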

Once these 3 PRs are merged into the larger PRs, I will write a lengthy explanation of some of the obvious and not-so-obvious small and large changes we required and pushed.

jiqing-feng (Contributor, Author) commented:

> Update: all 3 pending (not-yet-ready) PRs we are testing for @jiqing-feng's PRs:
>
> Transformers: https://github.com/jiqing-feng/transformers/pull/3/files
> Optimum: https://github.com/jiqing-feng/optimum/pull/3/files
> Peft: https://github.com/jiqing-feng/peft/pull/1/files

All merged, will check it again on CPU.


Qubitium commented Dec 8, 2024

All tests passing. Both the gptqmodel internal tests and the transformers/optimum/peft tests are passing on CPU/XPU/CUDA.

Qubitium (Collaborator) commented:

GPTQModel v1.4.0 has been released. If any changes are required for the above upstream PRs, we will make them and cut a new release.
