
[INTEGRATION] Add GPTQModel support into transformers + optimum + peft #729

Open
6 tasks done
jiqing-feng opened this issue Dec 3, 2024 · 12 comments
Labels: bug (Something isn't working)

Comments


jiqing-feng commented Dec 3, 2024

Function Upstreaming

optimum

#2064 <-- MERGED

transformers

#35012 <-- PENDING

peft

#2247 <-- PENDING

Tests

optimum

test_quantization
RUN_SLOW=1 pytest tests/gptq/test_quantization.py

  • cpu tests
  • cuda tests

transformers

test_gptq
RUN_SLOW=1 pytest tests/quantization/gptq/test_gptq.py

  • cpu tests
  • cuda tests

peft

PeftGPTQGPUTests
pytest tests/test_gpu_examples.py::PeftGPTQTests and pytest tests/test_common_gpu.py::PeftCommonTests::test_lora_gptq_quantization_from_pretrained_safetensors

  • cpu tests
  • cuda tests

I suppose we don't need new unit tests for GPTQ in HF; we just need to pass all of the existing GPTQ tests with the gptqmodel lib. Please help confirm.
cc @Qubitium @SunMarc
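
For reference, the user-facing path these tests exercise is the standard transformers GPTQ quantization API. A rough sketch (the model id and calibration dataset are purely illustrative; once the PRs land, the gptqmodel backend should be picked up when it is installed):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "facebook/opt-125m"  # small model, used only for illustration

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Quantization is delegated to optimum's GPTQ path; with the PRs tracked in
# this issue merged, the GPTQModel backend is expected to be used when installed.
quant_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=quant_config,
)
```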


SunMarc commented Dec 3, 2024

Sounds good to me @jiqing-feng! Also, after discussing a bit with other members at HF, I think it is better for now not to put in any mention of deprecating AutoGPTQ. If both libraries are installed, we can use the GPTQModel library and emit a clear warning. Thanks for working on this!
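
A minimal sketch of what that selection could look like (the function name is an assumption for illustration, not the actual optimum/transformers helper):

```python
import importlib.util
import logging

logger = logging.getLogger(__name__)


def select_gptq_backend() -> str:
    """Prefer gptqmodel when both backends are installed, warning about the overlap."""
    has_gptqmodel = importlib.util.find_spec("gptqmodel") is not None
    has_autogptq = importlib.util.find_spec("auto_gptq") is not None

    if has_gptqmodel:
        if has_autogptq:
            logger.warning(
                "Both gptqmodel and auto-gptq are installed; defaulting to gptqmodel."
            )
        return "gptqmodel"
    if has_autogptq:
        return "auto-gptq"
    raise ImportError("Neither gptqmodel nor auto-gptq is installed.")
```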


Qubitium commented Dec 3, 2024

@jiqing-feng The internal GPTQModel code refactor to support hf/optimum is passing internal tests.

The Transformers/Optimum PRs need the following merges:
https://github.com/jiqing-feng/transformers/pull/2/files
https://github.com/jiqing-feng/optimum/pull/2/files

@SunMarc We will do more testing tomorrow and let you know when everything is kosher so you can review code that is passing all tests.

Update: both of the above PRs (#2 for transformers and optimum) have been merged.


Qubitium commented Dec 4, 2024

Status update. We have started testing the above Transformer/Optimum/Peft tests under Nvidia/cuda GPU using gptqmodel[main] + the following PRs:

https://github.com/jiqing-feng/transformers/pull/3/files
https://github.com/jiqing-feng/optimum/pull/3/files


Qubitium commented Dec 4, 2024

@jiqing-feng @SunMarc One issue we found is that all the GPTQ tests in optimum are super flaky, not only across transformers/torch versions but also between gptqmodel and auto-gptq. In gptqmodel we moved away from string comparison and instead run eval-harness benchmarks directly, checking the scores against a regression floor on a fixed benchmark. This is not going to be addressed in this PR but needs to be addressed in future PRs; see the sketch below.

https://github.com/jiqing-feng/transformers/pull/3/files#diff-5f4148e9e983fe6fb9bd7a8eb1e3e8fe65971cabb1cf2d4ce9b18885c07b7d44
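
For illustration, a regression-floor check could look roughly like this (the benchmark name and floor value are assumptions, not the actual GPTQModel thresholds):

```python
# Instead of asserting an exact generated string (flaky across torch/transformers
# versions, CPUs, and between auto-gptq and gptqmodel kernels), compare a fixed
# benchmark score against an agreed regression floor.

ARC_CHALLENGE_FLOOR = 0.28  # illustrative floor, not the real threshold


def check_regression(score: float, floor: float = ARC_CHALLENGE_FLOOR) -> None:
    """Fail if the benchmark score for the quantized model drops below the floor."""
    if score < floor:
        raise AssertionError(
            f"Benchmark score {score:.3f} is below the regression floor {floor:.3f}"
        )


if __name__ == "__main__":
    # 0.31 stands in for a score produced by an eval-harness run on the quantized model.
    check_regression(0.31)
```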


Qubitium commented Dec 4, 2024

Update: all 3 pending (not-yet-ready) PRs we are testing for @jiqing-feng's PRs:

Transformers: https://github.com/jiqing-feng/transformers/pull/3/files
Optimum: https://github.com/jiqing-feng/optimum/pull/3/files
Peft: https://github.com/jiqing-feng/peft/pull/1/files


Qubitium commented Dec 4, 2024

Final update for today. We are trying to find the least painful way to fix the peft/gptqmodel non-IPEX paths. There are solutions, but none are great, since only three kernels (torch/cuda/triton) actually support an accelerated + trainable forward.

But selecting the right kernel at transformers load time, then optimum quantize, followed finally by peft/training mode, are very disjointed steps in transformers: there appears to be no auto-hook to register so that gptqmodel internals can switch a non-trainable kernel into a trainable one.

Can an nn.Module actually register a hook to know it is going into training mode, beyond a boolean self.trainable state that is only checked in forward()?

We will test the best available method tomorrow. Ran out of time today.
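
One possible answer (a sketch only, not necessarily the approach GPTQModel ends up shipping): nn.Module.train() is itself called recursively by model.train()/model.eval(), so a quantized linear layer can override it and swap kernels when the mode flips instead of checking a flag on every forward():

```python
import torch
from torch import nn


class QuantLinearStub(nn.Module):
    """Illustrative stand-in for a GPTQ quantized linear layer."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.xavier_uniform_(self.weight)
        self.kernel = "exllama"  # hypothetical fast but non-trainable kernel

    def train(self, mode: bool = True):
        # model.train()/model.eval() call this on every submodule, so the kernel
        # switch happens once per mode change rather than inside forward().
        self.kernel = "torch" if mode else "exllama"
        return super().train(mode)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Dispatch on the kernel chosen at train()/eval() time.
        return x @ self.weight.t()


layer = QuantLinearStub(8, 4)
layer.train()  # kernel -> "torch" (trainable path)
layer.eval()   # kernel -> "exllama" (inference path)
```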

Qubitium changed the title from "Integrate gptqmodel into HF" to "[INTEGRATION] Add GPTQModel support into transformers + optimum + peft" on Dec 5, 2024

Qubitium commented Dec 5, 2024

> But selecting the right kernel at transformers load time, then optimum quantize, followed finally by peft/training mode, are very disjointed steps in transformers: there appears to be no auto-hook to register so that gptqmodel internals can switch a non-trainable kernel into a trainable one.
>
> Can an nn.Module actually register a hook to know it is going into training mode, beyond a boolean self.trainable state that is only checked in forward()?

A best-fit solution has been found. We are refactoring the gptqmodel code at the moment so that peft compat can be added cleanly.


Qubitium commented Dec 5, 2024

Almost there. All code is 99.8% ready for all 3 PRs. Internal testing is starting. @jiqing-feng Once all our tests pass, I will let you know so you can merge into your 3 PRs before we start the final round of testing.

Tracking:

Transformers: https://github.com/jiqing-feng/transformers/pull/3/files
Optimum: https://github.com/jiqing-feng/optimum/pull/3/files
Peft: https://github.com/jiqing-feng/peft/pull/1/files


Qubitium commented Dec 5, 2024

> Almost there. All code is 99.8% ready for all 3 PRs. Internal testing is starting. @jiqing-feng Once all our tests pass, I will let you know so you can merge into your 3 PRs before we start the final round of testing.
>
> Tracking:
>
> Transformers: https://github.com/jiqing-feng/transformers/pull/3/files
> Optimum: https://github.com/jiqing-feng/optimum/pull/3/files
> Peft: https://github.com/jiqing-feng/peft/pull/1/files

@jiqing-feng All 3 PRs are passing all CPU/GPU tests internally here. There are some string output mismatches, but that is random variability across different CPUs. CPU tests may fail with tensors landing on CUDA when they are run in an environment where both CPU and CUDA devices are exposed, even though device_map is set to CPU only. I think this is an HF bug, but we need to check; a workaround sketch follows.
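
As a stopgap while that is investigated, one way to keep CPU-only runs deterministic is to hide the CUDA devices from the test process before torch is imported (an illustrative workaround, not part of the official test setup):

```python
import os

# Hide CUDA from this process so that device_map="cpu" tests cannot accidentally
# place tensors on a GPU; this must happen before torch initializes CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

import torch  # noqa: E402

assert not torch.cuda.is_available()
```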

Once these 3 PRs are merged into the larger PRs, I will write a lengthy explanation of some of the obvious and not-so-obvious small and large changes we required and pushed.

jiqing-feng (Contributor, Author) commented:

> Update: all 3 pending (not-yet-ready) PRs we are testing for @jiqing-feng's PRs:
>
> Transformers: https://github.com/jiqing-feng/transformers/pull/3/files
> Optimum: https://github.com/jiqing-feng/optimum/pull/3/files
> Peft: https://github.com/jiqing-feng/peft/pull/1/files

All merged, will check it again on CPU.


Qubitium commented Dec 8, 2024

All tests passing. Both the gptqmodel internal tests and the transformers/optimum/peft tests are passing on CPU/XPU/CUDA.

Qubitium (Collaborator) commented:

GPTQModel v1.4.0 has been released. If any changes are required for the above upstream PRs, we will make them and cut a new release.
