add_exllamav2 #1419
Conversation
def test_generate_quality(self):
    # don't need to test
    pass

def test_serialization(self):
    # don't need to test
    pass
why not?
Here, we quantize the model with the cuda-old kernel and save it so that we can later load it with exllamav2 in the test_exllama_serialization test. Since test_generate_quality and test_serialization would run on the cuda-old kernel, we don't need to run them again here: they are already covered by a previous test.
And how about generate_quality?
This is also tested in the GPTQTest class. The wording is confusing, but test_exllama_serialization in GPTQTestExllamav2 does two things: it tests loading the quantized weights with the exllamav2 kernels, and it tests inference correctness.
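To make the inheritance pattern concrete, here is a minimal self-contained sketch (class and method bodies are stand-ins, not the actual optimum test suite): a base test class exercises quality and serialization once with the cuda-old kernel, and the kernel-specific subclass overrides those inherited tests with `pass` while its own serialization test covers both loading and inference correctness.

```python
import unittest


class GPTQTestBase(unittest.TestCase):
    """Stand-in for the base GPTQTest class (runs the cuda-old kernel)."""

    def quantize_and_generate(self):
        # Placeholder for real quantization + generation; returns dummy text.
        return "hello world"

    def test_generate_quality(self):
        # The real suite compares generations against a reference output.
        self.assertIn("hello", self.quantize_and_generate())

    def test_serialization(self):
        # The real suite saves and reloads the quantized model.
        self.assertTrue(True)


class GPTQTestExllamav2(GPTQTestBase):
    """Kernel-specific subclass: skips tests already covered by the base class."""

    def test_generate_quality(self):
        # Already exercised with the cuda-old kernel in the base class.
        pass

    def test_serialization(self):
        # Same: covered by the base class.
        pass

    def test_exllama_serialization(self):
        # Does two things at once: (1) load the quantized weights with the
        # exllamav2 kernel, (2) check that inference output is correct.
        output = self.quantize_and_generate()
        self.assertIn("hello", output)
```

Because unittest collects inherited test methods, overriding them with `pass` is the simplest way to skip redundant runs without touching the base class.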
LGTM, that's great! As someone else commented on the transformers PR, it's true that the inflation of disable_* args is not very scalable; it was probably a bad idea on my end to use that pattern in AutoGPTQ.
What does this PR do?
This PR adds the possibility to choose the exllamav2 kernels for GPTQ models, following the integration of these kernels in auto-gptq. I've also added a test to check that we are able to load and run inference using the exllamav2 kernel. I will update the benchmark in a follow-up PR.
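As a usage sketch of what selecting the exllamav2 kernels can look like from the transformers side (a configuration fragment, not runnable here; the exact parameter names, in particular `exllama_config`, and the checkpoint name are assumptions to be checked against the optimum/transformers docs):

```python
from transformers import AutoModelForCausalLM, GPTQConfig

# Assumed interface: request exllamav2 kernels via the GPTQ quantization config.
quantization_config = GPTQConfig(bits=4, exllama_config={"version": 2})

model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GPTQ",  # hypothetical GPTQ-quantized checkpoint
    device_map="auto",
    quantization_config=quantization_config,
)
```

The appeal of a single versioned config over per-kernel disable flags is exactly the scalability point raised in the review: adding a future kernel version means one more accepted value, not one more argument.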