
[REQUEST] PerChannel Setting #707

Open
3 tasks done
Coco58323 opened this issue Dec 27, 2024 · 1 comment

Comments

@Coco58323

Problem

The current implementation does not support per-channel quantization. Would you consider adding it to the config?

Solution

Implement per-channel quantization parameters in conversion/qparams.py and optimize the GEMM computation through deferred scaling: the column-wise scaling factors are applied to the final accumulation results, minimizing arithmetic operations and improving computational efficiency.
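A minimal sketch of the deferred-scaling idea, using NumPy. The names and shapes are illustrative assumptions, not the actual `conversion/qparams.py` API: the GEMM runs on the quantized integer weights, and the per-channel scales are broadcast over the accumulator once at the end instead of being multiplied into every weight beforehand.

```python
# Hypothetical sketch of deferred per-channel scaling; names are
# illustrative, not the project's actual API.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8)).astype(np.float32)   # activations
W = rng.standard_normal((8, 16)).astype(np.float32)  # weights

# Per-channel (one scale per output column) symmetric int8 quantization.
scale = np.abs(W).max(axis=0) / 127.0
Q = np.clip(np.round(W / scale), -127, 127)

# Naive: dequantize every weight first, then run the GEMM.
y_naive = X @ (Q * scale)

# Deferred: GEMM on quantized weights, then one column-wise scaling
# of the accumulator (a single multiply per output element).
y_deferred = (X @ Q) * scale

assert np.allclose(y_naive, y_deferred, atol=1e-4)
```

Both paths produce the same result; the deferred form replaces a full rescale of the weight matrix with one broadcast multiply over the output.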

Alternatives

No response

Explanation

At bit widths above 4, per-channel quantization is nearly lossless in accuracy. Inference would benefit if a per-channel setting were supported.
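The accuracy claim can be illustrated with a small NumPy experiment (assumed setup, not a benchmark from the project): when output channels have very different magnitudes, a single per-tensor scale wastes most of the quantization range on small-magnitude channels, while per-channel scales do not.

```python
# Illustrative comparison of per-tensor vs. per-channel quantization error
# at 6 bits; the weight distribution here is an assumption for the demo.
import numpy as np

rng = np.random.default_rng(1)
# Weight matrix whose output channels span very different magnitudes.
W = (rng.standard_normal((256, 8)) * np.logspace(-2, 1, 8)).astype(np.float32)

def quantize(W, scale, qmax):
    """Symmetric round-to-nearest quantize/dequantize with the given scale."""
    return np.clip(np.round(W / scale), -qmax, qmax) * scale

qmax = 2 ** (6 - 1) - 1  # symmetric 6-bit integer range

# Per-tensor: one scale for the whole matrix.
err_tensor = np.abs(W - quantize(W, np.abs(W).max() / qmax, qmax)).mean()

# Per-channel: one scale per output column.
err_channel = np.abs(W - quantize(W, np.abs(W).max(axis=0) / qmax, qmax)).mean()

assert err_channel < err_tensor
```

With skewed channel magnitudes the mean absolute error of the per-channel variant is substantially lower, which is the effect the request is pointing at.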

Examples

No response

Additional context

No response

Acknowledgements

  • I have looked for similar requests before submitting this one.
  • I understand that the developers have lives and my issue will be answered when possible.
  • I understand the developers of this program are human, and I will make my requests politely.
@turboderp
Member

Could you elaborate? EXL2 already has one FP16 scale per output channel, as well as a 4-bit scale per group of weights.
