Problem
The current implementation does not support per-channel quantization. Would you consider adding it to the config?
Solution
Implement per-channel quantization parameters in conversion/qparams.py and optimize the GEMM computation through deferred scaling, where column-wise scaling factors are applied to the final accumulation results to minimize arithmetic operations and improve computational efficiency.
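To make the deferred-scaling idea concrete, here is a minimal NumPy sketch (hypothetical names, not the actual conversion/qparams.py API): weights get one symmetric scale per output channel, the GEMM accumulates entirely in the integer domain, and the column-wise scales are applied once to the final accumulator rather than inside the inner loop.

```python
import numpy as np

def quantize_per_channel(W, n_bits = 8):
    # Symmetric per-channel quantization: one scale per output channel (column of W).
    qmax = 2 ** (n_bits - 1) - 1
    scales = np.abs(W).max(axis = 0) / qmax          # shape: (out_features,)
    Wq = np.clip(np.round(W / scales), -qmax, qmax)  # integer weights
    return Wq.astype(np.int32), scales

def gemm_deferred_scaling(Xq, x_scale, Wq, w_scales):
    # Accumulate in the integer domain first, then apply the column-wise
    # scaling factors once to the final accumulator (deferred scaling).
    acc = Xq.astype(np.int64) @ Wq.astype(np.int64)  # (m, out_features)
    return acc * (x_scale * w_scales)                # broadcast over columns

# Usage: quantize a random layer and compare against the float GEMM.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 64)).astype(np.float32)
W = rng.standard_normal((64, 128)).astype(np.float32)

x_scale = np.abs(X).max() / 127
Xq = np.clip(np.round(X / x_scale), -127, 127).astype(np.int32)
Wq, w_scales = quantize_per_channel(W)

Y = gemm_deferred_scaling(Xq, x_scale, Wq, w_scales)
print(np.max(np.abs(Y - X @ W)))                     # small quantization error
```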
Alternatives
No response
Explanation
Above 4-bit precision, accuracy is nearly lossless with per-channel quantization, and inference would remain efficient if the per-channel setting were supported.
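As a quick illustration of that claim (a self-contained sketch, not code from this repository), per-channel scales adapt to each column's dynamic range, so the round-trip error is lower than with a single per-tensor scale:

```python
import numpy as np

# Compare per-tensor vs. per-channel round-trip error at 5-bit symmetric quantization.
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256)).astype(np.float32)
W *= rng.uniform(0.1, 3.0, size = (1, 256))          # uneven per-channel ranges

qmax = 2 ** (5 - 1) - 1

s_tensor = np.abs(W).max() / qmax                     # one scale for the whole tensor
s_channel = np.abs(W).max(axis = 0) / qmax            # one scale per output channel

err_t = np.abs(np.round(W / s_tensor) * s_tensor - W).mean()
err_c = np.abs(np.round(W / s_channel) * s_channel - W).mean()
print(err_t, err_c)                                   # per-channel error is noticeably lower
```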
Examples
No response
Additional context
No response
Acknowledgements
I have looked for similar requests before submitting this one.
I understand that the developers have lives and my issue will be answered when possible.
I understand the developers of this program are human, and I will make my requests politely.