After applying q4_0 quantization, I noticed that the quality of my generated results has declined. I appreciate the speed and reduced VRAM usage that the quantized model offers, but I am looking for ways to improve its output quality. Could you please suggest any solutions or improvements, such as calibration, LoRA, fine-tuning, or other techniques? Thank you for your assistance!
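One common way to claw back quality, assuming this is a llama.cpp-style GGUF workflow (binary names and flags vary across builds, so treat the commands below as a sketch, and the file names as placeholders), is to requantize from the full-precision model with an importance matrix computed on calibration text:

```sh
# Compute an importance matrix from calibration text that resembles
# your actual workload (calibration-data.txt is a placeholder)
./llama-imatrix -m model-f16.gguf -f calibration-data.txt -o imatrix.dat

# Requantize the full-precision model using the importance matrix,
# which protects the tensors that matter most for output quality
./llama-quantize --imatrix imatrix.dat model-f16.gguf model-Q4_K_M.gguf Q4_K_M
```

If calibration alone is not enough, one common next step is to LoRA fine-tune the full-precision model on task data and requantize afterwards, rather than trying to fine-tune the quantized file directly.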
Replies: 2 comments

- Ah, I searched for 1.58-bit on Hugging Face... there is a method related to it. But I compared a lot of options and ended up selecting a smaller model instead.
- Is there a reason to use q4_0 specifically? It works, but it is an old quantization format.
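To make "it's old" concrete: Q4_0 is one of the legacy quantization types, while newer k-quants such as Q4_K_M typically give lower perplexity at a similar file size. A quick way to check this on your own model, assuming the llama.cpp tools and some held-out evaluation text (the file names here are placeholders):

```sh
# Quantize the same f16 model two ways and compare perplexity;
# lower perplexity generally means less quality loss
./llama-quantize model-f16.gguf model-Q4_0.gguf Q4_0
./llama-quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M
./llama-perplexity -m model-Q4_0.gguf -f wiki.test.raw
./llama-perplexity -m model-Q4_K_M.gguf -f wiki.test.raw
```

If Q4_K_M measures noticeably better, switching the format alone may recover much of the lost quality without any fine-tuning.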