[REQUEST] Can we have 1.0/1.5 bpw internally? #675

Open
3 tasks done
Originalimoc opened this issue Nov 17, 2024 · 1 comment

Comments


Originalimoc commented Nov 17, 2024

Problem

At low target bitrates (roughly 2.1 to 3.2 bpw), the measured accuracy of some layers can be very high, 0.995 or above, and this is especially true for larger models (see the data below). If those layers could be quantized at 1.0 or 1.5 bpw, the remaining bit budget could go to other, more important layers, potentially giving a better overall result.
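To put rough numbers on that trade-off: if the target bitrate is treated as a simple weighted average over all weights (a simplifying assumption that ignores per-layer overhead), storing a fraction of the weights at a lower bitrate frees budget for the rest. `remaining_bpw` below is a hypothetical helper for illustration only, not part of any library.

```python
def remaining_bpw(target_bpw: float, low_fraction: float, low_bpw: float) -> float:
    """Average bpw left for the other weights when `low_fraction` of all
    weights is stored at `low_bpw`.

    Assumption (hypothetical model): the overall bitrate is a plain
    weighted average over weights, with no extra overhead.
    """
    if not 0.0 <= low_fraction < 1.0:
        raise ValueError("low_fraction must be in [0, 1)")
    return (target_bpw - low_fraction * low_bpw) / (1.0 - low_fraction)


# Example: 2.5 bpw target, 30% of weights in "easy" layers.
print(f"{remaining_bpw(2.5, 0.3, 1.5):.2f}")  # easy layers at 1.5 bpw
print(f"{remaining_bpw(2.5, 0.3, 1.0):.2f}")  # easy layers at 1.0 bpw
```

With these made-up numbers, the other 70% of the weights would get about 2.93 bpw (or about 3.14 bpw at 1.0 bpw) instead of 2.5.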

Solution

Introduce quantization options such as "0.05:2b_64g/0.95:1b_64g s4" and "0.5:2b_64g/0.5:1b_64g s4". (I'm not sure whether it can work this way.)
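A rough way to estimate what such option strings would average out to. This reads "FRAC:Nb_Gg" as "fraction FRAC of the weights at N bits with group size G" and "s4" as one 4-bit scale per group; that interpretation is an assumption, and the estimate ignores any other storage overhead, so it is not the project's actual bpw calculation. `approx_bpw` is a hypothetical helper.

```python
import re


def approx_bpw(spec: str) -> float:
    """Approximate bits per weight for a spec like '0.5:2b_64g/0.5:1b_64g s4'.

    Assumed (hypothetical) interpretation:
      * 'FRAC:Nb_Gg' -> fraction FRAC of weights stored at N bits, group size G
      * 'sK'         -> one K-bit scale per group, i.e. K / G extra bits per weight
    Any other overhead is ignored.
    """
    parts, scale = spec.split()
    scale_bits = int(scale.removeprefix("s"))
    total = 0.0
    frac_sum = 0.0
    for part in parts.split("/"):
        m = re.fullmatch(r"([\d.]+):(\d+)b_(\d+)g", part)
        if m is None:
            raise ValueError(f"unrecognized part: {part!r}")
        frac, bits, group = float(m[1]), int(m[2]), int(m[3])
        # Payload bits plus the amortized per-group scale.
        total += frac * (bits + scale_bits / group)
        frac_sum += frac
    if abs(frac_sum - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    return total


print(f"{approx_bpw('0.05:2b_64g/0.95:1b_64g s4'):.4f}")  # 1.1125
print(f"{approx_bpw('0.5:2b_64g/0.5:1b_64g s4'):.4f}")    # 1.5625
```

Under these assumptions the two proposed options land near the requested 1.0 and 1.5 bpw, plus the 4/64 = 0.0625 bits of scale overhead.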

Explanation

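[Figure: 1B_to_72B_multiple_datasets_bpw_acc_measurement, bpw vs. measured accuracy for ~1B to ~72B models on multiple datasets (green is ~14B, red is ~30B)]
Data series 1-5 come from quantization logs of ~1B to ~70B models made with v0.2.4. The 70B+ models (purple) reach very high accuracy, 0.99 to 0.999, at low bpw.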

Acknowledgements

  • I have looked for similar requests before submitting this one.
  • I understand that the developers have lives and my issue will be answered when possible.
  • I understand the developers of this program are human, and I will make my requests politely.
@turboderp
Member

It's difficult to go below 2 bits per weight, simply because that's the minimum whole number of bits required to represent a value that can be either positive, negative or zero. Technically you can do it in ~1.58 bits (log2 3), but this requires a grouped encoding (e.g. 20 weights in a 32-bit field), and that complicates the kernels a lot.
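The "20 weights in a 32-bit field" example works because 3^20 = 3,486,784,401 fits below 2^32, giving 32/20 = 1.6 bits per weight. A minimal sketch of such a base-3 packing, assuming weights are already reduced to {-1, 0, +1}; the function names are hypothetical and this is not the project's actual format. Unpacking needs repeated division by 3 rather than the shifts and masks that 2-bit fields allow, which is one plausible source of the extra kernel complexity.

```python
import math

TRITS_PER_WORD = 20  # 3**20 = 3,486,784,401 < 2**32 = 4,294,967,296


def pack_ternary(weights: list[int]) -> int:
    """Pack 20 values from {-1, 0, +1} into one 32-bit integer as base-3 digits.

    weights[0] becomes the least significant digit.
    """
    if len(weights) != TRITS_PER_WORD:
        raise ValueError(f"expected {TRITS_PER_WORD} weights")
    word = 0
    for w in reversed(weights):
        if w not in (-1, 0, 1):
            raise ValueError(f"not ternary: {w}")
        word = word * 3 + (w + 1)  # map -1, 0, +1 to digits 0, 1, 2
    assert word < 2**32  # always holds since 3**20 - 1 < 2**32
    return word


def unpack_ternary(word: int) -> list[int]:
    """Inverse of pack_ternary: peel off base-3 digits with divmod."""
    out = []
    for _ in range(TRITS_PER_WORD):
        word, digit = divmod(word, 3)
        out.append(digit - 1)
    return out


w = [(-1, 0, 1)[i % 3] for i in range(TRITS_PER_WORD)]
assert unpack_ternary(pack_ternary(w)) == w
print(32 / TRITS_PER_WORD)  # 1.6 bits per weight
print(math.log2(3))         # theoretical minimum, about 1.585
```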

At 1 bit per weight you've only got positive and negative weights within each group, and in the experiments I've done that's the point at which things completely break down. Keep in mind those trend lines will have to diverge to -inf somewhere between 0 bpw and 2.13 bpw.
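A toy illustration of what losing the zero level costs. It quantizes Gaussian "weights" in groups of 64 either to 1 bit (±alpha, with alpha = mean |w|, the MSE-optimal scale for fixed signs) or to ternary (-alpha, 0, +alpha) using the 0.7 × mean |w| threshold heuristic from Ternary Weight Networks, and compares reconstruction error. This is a simple round-to-nearest-style comparison under those assumptions, not how this project's quantizer works, and per-group error says nothing directly about where a full model breaks down.

```python
import math
import random


def binary_mse(group: list[float]) -> float:
    # 1 bit: every weight becomes +alpha or -alpha; no zero is available.
    alpha = sum(abs(w) for w in group) / len(group)
    return sum((w - math.copysign(alpha, w)) ** 2 for w in group) / len(group)


def ternary_mse(group: list[float], threshold_ratio: float = 0.7) -> float:
    # ~1.58 bits: small weights snap to 0, the rest to +/-alpha.
    mean_abs = sum(abs(w) for w in group) / len(group)
    delta = threshold_ratio * mean_abs
    kept = [abs(w) for w in group if abs(w) > delta]
    alpha = sum(kept) / len(kept) if kept else 0.0  # best scale for the kept set
    err = 0.0
    for w in group:
        q = math.copysign(alpha, w) if abs(w) > delta else 0.0
        err += (w - q) ** 2
    return err / len(group)


def mean_group_mse(weights, group_size, fn) -> float:
    groups = [weights[i:i + group_size] for i in range(0, len(weights), group_size)]
    return sum(fn(g) for g in groups) / len(groups)


random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(64 * 256)]
b = mean_group_mse(weights, 64, binary_mse)
t = mean_group_mse(weights, 64, ternary_mse)
print(f"1-bit MSE:   {b:.3f}")
print(f"ternary MSE: {t:.3f}")
```

On Gaussian data the ternary error comes out markedly lower than the 1-bit error, consistent with the zero level doing a lot of work for the many near-zero weights.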

I do still plan to revisit quantization at some point, and I'm considering some options that might achieve less than 2 bpw on average without completely breaking the model. But they will need a different encoding scheme, I think, and currently I'm stuck on vision models it seems. 🤷
