-
Whenever I create a BitLinear and call post_process_weights, it does not actually quantize the weight in memory. x.weight gives: but this is still taking up a ton more memory than it should... If anyone has this working, please let me know here.
Replies: 1 comment 2 replies
-
Hi @Codys12, would you mind providing a script to reproduce this result?
Sorry, right after I posted this I tried
model.to("cuda")
and the memory on the GPU was ~1.5GB
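
For anyone who wants to check this without relying on the GPU memory readout, here is a minimal sketch that adds up the bytes held by a set of tensors. The helper names `total_bytes` and `format_mib` are made up for illustration and are not part of any library. The sketch only assumes each object exposes `numel()` and `element_size()`, which PyTorch tensors do.

```python
from typing import Iterable, Protocol


class TensorLike(Protocol):
    """Anything with numel() and element_size(), e.g. a torch.Tensor."""

    def numel(self) -> int: ...

    def element_size(self) -> int: ...


def total_bytes(tensors: Iterable[TensorLike]) -> int:
    """Sum the storage size in bytes (element count * bytes per element)."""
    return sum(t.numel() * t.element_size() for t in tensors)


def format_mib(num_bytes: int) -> str:
    """Render a byte count in MiB with one decimal place."""
    return f"{num_bytes / 2**20:.1f} MiB"


# Hypothetical usage with a PyTorch model (not run here):
#   n = total_bytes(model.parameters()) + total_bytes(model.buffers())
#   print(format_mib(n))
```

Buffers are counted as well as parameters because a quantized layer may keep its packed weights in a buffer rather than a parameter. Whether BitLinear does so is an assumption here, not something confirmed in this thread.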