Compressed lifecycle implementation (INT8 only) #33
This looks good for unblocking runtime, but it would be better to implement all compression under a QuantizationCompressor parent class, similar to what we do for SparsityCompressor. Also, I think we need to discuss further how to handle quantization status differing between layers (or whether we want to support that at all).
* add classes
* WIP
* moving around classes
* code complete
* tests passing
* unit test bugs
* fill out int decompression
* docstrings
* allow repeat frozens
* int compressor unit tests
* PR comments
Implements the COMPRESSED piece of the quantization lifecycle. Currently assumes int8 format and will clip for any num_bits less than 8.
test_plan:
Extends the apply test to cover the compression phase as well.
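To make the int8 assumption concrete, here is a minimal sketch of one possible reading of the description: values are quantized with a scale and zero point, clipped to the signed range implied by num_bits, and kept as integers that fit in int8. The function names (`compress_to_int8`, `decompress_from_int8`) and the parameter layout are hypothetical and are not taken from this PR; the actual implementation operates on model weights and may differ.

```python
def compress_to_int8(values, scale, zero_point, num_bits=8):
    """Quantize floats to integers that fit in an int8 container.

    Hypothetical helper for illustration only, not this PR's API.
    Assumes a signed range, so num_bits < 8 clips to a narrower
    range while the storage type is still treated as int8.
    """
    if not 1 <= num_bits <= 8:
        raise ValueError("this sketch only covers num_bits in [1, 8]")
    q_min = -(2 ** (num_bits - 1))
    q_max = 2 ** (num_bits - 1) - 1
    compressed = []
    for v in values:
        # Python's round() rounds halves to the nearest even integer.
        q = round(v / scale) + zero_point
        # Clip to the range implied by num_bits.
        compressed.append(max(q_min, min(q_max, q)))
    return compressed


def decompress_from_int8(q_values, scale, zero_point):
    """Map stored integers back to floats (lossy where values were clipped)."""
    return [(q - zero_point) * scale for q in q_values]


weights = [0.0, 1.0, 100.0, -100.0]
packed_8bit = compress_to_int8(weights, scale=0.5, zero_point=0, num_bits=8)
packed_4bit = compress_to_int8(weights, scale=0.5, zero_point=0, num_bits=4)
restored_4bit = decompress_from_int8(packed_4bit, scale=0.5, zero_point=0)
```

Under these assumptions the 4-bit case saturates at -8 and 7, so out-of-range weights are not recovered on decompression. This is the kind of behavior a compression-phase test would want to pin down.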