Added hybrid quantization for seq2seq models #427
base: main
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
# Compress weights of decoders for safety
self.model.decoder_model = nncf.compress_weights(self.model.decoder_model)
if self.model.use_cache:
    self.model.decoder_with_past_model = nncf.compress_weights(self.model.decoder_with_past_model)
Would there be a possibility to also quantize the activations for the decoder components?
It is possible but it may hurt accuracy.
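To illustrate the trade-off discussed above, here is a minimal pure-Python sketch that compares weight-only quantization with quantizing both weights and activations for a single dot product. It is not how `nncf.compress_weights` works internally; the symmetric int8 scheme, the helper names (`quantize_symmetric`, `dequantize`, `dot`), and the sample values are all assumptions chosen for illustration, and one contrived example does not prove the general case.

```python
def quantize_symmetric(values, num_bits=8):
    """Map floats to signed integers using one shared scale (symmetric scheme).

    Hypothetical helper for illustration only, not an NNCF API.
    """
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integer representation."""
    return [q * scale for q in quantized]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Made-up values: the activations contain one large outlier (12.0),
# which stretches the shared scale and coarsens every other activation.
weights = [0.8, -0.4, 0.05, -1.3]
activations = [0.3, 12.0, -0.02, 0.7]

reference = dot(weights, activations)

# Weight-only compression: weights are quantized, activations stay in float.
q_w, scale_w = quantize_symmetric(weights)
w_hat = dequantize(q_w, scale_w)
weight_only_error = abs(dot(w_hat, activations) - reference)

# Weights and activations both quantized.
q_a, scale_a = quantize_symmetric(activations)
a_hat = dequantize(q_a, scale_a)
full_error = abs(dot(w_hat, a_hat) - reference)

print(f"weight-only error:       {weight_only_error:.4f}")
print(f"weights+activations err: {full_error:.4f}")
```

In this example the small activation `-0.02` rounds to zero once the outlier sets the scale, so quantizing activations adds error on top of the weight error. Real calibration-based activation quantization is more sophisticated than this, but the same kind of sensitivity is one plausible reason to keep decoder activations in floating point.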
This is a draft PR that should be merged after the OpenVINO 2023.1 release.