Converting Quantised PyTorch Models #40
-
Have you ever tried writing your own custom `nobuco.converter`? It might be much easier than you think. When I have time, I'll also give it a try. Although I am not the owner of this project, I think your question is perfectly valid here. Nobuco is well structured to convert recursively, from the fundamental torch functions up to the entire model. Any one person uses only a small part of torch, but as a group we use almost all of it. If each of us implements the unimplemented nodes we need, Nobuco will at some point be able to convert most torch models.
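For illustration, here's a minimal converter sketch in the style of the Nobuco README (the op is arbitrary; `F.silu` → `tf.nn.silu` is just an example mapping):

```python
import tensorflow as tf
import torch
import torch.nn.functional as F

import nobuco
from nobuco import ChannelOrderingStrategy

# Sketch: the decorated function mirrors the torch op's signature
# and returns a TF-side function with that same signature.
@nobuco.converter(F.silu, channel_ordering_strategy=ChannelOrderingStrategy.MINIMUM_TRANSPOSITIONS)
def converter_silu(input: torch.Tensor, inplace: bool = False):
    return lambda input, inplace=False: tf.nn.silu(input)
```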
-
I tried to add support for these dtypes lately, and it's a bigger task than I expected. For starters, quantized tensors in Tensorflow are not even proper tensors. Like you said, quantized ops are nowhere to be seen in Tensorflow.
Yes.
Dunno, my experience with that stuff is very limited.
I wouldn't have created Nobuco if I knew a better way to deploy models on mobile and the web.
The TFLite converter can perform post-training quantization automatically, so why do the same thing by hand?
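For reference, the usual recipe is just a few lines (a minimal sketch; the toy Keras model and random calibration data stand in for a real converted model):

```python
import numpy as np
import tensorflow as tf

# Toy stand-in; substitute the Keras model Nobuco produced
keras_model = tf.keras.Sequential([tf.keras.layers.Dense(4, input_shape=(8,))])

def representative_data_gen():
    # Calibration samples determine the quantization ranges
    for _ in range(100):
        yield [np.random.rand(1, 8).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen  # enables full-integer quantization
tflite_model = converter.convert()
```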
I'd like to, I just don't have a good idea of how to approach that. Looks like there are lots of things to consider.
-
Alright, I think I'm onto something here. Got it to work for a simple quantized model. Try it:

pip install https://github.com/AlexanderLutsenko/nobuco/archive/quantized.zip

Whether the converted model will be properly quantized by TFLite is another question.
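Untested sketch of a minimal smoke test, assuming the branch keeps the usual `nobuco.pytorch_to_keras` entry point (toy model, made-up scale/zero-point):

```python
import torch
import nobuco

class TinyQuantModel(torch.nn.Module):
    # Hypothetical toy: quantize to quint8, then dequantize
    def forward(self, x):
        xq = torch.quantize_per_tensor(x, scale=0.1, zero_point=128, dtype=torch.quint8)
        return xq.dequantize()

model = TinyQuantModel().eval()
dummy = torch.rand(1, 3, 8, 8)
keras_model = nobuco.pytorch_to_keras(model, args=[dummy])
```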
-
This is somewhere between a Nobuco feature request and a set of general questions about how quantisation in TensorFlow/TFLite works, in case anybody who sees this knows.
Essentially my problem is that I have a quantised PyTorch model (manually quantised, so I am directly calling `torch.quantize_per_tensor`, `torch.ops.quantized.linear`, `torch.ops.quantized.layer_norm`, etc...) and I want to convert this model to TFLite for deployment.

For the non-quantised model, Nobuco does a great job (#36 was a bit funky and there were a couple of operators with missing support, but adding the `@nobuco.converter`s was surprisingly intuitive). For the quantised model, I am having a harder time.
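For concreteness, a stripped-down sketch of the kind of manual quantisation I mean (toy shapes, made-up scales):

```python
import torch

x = torch.randn(1, 4)
w = torch.randn(3, 4)
b = torch.zeros(3)

# Activations are quantised as quint8, weights as qint8
xq = torch.quantize_per_tensor(x, scale=0.05, zero_point=128, dtype=torch.quint8)
wq = torch.quantize_per_tensor(w, scale=0.05, zero_point=0, dtype=torch.qint8)

# Pack the weight, then call the quantised op directly
packed = torch.ops.quantized.linear_prepack(wq, b)
yq = torch.ops.quantized.linear(xq, packed, 0.1, 128)  # output scale, zero_point
y = yq.dequantize()
```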
One problem is that when converting, I get a ton of errors about `quint8` and `qint8` `dtype`s not being supported. I haven't yet dug into these error messages enough to figure out if this is a sign of a fundamental problem or just an obvious case not being handled somewhere, mainly because I've been more concerned by a second issue...
Specifically, intercepting the quantised ops with `@nobuco.converter`s works like with any other built-in PyTorch operator, but I am struggling to find TensorFlow functions to replace them with. `torch.quantize_per_tensor` becomes `tf.quantization.quantize` and `torch.Tensor.dequantize` becomes `tf.quantization.dequantize`, but beyond that, as far as I can tell, TensorFlow doesn't expose quantised versions of stuff like dense layers (though it does expose quantised concatenation... for some reason?). My best guess for why, looking at the APIs for TensorFlow quantisation, is that these operators don't really exist inside TensorFlow itself, only in TFLite.
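For instance, my current (untested) mapping for the quantise step looks roughly like this, recovering TF's min/max ranges from the torch scale/zero-point and assuming the quint8 range [0, 255]:

```python
import tensorflow as tf
import torch
import nobuco
from nobuco import ChannelOrderingStrategy

@nobuco.converter(torch.quantize_per_tensor, channel_ordering_strategy=ChannelOrderingStrategy.MINIMUM_TRANSPOSITIONS)
def converter_quantize_per_tensor(input, scale, zero_point, dtype):
    # scale/zero_point are concrete numbers at conversion time
    min_range = (0 - zero_point) * scale
    max_range = (255 - zero_point) * scale
    def func(input, scale, zero_point, dtype):
        output, _, _ = tf.quantization.quantize(input, min_range, max_range, tf.quint8)
        return output
    return func
```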
BUT of course it is possible to have models in TensorFlow that "act" like they are quantised (both during training in TensorFlow and when exporting to TFLite) - this is necessary for quantisation-aware training to work! The problem is that the APIs to create these seem really opaque. Instead of creating individual quantised layers and being able to set their `zero_point` and `scale`, you just call `tfmot.quantization.keras.quantize_model` and this does a bunch of stuff behind the scenes.

So, context covered, specific questions time (to anyone who thinks they might know):
1. If `tfmot.quantization.keras.quantize_model` works under-the-hood by inserting fake quantisation layers around ordinary TensorFlow operators (and then the TFLite exporter looks for these fake quantisation layers and replaces them with actual quantised operators), then I could try to do a similar thing inside the `@nobuco.converter`s (see the sketch below)? Does this sound realistic/feasible?
2. ... `torch.export`)
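To make question 1 concrete, this is the kind of thing I imagine a converter emitting (untested; `tf.quantization.fake_quant_with_min_max_args` standing in for whatever tfmot actually inserts):

```python
import tensorflow as tf

def fake_quantize(x, scale, zero_point):
    # Values are rounded to the 8-bit grid defined by [min, max]
    # but stay float32, which is what QAT graphs carry around.
    min_val = (0 - zero_point) * scale    # quint8 lower bound
    max_val = (255 - zero_point) * scale  # quint8 upper bound
    return tf.quantization.fake_quant_with_min_max_args(x, min=min_val, max=max_val, num_bits=8)
```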