Removed a reference to QOp that was missed in the notebook
costigt-dev committed Mar 22, 2024
1 parent e994f8e commit 5d37881
Showing 1 changed file with 0 additions and 96 deletions.
96 changes: 0 additions & 96 deletions notebooks/Brevitas_TVMCon2021.ipynb
@@ -1903,102 +1903,6 @@
" return IFrame(src=f\"http://localhost:{port}/\", width=\"100%\", height=400)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Export to ONNX QOps\n",
"\n",
"Say we want to export a QuantConv1d with 4b symmetric weights, 8b symmetric inputs and outputs, and 16b biases. \n",
"We can export it to ONNX's `QLinearConv`, but some information will be lost. In particular, weights will be represented as 8b and biases as 32b, even though they are respectively 4b and 16b. This is because ONNX does not provide a standardized way to represent them as such:"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/scratch/fabian/brevitas/src/brevitas/export/onnx/standard/manager.py:26: UserWarning: ONNX opset version set to 13, override with opset_version=\n",
" warnings.warn(f\"ONNX opset version set to {DEFAULT_OPSET}, override with {ka}=\")\n"
]
}
],
"source": [
"torch.manual_seed(0)\n",
"\n",
"from brevitas.nn import QuantConv1d\n",
"from brevitas.quant import Int8WeightPerTensorFloat, Int8ActPerTensorFloat, Int16Bias\n",
"from brevitas.export import export_onnx_qop\n",
"\n",
"float_inp = torch.randn(1, 2, 5)\n",
"\n",
"quant_conv_4b8b = QuantConv1d(\n",
" 2, 4, 3, bias=True, weight_bit_width=4,\n",
" input_quant=Int8ActPerTensorFloat,\n",
" output_quant=Int8ActPerTensorFloat,\n",
" bias_quant=Int16Bias)\n",
"\n",
"output_path = 'qop_onnx_conv_4b8b.onnx'\n",
"exported_model = export_onnx_qop(quant_conv_4b8b, input_t=float_inp, export_path=output_path)"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"tags": [
"skip-execution"
]
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Serving 'qop_onnx_conv_4b8b.onnx' at http://localhost:8082\n"
]
},
{
"data": {
"text/html": [
"\n",
" <iframe\n",
" width=\"100%\"\n",
" height=\"400\"\n",
" src=\"http://localhost:8082/\"\n",
" frameborder=\"0\"\n",
" allowfullscreen\n",
" \n",
" ></iframe>\n",
" "
],
"text/plain": [
"<IPython.lib.display.IFrame at 0x7f92ca3e1a10>"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"show_netron(output_path, 8082)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"In general, the standard ONNX opset doesn't support representing quantization below 8b. Additionally, the ONNX QOp representation requires an output quantizer to be set as part of the layer. \n",
"\n",
"The constraint of always having an output quantizer is relaxed in the more recently introduced QDQ style of representation (supported in Brevitas starting from version 0.8), which uses only `QuantizeLinear` and `DequantizeLinear` to represent quantization, but even with that, support is still limited to 8b quantization."
]
},
{
"cell_type": "markdown",
"metadata": {},
Expand Down
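The removed markdown states that 4b weights and 16b biases "will be represented as 8b and 32b" in the QOp export. A minimal stdlib-only sketch of why that is lossless in value terms (the helper names here are illustrative, not Brevitas or ONNX API): every symmetric sub-8b integer range fits inside the int8 container ONNX actually stores, so only the bit-width metadata is lost.

```python
def symmetric_int_range(bit_width: int, narrow: bool = True):
    """(min, max) integer range of a signed symmetric quantizer.

    With narrow=True the range is symmetric around zero, e.g. [-7, 7] at 4b,
    which is what a symmetric weight quantizer typically uses.
    """
    upper = 2 ** (bit_width - 1) - 1
    lower = -upper if narrow else -(2 ** (bit_width - 1))
    return lower, upper

def fits_in(inner_bits: int, outer_bits: int) -> bool:
    """True if every value at inner_bits is representable at outer_bits."""
    lo_i, hi_i = symmetric_int_range(inner_bits)
    lo_o, hi_o = symmetric_int_range(outer_bits, narrow=False)
    return lo_o <= lo_i and hi_i <= hi_o

# 4b weights fit in int8 containers, 16b biases fit in int32 containers:
print(symmetric_int_range(4))           # (-7, 7)
print(fits_in(4, 8), fits_in(16, 32))   # True True
```

This is the sense in which the export "loses information": the stored tensors are exact, but a consumer reading the ONNX file sees int8/int32 types and cannot recover the original 4b/16b bit widths.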
