
[onert-micro] Support Weight-Quantize Kernels #11774

Open
1 of 3 tasks
BalyshevArtem opened this issue Oct 24, 2023 · 1 comment
Comments

BalyshevArtem (Contributor) commented Oct 24, 2023

What

Let's support a weight-quantized kernel implementation. In this approach the model stays in float, but some operations carry quantized weights, making them hybrid kernels.
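The hybrid-kernel idea above can be sketched as follows: activations stay float32, while the weights are stored as int8 with a per-channel scale and are dequantized inside the kernel. This is an illustrative sketch (the function name and signature are hypothetical, not onert-micro code):

```python
import numpy as np

def hybrid_fully_connected(x, w_q, w_scale, bias):
    """Hybrid kernel sketch: float activations, quantized weights.

    x       : float32 input, shape (batch, in_features)
    w_q     : int8 weights, shape (out_features, in_features)
    w_scale : float32 per-output-channel scales, shape (out_features,)
    bias    : float32 bias, shape (out_features,)
    """
    # Dequantize weights on the fly; symmetric weights have zero_point = 0,
    # so dequantization is a plain multiply by the per-channel scale.
    w = w_q.astype(np.float32) * w_scale[:, None]
    # The arithmetic itself runs in float, as in an ordinary float kernel.
    return x @ w.T + bias
```

Only the weight storage shrinks (int8 plus one scale per channel instead of float32), which is where the binary-size saving for constant weights comes from.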

Why

To reduce the binary size of some target models.

How

Support it for:

chunseoklee (Contributor) commented Nov 18, 2024

Let's continue this on refactored onert-micro (#12427)

FYI, here is the quantization spec for Conv2D and DepthwiseConv (from https://ai.google.dev/edge/litert/models/quantization_spec?hl=en); we only refer to the Weight spec here:

CONV_2D
  Input 0:
    data_type  : int8
    range      : [-128, 127]
    granularity: per-tensor
  Input 1 (Weight):
    data_type  : int8
    range      : [-127, 127]
    granularity: per-axis (dim = 0)
    restriction: zero_point = 0
  Input 2 (Bias):
    data_type  : int32
    range      : [int32_min, int32_max]
    granularity: per-axis
    restriction: (scale, zero_point) = (input0_scale * input1_scale[...], 0)
  Output 0:
    data_type  : int8
    range      : [-128, 127]
    granularity: per-tensor

DEPTHWISE_CONV_2D
  Input 0:
    data_type  : int8
    range      : [-128, 127]
    granularity: per-tensor
  Input 1 (Weight):
    data_type  : int8
    range      : [-127, 127]
    granularity: per-axis (dim = 3)
    restriction: zero_point = 0
  Input 2 (Bias):
    data_type  : int32
    range      : [int32_min, int32_max]
    granularity: per-axis
    restriction: (scale, zero_point) = (input0_scale * input1_scale[...], 0)
  Output 0:
    data_type  : int8
    range      : [-128, 127]
    granularity: per-tensor
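The Weight entries above can be turned into code directly: symmetric int8 quantization with range [-127, 127], zero_point = 0, and one scale per slice along the quantization axis (dim 0 for CONV_2D, dim 3 for DEPTHWISE_CONV_2D). A minimal sketch, assuming a plain NumPy array as input (the function name is hypothetical):

```python
import numpy as np

def quantize_weights_per_axis(w, axis=0):
    """Symmetric per-axis int8 weight quantization per the spec above:
    range [-127, 127], zero_point = 0, one scale per slice along `axis`.
    """
    # Max absolute value over every dimension except the quantization axis.
    reduce_dims = tuple(d for d in range(w.ndim) if d != axis)
    abs_max = np.max(np.abs(w), axis=reduce_dims)
    # Guard against all-zero slices to avoid division by zero.
    scale = np.where(abs_max == 0, 1.0, abs_max / 127.0)
    # Broadcast the per-slice scale back over the weight shape.
    shape = [1] * w.ndim
    shape[axis] = -1
    w_q = np.clip(np.round(w / scale.reshape(shape)), -127, 127).astype(np.int8)
    return w_q, scale.astype(np.float32)
```

Note the deliberate [-127, 127] range (not [-128, 127]): dropping -128 keeps the scheme symmetric around zero_point = 0, which the spec requires for weights.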
