
[onert-micro] Support Weight-Quantize Kernels #11774

Open
1 of 3 tasks
BalyshevArtem opened this issue Oct 24, 2023 · 1 comment
Comments

BalyshevArtem (Contributor) commented Oct 24, 2023

What

Let's support a weight-quantized kernel implementation. In this approach the model stays in float, but some operations carry quantized weights, making them hybrid kernels.
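The hybrid-kernel idea above can be sketched as follows: activations stay float32, while the weights are stored as int8 with a per-channel scale and are dequantized inside the kernel. This is an illustrative sketch (the function name and signature are hypothetical, not onert-micro code):

```python
import numpy as np

def hybrid_fully_connected(x, w_q, w_scale, bias):
    """Hybrid kernel sketch: float activations, quantized weights.

    x       : float32 input, shape (batch, in_features)
    w_q     : int8 weights, shape (out_features, in_features)
    w_scale : float32 per-output-channel scales, shape (out_features,)
    bias    : float32 bias, shape (out_features,)
    """
    # Dequantize weights on the fly; symmetric weights have zero_point = 0,
    # so dequantization is a plain multiply by the per-channel scale.
    w = w_q.astype(np.float32) * w_scale[:, None]
    # The arithmetic itself runs in float, as in an ordinary float kernel.
    return x @ w.T + bias
```

Only the weight storage shrinks (int8 plus one scale per channel instead of float32), which is where the binary-size saving for constant weights comes from.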

Why

To reduce the binary size of some target models.

How

Support it for:

chunseoklee (Contributor) commented Nov 18, 2024

Let's continue this on refactored onert-micro (#12427)

FYI, here is the quantization spec for Conv2D and DepthwiseConv (from https://ai.google.dev/edge/litert/models/quantization_spec?hl=en); we only refer to the Weight spec here:

CONV_2D
  Input 0:
    data_type  : int8
    range      : [-128, 127]
    granularity: per-tensor
  Input 1 (Weight):
    data_type  : int8
    range      : [-127, 127]
    granularity: per-axis (dim = 0)
    restriction: zero_point = 0
  Input 2 (Bias):
    data_type  : int32
    range      : [int32_min, int32_max]
    granularity: per-axis
    restriction: (scale, zero_point) = (input0_scale * input1_scale[...], 0)
  Output 0:
    data_type  : int8
    range      : [-128, 127]
    granularity: per-tensor

DEPTHWISE_CONV_2D
  Input 0:
    data_type  : int8
    range      : [-128, 127]
    granularity: per-tensor
  Input 1 (Weight):
    data_type  : int8
    range      : [-127, 127]
    granularity: per-axis (dim = 3)
    restriction: zero_point = 0
  Input 2 (Bias):
    data_type  : int32
    range      : [int32_min, int32_max]
    granularity: per-axis
    restriction: (scale, zero_point) = (input0_scale * input1_scale[...], 0)
  Output 0:
    data_type  : int8
    range      : [-128, 127]
    granularity: per-tensor
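The Weight entries above can be turned into code directly: symmetric int8 quantization with range [-127, 127], zero_point = 0, and one scale per slice along the quantization axis (dim 0 for CONV_2D, dim 3 for DEPTHWISE_CONV_2D). A minimal sketch, assuming a plain NumPy array as input (the function name is hypothetical):

```python
import numpy as np

def quantize_weights_per_axis(w, axis=0):
    """Symmetric per-axis int8 weight quantization per the spec above:
    range [-127, 127], zero_point = 0, one scale per slice along `axis`.
    """
    # Max absolute value over every dimension except the quantization axis.
    reduce_dims = tuple(d for d in range(w.ndim) if d != axis)
    abs_max = np.max(np.abs(w), axis=reduce_dims)
    # Guard against all-zero slices to avoid division by zero.
    scale = np.where(abs_max == 0, 1.0, abs_max / 127.0)
    # Broadcast the per-slice scale back over the weight shape.
    shape = [1] * w.ndim
    shape[axis] = -1
    w_q = np.clip(np.round(w / scale.reshape(shape)), -127, 127).astype(np.int8)
    return w_q, scale.astype(np.float32)
```

Note the deliberate [-127, 127] range (not [-128, 127]): dropping -128 keeps the scheme symmetric around zero_point = 0, which the spec requires for weights.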
