Spec quantization #588
Similarly to the discussion in #8, the current plan is to wait until: 1) we finish speccing statically-shaped StableHLO ops, 2) we formalize syntax and maybe even evaluation of StableHLO programs (#484). Once that's done, I'm planning to provide a delta to this formalism that adds support for quantization. I think this won't be too hard and will result in a higher-quality design, because we'll be forced to explore more details.
Earlier today, concluding the Q3/Q4 speccing marathon, we finished speccing HLO semantics for the StableHLO ops. This was a huge effort that involved writing 93 specs, including digging deep into the involved semantics of ops like batch_norm_grad, convolution, dot_general and more. Congratulations to everyone who contributed to this important milestone! The idea of this project was to create a baseline from which the StableHLO opset will evolve in the future. Our immediate next steps will be writing a dynamism RFC (#8) and speccing quantization (#588) on top of this baseline. This speccing marathon has also uncovered a lot of future work - both in cleaning up the opset and improving the implementation to fully conform to the spec. This is something that we're aiming to address in the next year.
See #1149 for a proposal for how to spec quantization in StableHLO in the context of alignment with TOSA.
https://github.com/subhankarshah/stablehlo/blob/spec-quantization/docs/spec.md
[Action Item]: Verify the element-type of return value in
@subhankarshah when will the UniformQuantizeOp be specced? I am wondering which rounding modes will be supported in that op.
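To illustrate why the choice of rounding mode matters: when a scaled value lands exactly halfway between two integers, different rounding modes disagree on the quantized result. A minimal sketch (illustrative only, not from the StableHLO spec; the function names here are made up) comparing round-half-to-even with round-half-away-from-zero:

```python
# Illustrative comparison of two common rounding modes used in quantization.
# Neither is claimed to be what StableHLO will specify.
import math

def round_half_to_even(x):
    """Python's built-in round() implements round-half-to-even."""
    return round(x)

def round_half_away_from_zero(x):
    """Ties are rounded away from zero instead."""
    return math.floor(x + 0.5) if x >= 0 else math.ceil(x - 0.5)

# Quantizing the real value 1.25 with scale 0.5 yields 1.25 / 0.5 = 2.5,
# exactly halfway between 2 and 3, so the two modes produce different
# quantized integers.
```

With scale 0.5 and the real value 1.25, round-half-to-even quantizes to 2 while round-half-away-from-zero quantizes to 3, which is why the spec needs to pin this down.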
Hi @mahmoud-abuzaina Thanks for your interest!
The StableHLO dialect currently supports quantization via: 1) supporting `quant.uniform` element types, 2) having dedicated ops like `uniform_quantize` / `uniform_dequantize`, and 3) allowing regular ops like `add` / `convolution` to take quantized tensors. This support was inherited from MHLO when StableHLO was bootstrapped, and the MHLO support was motivated by mobile use cases and inherited from TFLite.

As pointed out in #1149, the StableHLO specification doesn't support quantization at the moment, and this is an important gap that we would like to fix before StableHLO v1.0 (see #588). To continue the discussion started in #1149 and to make progress towards v1.0, this pull request: A) adds QuantizedType to the StableHLO specification, modeled after the [TFLite quantization spec](https://www.tensorflow.org/lite/performance/quantization_spec), and B) proposes semantics for quantized `add`, to start a conversation about the applications of QuantizedType and the semantics of quantized ops.

The TFLite quantization spec doesn't cover everything. It specs constraints on types (which we captured accordingly in this pull request), but it doesn't go into describing the semantics of quantized ops. As a result, the proposed semantics for quantized `add` is intentionally naive compared with the much more involved implementations in the TensorFlow repository, e.g.:
* [tfl.add](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/kernels/add.cc)
* [tf.UniformQuantizedAdd](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/uniform_quant_ops/uniform_quantized_add_op.cc)

Update: After community discussion, we removed the spec for quantized `add`, leaving that for future work, since further alignment is required.

---------

Co-authored-by: Eugene Burmako <[email protected]>
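The "intentionally naive" semantics for quantized `add` described above can be sketched as: dequantize both operands to real numbers, add in floating point, then requantize the result. The following is a minimal illustration under assumed int8 storage, with made-up function names; it is not the spec's definition and ignores the fixed-point tricks used by the real TFLite/TF kernels linked above.

```python
# Hypothetical sketch of naive quantized add: dequantize -> add -> quantize.
# All names and parameters here are illustrative, not part of the StableHLO spec.

def dequantize(q, scale, zero_point):
    """Map a quantized integer to its real-number approximation."""
    return (q - zero_point) * scale

def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a real number to the nearest representable quantized integer,
    clamping to the assumed int8 storage range."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))

def quantized_add(lhs, rhs, lhs_params, rhs_params, out_params):
    """Naive per-element quantized add: compute in float, requantize.
    Each *_params is a (scale, zero_point) pair."""
    result = dequantize(lhs, *lhs_params) + dequantize(rhs, *rhs_params)
    return quantize(result, *out_params)
```

For example, with scale 0.5 and zero point 0 everywhere, adding quantized values 10 and 20 (real values 5.0 and 10.0) requantizes 15.0 back to 30; results outside the storage range saturate at 127.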
Hello, I'm interested in implementing an e2e example for Llama and related models utilizing a StableHLO quantized matmul that should meet or exceed the performance (ppl and tokens/ms) of llama.cpp on CPU. Hopefully we can lower to a BLAS library.
Thanks @jon-chuang for sharing your interest; I am super excited to explore this together. A few clarifying questions:
cc @GleasonK |
The answer should be anything that can be lowered to HLO, including PyTorch, ONNX, etc. The Llama weights are, I think, originally PyTorch/Hugging Face.
It's unclear at the moment. For instance, to my understanding int4 quantization is not really meaningful on GPU due to the lack of int4 matmul units and the high overhead.
Thanks @jon-chuang for the clarification. Let me get back to you on this.
Hi @jon-chuang. On a side note: while exploring this, I found a few interesting and relevant discussions in discord#jax which you might be interested in.
Some of the StableHLO ops do not have support for quantized types in their tablegen specification, which prohibits writing quantized StableHLO programs using those ops. This PR adds the missing support for the following ops. Also, I believe the ongoing specification [work](#588) should not deviate much from the changes proposed here.
```
stablehlo.atan2
stablehlo.divide
stablehlo.power
stablehlo.remainder
stablehlo.subtract
stablehlo.abs
stablehlo.cbrt
stablehlo.cosine
stablehlo.exponential
stablehlo.exponential_minus_one
stablehlo.log
stablehlo.log_plus_one
stablehlo.logistic
stablehlo.negate
stablehlo.rsqrt
stablehlo.sign
stablehlo.sine
stablehlo.sqrt
stablehlo.tanh
stablehlo.cholesky
stablehlo.triangular_solve
```
Other than these ops, we have `fft`, `rng`, and `rng_bit_generator` (or something else which I might be missing) as potential candidates for this support. I propose that we add the support after speccing those ops, since adding it might need some non-trivial discussion.
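One way to see why the elementwise ops listed above extend naturally to quantized tensors is that each can be modeled as dequantize, then apply the float op, then requantize. The sketch below (illustrative only; the helper name and parameters are made up, and this is not how the StableHLO implementation works) lifts an arbitrary float elementwise function to quantized int8 values:

```python
# Hedged illustration: lifting a float elementwise function to quantized
# values via dequantize -> float op -> requantize. Not the StableHLO
# implementation; names here are invented for this sketch.
import math

def lift_elementwise(fn, scale, zero_point, qmin=-128, qmax=127):
    """Return a quantized version of the float elementwise function `fn`."""
    def quantized_fn(q_values):
        out = []
        for q in q_values:
            real = (q - zero_point) * scale            # dequantize
            result = fn(real)                          # float semantics
            requant = round(result / scale) + zero_point
            out.append(max(qmin, min(qmax, requant)))  # clamp to storage type
        return out
    return quantized_fn

# Quantized analogues of two of the ops listed above (scale/zero_point
# values are arbitrary examples).
quantized_abs = lift_elementwise(abs, scale=0.5, zero_point=0)
quantized_sqrt = lift_elementwise(math.sqrt, scale=0.5, zero_point=0)
```

With scale 0.5 and zero point 0, `quantized_abs([-10, 4])` dequantizes to [-5.0, 2.0], applies `abs`, and requantizes to [10, 4].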
Hello All
We are planning to open separate tickets for (1) and (2). Regarding (3), we have some ongoing work on exporting quantized PyTorch models to StableHLO ([ref](https://github.com/pytorch/xla/pull/5763)). We will be happy to understand/address any specific quantization specification in separate tickets.
This will involve documenting: 1) a representation for quantized tensors, 2) UniformQuantizeOp, 3) UniformDequantizeOp, 4) which of the existing ops can take quantized tensors and how their semantics change as a result.
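For items (2) and (3), a rough sketch of what per-tensor uniform quantize/dequantize could mean is shown below. This is illustrative only, not the official spec: it assumes an i8 storage type and the affine mapping real ≈ scale × (quantized − zero_point) used by the TFLite quantization scheme referenced in this thread.

```python
# Illustrative per-tensor uniform_quantize / uniform_dequantize semantics.
# Assumed i8 storage type; parameter names are this sketch's, not the spec's.

STORAGE_MIN, STORAGE_MAX = -128, 127  # i8 storage range

def uniform_quantize(values, scale, zero_point):
    """Elementwise: scale, round to nearest integer, clamp to storage range."""
    return [max(STORAGE_MIN, min(STORAGE_MAX, round(v / scale) + zero_point))
            for v in values]

def uniform_dequantize(values, scale, zero_point):
    """Elementwise inverse of uniform_quantize (up to rounding error)."""
    return [scale * (q - zero_point) for q in values]
```

For example, with scale 0.5 and zero point 1, the real values [1.0, -2.0] quantize to [3, -3] and dequantize back exactly; values that overflow the storage range saturate at 127.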