prototype_source/backend_config_tutorial.rst translation (#909)

jason9865 authored Oct 15, 2024, commit 2f5c892 (1 parent f0e9318)

(prototype) PyTorch BackendConfig Tutorial
==========================================
**Author**: `Andrew Or <https://github.com/andrewor14>`_
**Translator**: `장승호 (jason9865) <https://github.com/jason9865>`_

The BackendConfig API enables developers to integrate their backends
with PyTorch quantization. It is currently supported only in FX graph
mode quantization, but support may be extended to other modes of
quantization in the future. In this tutorial, we will demonstrate how to
use this API to customize quantization support for specific backends.
For more information on the motivation and implementation details behind
BackendConfig, please refer to this
`README <https://github.com/pytorch/pytorch/tree/master/torch/ao/quantization/backend_config>`__.

Suppose we are a backend developer and we wish to integrate our backend
with PyTorch's quantization APIs. Our backend consists of two ops only:
quantized linear and quantized conv-relu. In this section, we will walk
through how to achieve this by quantizing an example model using a custom
BackendConfig through `prepare_fx` and `convert_fx`.
์—ฌ๋Ÿฌ๋ถ„์ด PyTorch์˜ ์–‘์žํ™” API๋ฅผ ๋ฐฑ์—”๋“œ ํ™˜๊ฒฝ์—์„œ ์‚ฌ์šฉํ•˜๊ณ  ์‹ถ์–ดํ•˜๋Š” ๋ฐฑ์—”๋“œ ๊ฐœ๋ฐœ์ž๋ผ๊ณ  ๊ฐ€์ •ํ•ด๋ด…์‹œ๋‹ค.
๋ฐฑ์—”๋“œ ํ™˜๊ฒฝ์—์„œ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋Š” ์„ ํƒ์ง€๋Š” ์–‘์žํ™”๋œ ์„ ํ˜•(Linear) ์—ฐ์‚ฐ์ž์™€ ํ•ฉ์„ฑ๊ณฑ(Convolution) ReLU ์—ฐ์‚ฐ์ž๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค.
์ด๋ฒˆ ์žฅ์—์„œ๋Š” `prepare_fx`์™€ `convert_fx`๋ฅผ ํ†ตํ•ด ์ปค์Šคํ…€ BackendConfig๋ฅผ ๋งŒ๋“ค๊ณ ,
์ด๋ฅผ ํ™œ์šฉํ•˜์—ฌ ์˜ˆ์‹œ ๋ชจ๋ธ์„ ์–‘์žํ™”ํ•˜์—ฌ ๋ฐฑ์—”๋“œ ํ™˜๊ฒฝ์„ ๊ตฌ์ถ•ํ•˜๋Š” ๋ฐฉ๋ฒ•์— ๋Œ€ํ•ด ์‚ดํŽด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

.. code:: ipython3

    import torch
    from torch.ao.quantization import (
        default_weight_observer,
        MinMaxObserver,
        QConfig,
        QConfigMapping,
    )
    from torch.ao.quantization.backend_config import (
        BackendConfig,
        BackendPatternConfig,
        DTypeConfig,
        DTypeWithConstraints,
        ObservationType,
    )
    from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx
1. Derive reference pattern for each quantized operator
1. ์–‘์žํ™”๋œ ์—ฐ์‚ฐ์ž๋ฅผ ์œ„ํ•œ ์ฐธ์กฐ ํŒจํ„ด ์œ ๋„ํ•˜๊ธฐ
--------------------------------------------------------

For quantized linear, suppose our backend expects the reference pattern
`[dequant - fp32_linear - quant]` and lowers it into a single quantized
linear op. The way to achieve this is to first insert quant-dequant ops
before and after the float linear op, such that we produce the following
reference model::
์–‘์žํ™”๋œ ์„ ํ˜•์—ฐ์‚ฐ์ž๋ฅผ ์œ„ํ•ด ๋ฐฑ์—”๋“œ ํ™˜๊ฒฝ์—์„œ๋Š” `[dequant - fp32_linear - quant]` ์ฐธ์กฐ ํŒจํ„ด์„
์–‘์žํ™”๋œ ๋‹จ์ผ ์„ ํ˜• ์—ฐ์‚ฐ์ž๋กœ ์ถ•์†Œํ•˜์—ฌ ์‚ฌ์šฉํ•œ๋‹ค๊ณ  ๊ฐ€์ •ํ•ฉ์‹œ๋‹ค.
์ด๋ฅผ ์œ„ํ•ด ์šฐ์„  quant-dequant์—ฐ์‚ฐ์ž๋ฅผ ๋ถ€๋™์†Œ์ˆ˜์  ์„ ํ˜• ์—ฐ์‚ฐ์ž ์•ž ๋’ค๋กœ ์‚ฝ์ž…ํ•˜์—ฌ
์•„๋ž˜์™€ ๊ฐ™์€ ์ถ”๋ก  ๋ชจ๋ธ์„ ๋งŒ๋“ค ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.::

    quant1 - [dequant1 - fp32_linear - quant2] - dequant2

Similarly, for quantized conv-relu, we wish to produce the following
reference model, where the reference pattern in the square brackets will
be lowered into a single quantized conv-relu op::

    quant1 - [dequant1 - fp32_conv_relu - quant2] - dequant2
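To make these patterns concrete, here is a minimal eager-mode sketch (our own illustration, not part of the tutorial's code) of the `[dequant - fp32_linear - quant]` reference pattern, with the quant-dequant ops written out by hand:

.. code:: ipython3

    import torch

    # Emulate the reference pattern
    #   quant1 - [dequant1 - fp32_linear - quant2] - dequant2
    # using eager-mode quantize/dequantize ops. Scale and zero_point
    # values here are arbitrary placeholders.
    linear = torch.nn.Linear(4, 4)
    x = torch.rand(2, 4)

    scale, zero_point = 0.02, 0
    quant1 = torch.quantize_per_tensor(x, scale, zero_point, torch.quint8)
    dequant1 = quant1.dequantize()       # back to fp32
    fp32_out = linear(dequant1)          # the fp32_linear op
    quant2 = torch.quantize_per_tensor(fp32_out, scale, zero_point, torch.quint8)
    dequant2 = quant2.dequantize()
    print(dequant2.shape)

A backend that recognizes the bracketed `[dequant1 - fp32_linear - quant2]` span can lower it into a single quantized linear op.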

2. Set DTypeConfigs with backend constraints
---------------------------------------------

In the reference patterns above, the input dtype specified in the
DTypeConfig will be passed as the dtype argument to quant1, while the
output dtype will be passed as the dtype argument to quant2. If the output
dtype is fp32, as in the case of dynamic quantization, then the output
quant-dequant pair will not be inserted. This example also shows how to
specify restrictions on quantization and scale ranges on a particular dtype.

.. code:: ipython3

    quint8_with_constraints = DTypeWithConstraints(
        dtype=torch.quint8,
        quant_min_lower_bound=0,
        quant_max_upper_bound=255,
        scale_min_lower_bound=2 ** -12,
    )

    # Specify the dtypes passed to the quantize ops in the reference pattern
    weighted_int8_dtype_config = DTypeConfig(
        input_dtype=quint8_with_constraints,
        output_dtype=quint8_with_constraints,
        weight_dtype=torch.qint8,
        bias_dtype=torch.float)
3. Set up fusion for conv-relu
-------------------------------

Note that the original user model contains separate conv and relu ops,
so we need to first fuse the conv and relu ops into a single conv-relu
op (`fp32_conv_relu`), and then quantize this op similar to how the linear
op is quantized. We can set up fusion by defining a function that accepts
3 arguments, where the first is whether or not this is for QAT, and the
remaining arguments refer to the individual items of the fused pattern.

.. code:: ipython3

    def fuse_conv2d_relu(is_qat, conv, relu):
        """Return a fused ConvReLU2d from individual conv and relu modules."""
        return torch.ao.nn.intrinsic.ConvReLU2d(conv, relu)
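As a quick sanity check (our own snippet, repeating the fuser above so it is self-contained), the fuser can be called directly on fresh modules to see what it produces:

.. code:: ipython3

    import torch

    def fuse_conv2d_relu(is_qat, conv, relu):
        """Return a fused ConvReLU2d from individual conv and relu modules."""
        return torch.ao.nn.intrinsic.ConvReLU2d(conv, relu)

    # The fused module wraps the original conv and relu in one unit.
    fused = fuse_conv2d_relu(False, torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU())
    print(type(fused).__name__)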
4. Define the BackendConfig
----------------------------

Now we have all the necessary pieces, so we go ahead and define our
BackendConfig. Here we use different observers (will be renamed) for
the input and output for the linear op, so the quantization params
passed to the two quantize ops (quant1 and quant2) will be different.
This is commonly the case for weighted ops like linear and conv.

For the conv-relu op, the observation type is the same. However, we
need two BackendPatternConfigs to support this op, one for fusion
and one for quantization. For both conv-relu and linear, we use the
DTypeConfig defined above.
ํ•ฉ์„ฑ๊ณฑ-ReLU ์—ฐ์‚ฐ์ž์˜ ๊ฒฝ์šฐ observation์˜ ํƒ€์ž…์€ ๋™์ผํ•ฉ๋‹ˆ๋‹ค.
ํ•˜์ง€๋งŒ BackendPatternConfig์˜ ๊ฒฝ์šฐ ๊ฒฐํ•ฉ๊ณผ ์–‘์žํ™”์— ์‚ฌ์šฉํ•˜๊ธฐ ์œ„ํ•ด 2๊ฐœ๊ฐ€ ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค.
ํ•ฉ์„ฑ๊ณฑ-ReLU์™€ ์„ ํ˜• ์—ฐ์‚ฐ์ž์—๋Š” ์•ž์„œ ์ •์˜ํ•œ DTypeConfig๋ฅผ ํ™œ์šฉํ•ฉ๋‹ˆ๋‹ค.

.. code:: ipython3

    backend_config = BackendConfig("my_backend") \
        .set_backend_pattern_config(linear_config) \
        .set_backend_pattern_config(conv_relu_config) \
        .set_backend_pattern_config(fused_conv_relu_config)
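As a miniature, self-contained variant (the name ``toy_backend`` and this one-pattern setup are ours, not the tutorial's), a BackendConfig can be built for a single op and inspected through its ``configs`` property:

.. code:: ipython3

    import torch
    from torch.ao.quantization.backend_config import (
        BackendConfig,
        BackendPatternConfig,
        DTypeConfig,
        ObservationType,
    )

    # A one-pattern BackendConfig for float Linear. Weighted ops typically use
    # different observers for input and output, hence this observation type.
    dtype_config = DTypeConfig(
        input_dtype=torch.quint8,
        output_dtype=torch.quint8,
        weight_dtype=torch.qint8,
        bias_dtype=torch.float)

    linear_config = BackendPatternConfig(torch.nn.Linear) \
        .set_observation_type(ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT) \
        .add_dtype_config(dtype_config)

    backend_config = BackendConfig("toy_backend") \
        .set_backend_pattern_config(linear_config)
    print(len(backend_config.configs))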
5. Set up QConfigMapping that satisfies the backend constraints
----------------------------------------------------------------

In order to use the ops defined above, the user must define a QConfig
that satisfies the constraints specified in the DTypeConfig. For more
detail, see the documentation for `DTypeConfig <https://pytorch.org/docs/stable/generated/torch.ao.quantization.backend_config.DTypeConfig.html>`__.
We will then use this QConfig for all the modules used in the patterns
we wish to quantize.
์•ž์„œ ์ •์˜ํ•œ ์—ฐ์‚ฐ์ž๋ฅผ ์‚ฌ์šฉํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” DTypeConfig์˜ ์ œ์•ฝ์กฐ๊ฑด์„ ๋งŒ์กฑํ•˜๋Š”
QConfig๋ฅผ ์ •์˜ํ•ด์•ผํ•ฉ๋‹ˆ๋‹ค. ์ž์„ธํ•œ ๋‚ด์šฉ์€ `DTypeConfig <https://pytorch.org/docs/stable/generated/torch.ao.quantization.backend_config.DTypeConfig.html>`__์„ ์ฐธ๊ณ ํ•˜์„ธ์š”.
๊ทธ๋ฆฌ๊ณ  ์–‘์žํ™”ํ•˜๋ ค๋Š” ํŒจํ„ด๋“ค์— ์‚ฌ์šฉ๋˜๋Š” ๋ชจ๋“  ๋ชจ๋“ˆ์— QConfig๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.

.. code:: ipython3

    # Note: Here we use a quant_max of 127, but this could be up to 255 (see `quint8_with_constraints`)
    activation_observer = MinMaxObserver.with_args(quant_min=0, quant_max=127, eps=2 ** -12)
    qconfig = QConfig(activation=activation_observer, weight=default_weight_observer)

    # Note: All individual items of a fused pattern, e.g. Conv2d and ReLU in
    # (Conv2d, ReLU), must have the same QConfig
    qconfig_mapping = QConfigMapping() \
        .set_object_type(torch.nn.Linear, qconfig) \
        .set_object_type(torch.nn.Conv2d, qconfig) \
        .set_object_type(torch.nn.BatchNorm2d, qconfig) \
        .set_object_type(torch.nn.ReLU, qconfig)
6. Quantize the model through prepare and convert
--------------------------------------------------

Finally, we quantize the model by passing the BackendConfig we defined
into prepare and convert. This produces a quantized linear module and
a fused quantized conv-relu module.

.. code:: ipython3

    ...
            sigmoid = self.sigmoid(dequantize_2); dequantize_2 = None
            return sigmoid
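Since the diff collapses the prepare/convert code above, here is a hedged end-to-end sketch of the same flow; it uses the default QConfigMapping and BackendConfig rather than the custom one, and ``TinyModel`` is our own stand-in for the tutorial's model:

.. code:: ipython3

    import torch
    from torch.ao.quantization import get_default_qconfig_mapping
    from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

    class TinyModel(torch.nn.Module):
        """A stand-in model with a single linear layer."""
        def __init__(self):
            super().__init__()
            self.linear = torch.nn.Linear(10, 10)

        def forward(self, x):
            return self.linear(x)

    example_inputs = (torch.rand(1, 10),)
    model = TinyModel().eval()

    # prepare -> calibrate -> convert
    prepared = prepare_fx(model, get_default_qconfig_mapping(), example_inputs)
    prepared(*example_inputs)          # run calibration data through the observers
    converted = convert_fx(prepared)
    out = converted(*example_inputs)
    print(out.shape)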
(7. Experiment with faulty BackendConfig setups)
-------------------------------------------------

As an experiment, here we modify the model to use conv-bn-relu
instead of conv-relu, but use the same BackendConfig, which doesn't
know how to quantize conv-bn-relu. As a result, only linear is
quantized, but conv-bn-relu is neither fused nor quantized.
์‹คํ—˜์˜ ์ผํ™˜์œผ๋กœ ํ•ฉ์„ฑ๊ณฑ-ReLU ์—ฐ์‚ฐ์ž ๋Œ€์‹  ํ•ฉ์„ฑ๊ณฑ-๋ฐฐ์น˜์ •๊ทœํ™”-ReLU(conv-bn-relu) ๋ชจ๋ธ์„ ์ด์šฉํ•ฉ๋‹ˆ๋‹ค.
์ด ๋•Œ BackendConfig๋Š” ์ด์ „๊ณผ ๋™์ผํ•œ ๊ฒƒ์„ ์‚ฌ์šฉํ•˜๋ฉฐ ํ•ฉ์„ฑ๊ณฑ-๋ฐฐ์น˜์ •๊ทœํ™”-ReLU ์–‘์žํ™” ๊ด€๋ จ๋œ ์ •๋ณด๋Š” ์—†์Šต๋‹ˆ๋‹ค.
์‹คํ—˜ ๊ฒฐ๊ณผ, ์„ ํ˜• ๋ชจ๋ธ์˜ ๊ฒฝ์šฐ ์–‘์žํ™”๊ฐ€ ์„ฑ๊ณต์ ์œผ๋กœ ์ง„ํ–‰๋˜์—ˆ์ง€๋งŒ ํ•ฉ์„ฑ๊ณฑ-๋ฐฐ์น˜์ •๊ทœํ™”-ReLU์˜ ๊ฒฝ์šฐ
๊ฒฐํ•ฉ๊ณผ ์–‘์žํ™” ๋ชจ๋‘ ์ด๋ฃจ์–ด์ง€์ง€ ์•Š์•˜์Šต๋‹ˆ๋‹ค.

.. code:: ipython3

    # Only linear is quantized, since there's no rule for fusing conv-bn-relu
    example_inputs = (torch.rand(1, 3, 10, 10, dtype=torch.float),)
    model = MyModel(use_bn=True)
    prepared = prepare_fx(model, qconfig_mapping, example_inputs, backend_config=backend_config)
    ...
            sigmoid = self.sigmoid(relu); relu = None
            return sigmoid
As another experiment, here we use the default QConfigMapping that
doesn't satisfy the dtype constraints specified in the backend. As
a result, nothing is quantized since the QConfigs are simply ignored.

.. code:: ipython3

    # Nothing is quantized or fused, since backend constraints are not satisfied
    ...
        return sigmoid
Built-in BackendConfigs
-----------------------

PyTorch quantization supports a few built-in native BackendConfigs under
the ``torch.ao.quantization.backend_config`` namespace:
PyTorch ์–‘์žํ™”๋Š” ``torch.ao.quantization.backend_config`` ๋„ค์ž„์ŠคํŽ˜์ด์Šค ํ•˜์œ„
์—ฌ๋Ÿฌ ๊ธฐ๋ณธ BackendConfig๋ฅผ ์ง€์›ํ•ฉ๋‹ˆ๋‹ค.

- `get_fbgemm_backend_config <https://github.com/pytorch/pytorch/blob/master/torch/ao/quantization/backend_config/fbgemm.py>`__:
  for server target settings
- `get_qnnpack_backend_config <https://github.com/pytorch/pytorch/blob/master/torch/ao/quantization/backend_config/qnnpack.py>`__:
  for mobile and edge device target settings, also supports XNNPACK
  quantized ops
- `get_native_backend_config <https://github.com/pytorch/pytorch/blob/master/torch/ao/quantization/backend_config/native.py>`__
  (default): a BackendConfig that supports a union of the operator
  patterns supported in the FBGEMM and QNNPACK BackendConfigs

There are also other BackendConfigs under development (e.g. for
TensorRT and x86), but these are still mostly experimental at the
moment. If the user wishes to integrate a new, custom backend with
PyTorch's quantization API, they may define their own BackendConfigs
using the same set of APIs used to define the natively supported
ones as in the example above.

Further Reading
์ฐธ๊ณ ์ž๋ฃŒ
---------------

How BackendConfig is used in FX graph mode quantization:
https://github.com/pytorch/pytorch/blob/master/torch/ao/quantization/fx/README.md

Motivation and implementation details behind BackendConfig:
https://github.com/pytorch/pytorch/blob/master/torch/ao/quantization/backend_config/README.md

Early design of BackendConfig:
BackendConfig์˜ ์ดˆ๊ธฐ ์„ค๊ณ„:
https://github.com/pytorch/rfcs/blob/master/RFC-0019-Extending-PyTorch-Quantization-to-Custom-Backends.md
