Updated notebooks
Giuseppe5 committed Aug 20, 2024
1 parent 2c26cd1 commit 8f26c84
Showing 1 changed file with 87 additions and 2 deletions.
89 changes: 87 additions & 2 deletions notebooks/minifloat_mx_tutorial.ipynb
@@ -4,14 +4,59 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Minifloat and MX Example"
"# Minifloat and Groupwise quantization"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Work in progress examples to show how to use minifloat and MX with Brevitas"
"This notebook shows some practical use cases for minifloat and groupwise quantization.\n",
"\n",
"Brevitas supports a wide combination of float quantization, including the OCP and FNUZ FP8 standard.\n",
"It is possible to define any combination of exponent/mantissa bitwidth, as well as exponent bias.\n",
"\n",
"Similarly, MX quantization is supported as general groupwise quantization on top of integer/minifloat datatype.\n",
"This allows to any general groupwise quantization, including MXInt and MXFloat standards.\n",
"\n",
"This tutorial shows how to instantiate and use some of the most interesting quantizers for minifloat and groupwise quantization"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Minifloat (FP8 and lower)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Brevitas offers some pre-defined quantizers for minifloat quantization, including OCP and FNUZ standards, which can be further customized according to the specific use case.\n",
"The general naming structure for the quantizers is the following:\n",
"\n",
"`Fp\\<Bitwidth\\>\\<Standard\\>Weight\\<Scaling\\>Float`\n",
"\n",
"Where `Bitwidth` can be either empty or `e4m3`/`e5m2`, `Standard` can be empty or `OCP`/`FNUZ`, `Scaling` can be empty or `PerTensor`/`PerChannel`.\n",
"\n",
"If `Bitwidth` is empty, the user must set it with kwargs or by subclassing the quantizers. Once the bitwidth is defined, the correct values for inf/nan are automatically defined based on the `Standard`.\n",
"If a non-valid OCP bitwidth is set (e.g., e6m1), then no inf/nan values will be selected and the corresponding quantizer is not standard-compliant.\n",
"\n",
"`Standard` allows to pick among the two main FP8 standard options; moreover, Brevitas offers the possibility of doing minifloat quantization without necessarily reserving values for inf/nan representation.\n",
"This allows to use the maximum available range, since often in quantization, values that exceed the quantization range saturate to maximum rather than going to inf/nan.\n",
"To use this third class of minifloat quantizers, \n",
"FNUZ quantizers need to have `saturating=True`.\n",
"\n",
"The `Scaling` options defines whether the quantization is _scaled_ or _unscaled_.\n",
"In the unscaled case, the scale factor for quantization is fixed to one, otherwise it can be set using any of the methods that Brevitas includes (e.g., statistics, learned, etc.)\n",
"\n",
"\n",
"Please keep in mind that not all combinations of the above options might be pre-defined and this serves mostly as indications of what Brevitas supports.\n",
"It is possible, following the same structure of the available quantizers, to define new ones that fit your needs.\n",
"\n",
"\n",
"Similar considerations can be extended for activation quantization."
]
},
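The range cost of reserving inf/nan encodings can be sketched in plain Python. The helper below is hypothetical (not Brevitas code) and assumes the OCP e4m3 convention, where only the all-ones mantissa at the top exponent is reserved for NaN:

```python
def max_representable(exp_bits: int, mant_bits: int, bias: int,
                      reserve_nan: bool = True) -> float:
    """Largest finite value of a normalised 1.m minifloat format.

    With reserve_nan=True, the all-ones mantissa at the top exponent
    encodes NaN (OCP e4m3 convention), costing one mantissa step of range.
    """
    top_exp = (2 ** exp_bits - 1) - bias                 # largest unbiased exponent
    steps = 2 ** mant_bits - (2 if reserve_nan else 1)   # largest usable mantissa code
    return (1.0 + steps * 2.0 ** -mant_bits) * 2.0 ** top_exp

print(max_representable(4, 3, 7, reserve_nan=True))    # OCP e4m3: 448.0
print(max_representable(4, 3, 7, reserve_nan=False))   # saturating variant: 480.0
```

This matches the well-known 448 maximum of OCP e4m3 and shows the extra range a saturating quantizer with no reserved inf/nan values recovers.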
{
@@ -63,6 +108,46 @@
"assert isinstance(intermediate_input, FloatQuantTensor)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Groupwise quantization (MXInt/MXFloat)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Groupwise quantization is built on top of integer/minifloat quantization, with special considerations to accomodate for the groupwise scaling.\n",
"\n",
"Compared to Int/Float QuantTensor, the main difference of their groupwise equivalent is that value, scale, and zero_point are not direct attributes anymore but properties. The new attributes are value_, scale_, and zero_point_.\n",
"\n",
"The reason for this is shaping. When quantizing a tensor with shapes [O, I], where O is output channel and I is input channel, with groupsize k, groupwise quantization is normally represented as follow:\n",
"\n",
"- Tensor with shapes [O, k, I/k]\n",
"- Scales with shapes [O, k, 1]\n",
"- Zero point same as scale\n",
"\n",
"The alternative to this representation is to have all three tensors with shapes [O,I], with a massive increase in memory utilization, especially with QAT + gradients.\n",
"\n",
"The underscored attributes will have the compressed shapes, while the properties (non-underscored naming) will dynamically compute the expanded version of the property. This means:\n",
"```python\n",
"quant_tensor.scale_.shape\n",
"# This will print [O, k, 1]\n",
"quant_tensor.scale.shape\n",
"# This will print [O, I]\n",
"```\n",
"\n",
"With respect to pre-defined quantizers, Brevitas offers several Groupwise and MX options.\n",
"The meain difference between the two is that MX is restricted to group_size=32 and the scale factor must be a power-of-2.\n",
"The user can override these settings but the corresponding output won't be MX compliant.\n",
"\n",
"Another difference is that MXFloat relies on the OCP format as underlying data type, while generic groupwise float relies on the non-standard minifloat representation explained above.\n",
"\n",
"Finally, the general groupwise scaling relies on float scales."
]
},
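The compressed vs. expanded scale representation can be sketched with plain Python lists (hypothetical shapes and values, not Brevitas internals; groups are taken as contiguous runs of k input channels):

```python
O, I, k = 2, 8, 4  # output channels, input channels, group size (toy sizes)

# Compressed scales: one scale per group of k input channels.
# I // k groups per output channel, so O * (I // k) values in total.
scale_ = [[[float(g + 1)] for g in range(I // k)] for _ in range(O)]

# Expanded view: broadcast each group scale over its k elements,
# giving one value per weight, i.e. O * I values.
scale = [[scale_[o][i // k][0] for i in range(I)] for o in range(O)]

print(scale_[0])  # [[1.0], [2.0]]
print(scale[0])   # [1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0]
```

Here the compressed form stores 4 scales against 16 for the expanded one; with realistic layer sizes (and gradients for each tensor during QAT) the gap is what motivates keeping the underscored attributes compressed.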
{
"cell_type": "code",
"execution_count": 2,
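The power-of-2 scale restriction that makes a groupwise quantizer MX-compliant can be sketched as follows. This is a simplified, hypothetical version of the MX shared-scale rule (not Brevitas code), assuming e4m3 elements whose maximum unbiased exponent is 8:

```python
import math

def mx_shared_scale(group, emax_elem=8):
    """Power-of-2 shared scale for one group of values:
    2 ** (floor(log2(max |x|)) - emax_elem), MX-style."""
    amax = max(abs(v) for v in group)
    return 2.0 ** (math.floor(math.log2(amax)) - emax_elem)

group = [0.5, -100.0, 3.0, 7.5]  # toy group (real MX uses group_size=32)
print(mx_shared_scale(group))    # 0.25
```

A generic groupwise quantizer would instead derive a float scale from the same group statistics, which is the flexibility the text above says you give up for MX compliance.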
