Add float8_e4m3 and float8_e3m4 types support #23585

apivovarov · 2024-09-12T01:30:47Z

Description

Amazon has proposed two new FP8 types, Float8E4M3 and Float8E3M4. These types are implemented in commercially available hardware (Amazon EC2 Trn1 instances) and have been added to MLIR builtin types, LLVM APFloat, ml_dtypes, and StableHLO.

XLA has a Float8E4M3 and Float8E3M4 implementation in review. See the PR links in the Related PRs section below.

This PR adds f8E4M3 and f8E3M4 types support to JAX.

`f8E4M3` type follows IEEE 754 convention.

f8E4M3 (IEEE 754)
- Exponent bias: 7
- Maximum stored exponent value: 14 (binary 1110)
- Maximum unbiased exponent value: 14 - 7 = 7
- Minimum stored exponent value: 1 (binary 0001)
- Minimum unbiased exponent value: 1 − 7 = −6
- Precision specifies the total number of bits used for the significand (mantissa),
    including implicit leading integer bit = 3 + 1 = 4
- Follows IEEE 754 conventions for representation of special values
- Has Positive and Negative zero
- Has Positive and Negative infinity
- Has NaNs

Additional details:
- Max exp (unbiased): 7
- Min exp (unbiased): -6
- Infinities (+/-): S.1111.000
- Zeros (+/-): S.0000.000
- NaNs: S.1111.{001, 010, 011, 100, 101, 110, 111}
- Max normal number: S.1110.111 = +/-2^(7) x (1 + 0.875) = +/-240
- Min normal number: S.0001.000 = +/-2^(-6)
- Max subnormal number: S.0000.111 = +/-2^(-6) x 0.875 = +/-2^(-9) x 7
- Min subnormal number: S.0000.001 = +/-2^(-6) x 0.125 = +/-2^(-9)
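
The bit layout above can be checked with a small pure-Python decoder. This is only a sketch for illustration: `decode_f8e4m3` is a hypothetical helper written from the rules listed here (bias 7, 3 mantissa bits, IEEE 754 special values), not code from ml_dtypes or JAX.

```python
def decode_f8e4m3(bits: int) -> float:
    """Decode an 8-bit pattern laid out as S.EEEE.MMM with exponent bias 7."""
    sign = -1.0 if (bits >> 7) & 1 else 1.0
    exp = (bits >> 3) & 0xF  # 4 stored exponent bits
    man = bits & 0x7         # 3 stored mantissa bits
    if exp == 0xF:           # all-ones exponent: infinity or NaN
        return sign * float("inf") if man == 0 else float("nan")
    if exp == 0:             # zero or subnormal: no implicit leading 1
        return sign * 2.0 ** -6 * (man / 8)
    return sign * 2.0 ** (exp - 7) * (1 + man / 8)


print(decode_f8e4m3(0b0_1110_111))  # max normal: 240.0
print(decode_f8e4m3(0b0_0001_000))  # min normal: 0.015625 (2^-6)
print(decode_f8e4m3(0b0_0000_111))  # max subnormal: 0.013671875 (7 x 2^-9)
print(decode_f8e4m3(0b0_0000_001))  # min subnormal: 0.001953125 (2^-9)
```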

`f8E3M4` type follows IEEE 754 convention

f8E3M4 (IEEE 754)
- Exponent bias: 3
- Maximum stored exponent value: 6 (binary 110)
- Maximum unbiased exponent value: 6 - 3 = 3
- Minimum stored exponent value: 1 (binary 001)
- Minimum unbiased exponent value: 1 − 3 = −2
- Precision specifies the total number of bits used for the significand (mantissa), 
    including implicit leading integer bit = 4 + 1 = 5
- Follows IEEE 754 conventions for representation of special values
- Has Positive and Negative zero
- Has Positive and Negative infinity
- Has NaNs

Additional details:
- Max exp (unbiased): 3
- Min exp (unbiased): -2
- Infinities (+/-): S.111.0000
- Zeros (+/-): S.000.0000
- NaNs: S.111.{0,1}⁴ except S.111.0000
- Max normal number: S.110.1111 = +/-2^(6-3) x (1 + 15/16) = +/-2^3 x 31 x 2^(-4) = +/-15.5
- Min normal number: S.001.0000 = +/-2^(1-3) x (1 + 0) = +/-2^(-2)
- Max subnormal number: S.000.1111 = +/-2^(-2) x 15/16 = +/-2^(-2) x 15 x 2^(-4) = +/-15 x 2^(-6)
- Min subnormal number: S.000.0001 = +/-2^(-2) x 1/16 =  +/-2^(-2) x 2^(-4) = +/-2^(-6)
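
As a cross-check of the f8E3M4 numbers, the sketch below enumerates all 256 bit patterns and counts the special values. `decode_f8e3m4` is a hypothetical helper derived from the list above (bias 3, 4 mantissa bits), not an ml_dtypes or JAX API.

```python
import math


def decode_f8e3m4(bits: int) -> float:
    """Decode an 8-bit pattern laid out as S.EEE.MMMM with exponent bias 3."""
    sign = -1.0 if (bits >> 7) & 1 else 1.0
    exp = (bits >> 4) & 0x7  # 3 stored exponent bits
    man = bits & 0xF         # 4 stored mantissa bits
    if exp == 0x7:           # all-ones exponent: infinity or NaN
        return sign * math.inf if man == 0 else math.nan
    if exp == 0:             # zero or subnormal: no implicit leading 1
        return sign * 2.0 ** -2 * (man / 16)
    return sign * 2.0 ** (exp - 3) * (1 + man / 16)


values = [decode_f8e3m4(b) for b in range(256)]
nan_count = sum(math.isnan(v) for v in values)
inf_count = sum(math.isinf(v) for v in values)
finite = [v for v in values if math.isfinite(v)]
positive = [v for v in finite if v > 0]

print(nan_count, inf_count)         # 30 2
print(max(finite), min(positive))   # 15.5 0.015625
```

The 30 NaN encodings are the 15 non-zero mantissa patterns under the all-ones exponent, once per sign bit.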

Related PRs:

LLVM PR-97179 [APFloat] Add support for f8E4M3 IEEE 754 type (Merged)
LLVM PR-97118 [MLIR] Add f8E4M3 IEEE 754 type (Merged)
LLVM PR-99698 [APFloat] Add support for f8E3M4 IEEE 754 type (Merged)
LLVM PR-101230 [MLIR] Add f8E3M4 IEEE 754 type (Merged)
StableHLO PR-2486 [RFC] Add f8E4M3 and f8E3M4 types support (Merged)
StableHLO PR-2482 Add f8E4M3 and f8E3M4 types support (Merged)
ml_dtypes PR-161 Add float8_e4m3 (Merged)
ml_dtypes PR-171 Add float8_e3m4 (Merged)
XLA PR-17075 [TSL] Bump ml_dtypes. Add float8_e4m3, float8_e3m4 (Merged)
XLA PR-16585 Add support for float8_e4m3 and float8_e3m4 types (in Review)

How to build/install

This PR requires ml_dtypes version 20240821 or later.

The current version on PyPI is 0.4.0, released on April 1, 2024, which is outdated. Therefore, ml_dtypes should be installed from source.

Related issue: jax-ml/ml_dtypes#185 [Question] Can we release a new version of ml_dtypes?

## Install the latest ml_dtypes
cd ml_dtypes
pip3 install .

## Install jaxlib and JAX
cd jax

### install jaxlib
python3 build/build.py
pip3 install dist/*.whl

### install jax
pip3 install .

Smoke test

import jax
import jax.numpy as jnp
from jax import Array, random
from jax._src.lib.mlir.dialects import hlo

jax.devices()

hlo.get_version_from_compatibility_requirement(
  hlo.StablehloCompatibilityRequirement.WEEK_4
)
hlo.get_version_from_compatibility_requirement(
  hlo.StablehloCompatibilityRequirement.WEEK_12
)

dtype = "float8_e4m3"
# dtype = "float8_e3m4"
key1 = random.PRNGKey(41)
key2 = random.PRNGKey(42)
a = random.uniform(key1, shape=(16, 16), dtype=dtype)
b = random.uniform(key2, shape=(16, 16), dtype=dtype)


def foo(a, b):
  return a @ b

foo_jit = jax.jit(foo)

# StableHLO
print(foo_jit.lower(a, b).as_text())

# HLO
print(foo_jit.lower(a, b).compile().as_text())

c = foo(a, b).block_until_ready()
c2 = foo_jit(a, b).block_until_ready()


# Run the jitted function repeatedly
for _ in range(10000):
  c2 = foo_jit(a, b).block_until_ready()

Array([[3.25, 2.75, 2.5, 2.5, 2.25, 2.25, 2.5, 3.25, 2.5, 3.25, 2, 2.25,
        2.5, 3, 2.75, 2.75],
...
       [4, 3.5, 3.5, 2.25, 2, 2.25, 3, 3.25, 2.25, 3, 2.75, 3, 2.5, 3.25,
        2, 2.75]], dtype=float8_e4m3)

jakevdp · 2024-09-12T02:47:06Z

Thanks for the contribution! I don't think we'll be able to bump our ml_dtypes requirement any time soon, so if we want to merge this we'll have to make it robust to older ml_dtypes versions (the reason is that tensorflow pins a specific ml_dtypes version, and some workflows depend on installing both JAX and tensorflow).

The good news is this is easy enough to do with a few version guards: if you look at the initial implementation of float8 types in JAX, you can see the pattern we used previously.

jakevdp · 2024-09-12T13:55:59Z

Here's an example of how this was handled in the past: https://github.com/google/jax/blob/jax-v0.4.12/jax/_src/dtypes.py#L71

Basically, we only define the dtype in JAX if it's defined in ml_dtypes.
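
A minimal sketch of that guard pattern, using stand-in namespaces instead of real ml_dtypes installs so it runs anywhere. `old_ml_dtypes`, `new_ml_dtypes` and `available_float8_dtypes` are hypothetical names for illustration; this is not the code in `jax/_src/dtypes.py`.

```python
from types import SimpleNamespace

# Stand-ins for two ml_dtypes releases (assumption: the older one lacks the new types).
old_ml_dtypes = SimpleNamespace(float8_e5m2=object())
new_ml_dtypes = SimpleNamespace(
    float8_e5m2=object(), float8_e4m3=object(), float8_e3m4=object()
)

# Types that may or may not exist, depending on the installed version.
OPTIONAL_FLOAT8_NAMES = ("float8_e4m3", "float8_e3m4")


def available_float8_dtypes(ml_dtypes_module):
    """Return {name: dtype}, registering optional types only when present."""
    found = {"float8_e5m2": ml_dtypes_module.float8_e5m2}
    for name in OPTIONAL_FLOAT8_NAMES:
        if hasattr(ml_dtypes_module, name):  # the version guard
            found[name] = getattr(ml_dtypes_module, name)
    return found


print(sorted(available_float8_dtypes(old_ml_dtypes)))
print(sorted(available_float8_dtypes(new_ml_dtypes)))
```

With the older stand-in only `float8_e5m2` is registered; with the newer one all three names appear.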

Another strategy we could use is the module-level __getattr__ for these types, so that if the ml_dtypes version is too old, we raise an error that specifies what version is required.
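
A rough sketch of the module-level `__getattr__` idea (PEP 562), built on a throwaway module object so it is self-contained. The module name `fake_dtypes` and the minimum version `0.5.0` are assumptions for illustration; the real minimum would be whichever ml_dtypes release ships these types.

```python
import types

# Hypothetical mapping from dtype name to the ml_dtypes version assumed to provide it.
_REQUIRED_VERSION = {"float8_e4m3": "0.5.0", "float8_e3m4": "0.5.0"}


def _module_getattr(name):
    """Called only when normal attribute lookup on the module fails."""
    if name in _REQUIRED_VERSION:
        raise AttributeError(
            f"{name} requires ml_dtypes >= {_REQUIRED_VERSION[name]}"
        )
    raise AttributeError(f"module 'fake_dtypes' has no attribute {name!r}")


# In a real module this would simply be a top-level `def __getattr__(name): ...`.
fake_dtypes = types.ModuleType("fake_dtypes")
fake_dtypes.__getattr__ = _module_getattr

try:
    fake_dtypes.float8_e4m3
except AttributeError as err:
    message = str(err)

print(message)
```

Compared with silently omitting the name, this gives users an error that says which version to install.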

hawkinsp · 2024-09-12T14:22:23Z

Incidentally, the current TF pin is : Requires-Dist: ml-dtypes <0.5.0,>=0.3.1.

If we release ml_dtypes as 0.4.1 instead of 0.5.0 we probably could bump the minimum version.
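
To illustrate why the release number matters, here is a small sketch that checks versions against the TF pin quoted above. `parse_version` and `satisfies_tf_pin` are hypothetical helpers that handle plain `X.Y.Z` strings only; a real check would more likely use the `packaging` library.

```python
def parse_version(version: str) -> tuple:
    """Turn 'X.Y.Z' into a tuple of ints so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def satisfies_tf_pin(version: str) -> bool:
    """Check the pin `ml-dtypes <0.5.0,>=0.3.1`."""
    return parse_version("0.3.1") <= parse_version(version) < parse_version("0.5.0")


print(satisfies_tf_pin("0.4.1"))  # True: installable alongside TF
print(satisfies_tf_pin("0.5.0"))  # False: excluded by the upper bound
```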

I suspect we could ease this process if we committed to semver for ml_dtypes so TF felt like they could be less conservative in their pins. (Adding dtypes is hopefully safe!)

hawkinsp · 2024-09-12T14:23:12Z

That said I'd probably do it the way Jake said for now and then we can think about the minimum version bump separately, there may be other factors I haven't considered (e.g., users being stuck on an older TF for whatever reason).

apivovarov · 2024-09-26T04:51:41Z

> That said I'd probably do it the way Jake said for now and then we can think about the minimum version bump separately, there may be other factors I haven't considered (e.g., users being stuck on an older TF for whatever reason).

I updated the PR and tested it with ml_dtypes 0.4.0 and 0.5.0
@jakevdp @hawkinsp

third_party/xla/workspace.bzl

jax/_src/dtypes.py

jax/_src/export/serialization.py

jax/_src/interpreters/mlir.py

jax/_src/lax/lax.py

jax/_src/public_test_util.py

jax/numpy/__init__.pyi

tests/dtypes_test.py

jakevdp

Looks good – at some point we should refactor things so that we keep a single source of truth for the list of custom float types, rather than re-defining it a half dozen times across the package. But that can be for another PR.

jakevdp · 2024-09-27T16:24:47Z

Please fix the lint issues – thanks! Also, the test failures look real. It seems that there's some place where the new float8 types must be registered.

apivovarov · 2024-09-27T23:13:50Z

> Please fix the lint issues – thanks! Also, the test failures look real. It seems that there's some place where the new float8 types must be registered.

MyPy

fixed mypy issues

btw, Contributing to JAX explains how to run lint/ruff/mypy/jupytext locally

pre-commit run --all-files

All passed now

Regarding failed tests

FAILED tests/export_test.py::JaxExportTest::test_poly_numeric_dtypes_dtype_float8_e3m4

e4m3 and e3m4 were added to stablehlo 1.7.0 (3 weeks ago, Sep 4, 2024)

jax/_src/export/_export.py uses

target_version = hlo.get_version_from_compatibility_requirement(
      hlo.StablehloCompatibilityRequirement.WEEK_4)

It returns target_version 1.5.0, which predates the 1.7.0 release that added these types.

Workaround: use NONE instead of WEEK_4; it returns 1.7.5.

FAILED tests/array_test.py::JaxArrayTest::test_shards_have_correct_dtype17

FAILED tests/dtypes_test.py::TestPromotionTables::testFloat8PromotionError

The tests work fine if I use XLA COMMIT_ID from XLA PR openxla/xla#16585 Add support for float8_e4m3 and float8_e3m4 types (in Review)

I guess we need to put this PR on hold and rerun the tests once XLA PR-16585 is merged to XLA main and XLA_COMMIT is updated in JAX. StableHLO WEEK_4 issue should resolve itself in 1-2 weeks too.

Test report on CPU

pytest -n auto tests/

26921 passed, 12530 skipped in 492.32s (0:08:12)

Imported from GitHub PR openxla/xla#16585

This PR adds f8E4M3 and f8E3M4 types support to XLA (mainly to cpu_compiler).

### `f8E4M3` type follows IEEE 754 convention.

```c
f8E4M3 (IEEE 754)
- Exponent bias: 7
- Maximum stored exponent value: 14 (binary 1110)
- Maximum unbiased exponent value: 14 - 7 = 7
- Minimum stored exponent value: 1 (binary 0001)
- Minimum unbiased exponent value: 1 − 7 = −6
- Precision specifies the total number of bits used for the significand (mantissa),
    including implicit leading integer bit = 3 + 1 = 4
- Follows IEEE 754 conventions for representation of special values
- Has Positive and Negative zero
- Has Positive and Negative infinity
- Has NaNs

Additional details:
- Max exp (unbiased): 7
- Min exp (unbiased): -6
- Infinities (+/-): S.1111.000
- Zeros (+/-): S.0000.000
- NaNs: S.1111.{001, 010, 011, 100, 101, 110, 111}
- Max normal number: S.1110.111 = +/-2^(7) x (1 + 0.875) = +/-240
- Min normal number: S.0001.000 = +/-2^(-6)
- Max subnormal number: S.0000.111 = +/-2^(-6) x 0.875 = +/-2^(-9) x 7
- Min subnormal number: S.0000.001 = +/-2^(-6) x 0.125 = +/-2^(-9)
```

### `f8E3M4` type follows IEEE 754 convention

```c
f8E3M4 (IEEE 754)
- Exponent bias: 3
- Maximum stored exponent value: 6 (binary 110)
- Maximum unbiased exponent value: 6 - 3 = 3
- Minimum stored exponent value: 1 (binary 001)
- Minimum unbiased exponent value: 1 − 3 = −2
- Precision specifies the total number of bits used for the significand (mantissa),
    including implicit leading integer bit = 4 + 1 = 5
- Follows IEEE 754 conventions for representation of special values
- Has Positive and Negative zero
- Has Positive and Negative infinity
- Has NaNs

Additional details:
- Max exp (unbiased): 3
- Min exp (unbiased): -2
- Infinities (+/-): S.111.0000
- Zeros (+/-): S.000.0000
- NaNs: S.111.{0,1}⁴ except S.111.0000
- Max normal number: S.110.1111 = +/-2^(6-3) x (1 + 15/16) = +/-2^3 x 31 x 2^(-4) = +/-15.5
- Min normal number: S.001.0000 = +/-2^(1-3) x (1 + 0) = +/-2^(-2)
- Max subnormal number: S.000.1111 = +/-2^(-2) x 15/16 = +/-2^(-2) x 15 x 2^(-4) = +/-15 x 2^(-6)
- Min subnormal number: S.000.0001 = +/-2^(-2) x 1/16 = +/-2^(-2) x 2^(-4) = +/-2^(-6)
```

### Testing:

```
bazel test \
//xla:array2d_test \
//xla:fp_util_test \
//xla:literal_comparison_test \
//xla:literal_test \
//xla/mlir/utils:type_util_test \
//xla:primitive_util_test \
//xla/python/ifrt:dtype_test \
//xla/python:xla_client_test \
//xla/service:elemental_ir_emitter_test \
//xla/service:float_normalization_test \
//xla/service/gpu/tests:float_conversions_test \
//xla/tests:array_elementwise_ops_test \
//xla/tests:constants_test \
//xla/tests:convert_test \
//xla/tests:float8_test \
//xla:util_test

bazel test \
//xla/hlo/translate/hlo_to_mhlo/tests:import.hlo.test \
//xla/hlo/translate/mhlo_to_hlo/tests:export.mlir.test \
//xla/mlir_hlo/tests:Dialect/mhlo/hlo-legalize-to-stablehlo.mlir.test \
//xla/mlir_hlo/tests:Dialect/mhlo/ops.mlir.test \
//xla/mlir_hlo/tests:Dialect/mhlo/stablehlo-legalize-to-hlo.mlir.test
```

### Related PRs:

- LLVM [PR-97179](llvm/llvm-project#97179) [APFloat] Add support for f8E4M3 IEEE 754 type (Merged)
- LLVM [PR-97118](llvm/llvm-project#97118) [MLIR] Add f8E4M3 IEEE 754 type (Merged)
- LLVM [PR-99698](llvm/llvm-project#99698) [APFloat] Add support for f8E3M4 IEEE 754 type (Merged)
- LLVM [PR-101230](llvm/llvm-project#101230) [MLIR] Add f8E3M4 IEEE 754 type (Merged)
- StableHLO [PR-2486](openxla/stablehlo#2486) [RFC] Add f8E4M3 and f8E3M4 types support (Merged)
- StableHLO [PR-2482](openxla/stablehlo#2482) Add f8E4M3 and f8E3M4 types support (Merged)
- ml_dtypes [PR-161](jax-ml/ml_dtypes#161) Add float8_e4m3 (Merged)
- ml_dtypes [PR-171](jax-ml/ml_dtypes#171) Add float8_e3m4 (Merged)
- XLA [PR-17075](openxla/xla#17075) [TSL] Bump ml_dtypes. Add float8_e4m3, float8_e3m4 (Approved)
- XLA [PR-3200](openxla/xla#3200) Add support for float8_e4m3fnuz and float8_e5m2fnuz (Template)
- JAX [PR-23585](jax-ml/jax#23585) Add float8_e4m3 type support (in Review)

Copybara import of the project:

--
ec1c723027012a816d7e17f268c5f034863696e6 by Alexander Pivovarov <[email protected]>:

Add support for float8_e4m3 and float8_e3m4 types

Merging this change closes #16585

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#16585 from apivovarov:float8_e4m3 ec1c723027012a816d7e17f268c5f034863696e6
PiperOrigin-RevId: 680651037

Imported from GitHub PR #16585 This PR adds f8E4M3 and f8E3M4 types support to XLA (mainly to cpu_compiler). ### `f8E4M3` type follows IEEE 754 convention. ```c f8E4M3 (IEEE 754) - Exponent bias: 7 - Maximum stored exponent value: 14 (binary 1110) - Maximum unbiased exponent value: 14 - 7 = 7 - Minimum stored exponent value: 1 (binary 0001) - Minimum unbiased exponent value: 1 − 7 = −6 - Precision specifies the total number of bits used for the significand (mantisa), including implicit leading integer bit = 3 + 1 = 4 - Follows IEEE 754 conventions for representation of special values - Has Positive and Negative zero - Has Positive and Negative infinity - Has NaNs Additional details: - Max exp (unbiased): 7 - Min exp (unbiased): -6 - Infinities (+/-): S.1111.000 - Zeros (+/-): S.0000.000 - NaNs: S.1111.{001, 010, 011, 100, 101, 110, 111} - Max normal number: S.1110.111 = +/-2^(7) x (1 + 0.875) = +/-240 - Min normal number: S.0001.000 = +/-2^(-6) - Max subnormal number: S.0000.111 = +/-2^(-6) x 0.875 = +/-2^(-9) x 7 - Min subnormal number: S.0000.001 = +/-2^(-6) x 0.125 = +/-2^(-9) ``` ### `f8E3M4` type follows IEEE 754 convention ```c f8E3M4 (IEEE 754) - Exponent bias: 3 - Maximum stored exponent value: 6 (binary 110) - Maximum unbiased exponent value: 6 - 3 = 3 - Minimum stored exponent value: 1 (binary 001) - Minimum unbiased exponent value: 1 − 3 = −2 - Precision specifies the total number of bits used for the significand (mantissa), including implicit leading integer bit = 4 + 1 = 5 - Follows IEEE 754 conventions for representation of special values - Has Positive and Negative zero - Has Positive and Negative infinity - Has NaNs Additional details: - Max exp (unbiased): 3 - Min exp (unbiased): -2 - Infinities (+/-): S.111.0000 - Zeros (+/-): S.000.0000 - NaNs: S.111.{0,1}⁴ except S.111.0000 - Max normal number: S.110.1111 = +/-2^(6-3) x (1 + 15/16) = +/-2^3 x 31 x 2^(-4) = +/-15.5 - Min normal number: S.001.0000 = +/-2^(1-3) x (1 + 0) = +/-2^(-2) - Max subnormal 
number: S.000.1111 = +/-2^(-2) x 15/16 = +/-2^(-2) x 15 x 2^(-4) = +/-15 x 2^(-6) - Min subnormal number: S.000.0001 = +/-2^(-2) x 1/16 = +/-2^(-2) x 2^(-4) = +/-2^(-6) ``` ### Testing: ``` bazel test \ //xla:array2d_test \ //xla:fp_util_test \ //xla:literal_comparison_test \ //xla:literal_test \ //xla/mlir/utils:type_util_test \ //xla:primitive_util_test \ //xla/python/ifrt:dtype_test \ //xla/python:xla_client_test \ //xla/service:elemental_ir_emitter_test \ //xla/service:float_normalization_test \ //xla/service/gpu/tests:float_conversions_test \ //xla/tests:array_elementwise_ops_test \ //xla/tests:constants_test \ //xla/tests:convert_test \ //xla/tests:float8_test \ //xla:util_test bazel test \ //xla/hlo/translate/hlo_to_mhlo/tests:import.hlo.test \ //xla/hlo/translate/mhlo_to_hlo/tests:export.mlir.test \ //xla/mlir_hlo/tests:Dialect/mhlo/hlo-legalize-to-stablehlo.mlir.test \ //xla/mlir_hlo/tests:Dialect/mhlo/ops.mlir.test \ //xla/mlir_hlo/tests:Dialect/mhlo/stablehlo-legalize-to-hlo.mlir.test ``` ### Related PRs: - LLVM [PR-97179](llvm/llvm-project#97179) [APFloat] Add support for f8E4M3 IEEE 754 type (Merged) - LLVM [PR-97118](llvm/llvm-project#97118) [MLIR] Add f8E4M3 IEEE 754 type (Merged) - LLVM [PR-99698](llvm/llvm-project#99698) [APFloat] Add support for f8E3M4 IEEE 754 type (Merged) - LLVM [PR-101230](llvm/llvm-project#101230) [MLIR] Add f8E3M4 IEEE 754 type (Merged) - StableHLO [PR-2486](openxla/stablehlo#2486) [RFC] Add f8E4M3 and f8E3M4 types support (Merged) - StableHLO [PR-2482](openxla/stablehlo#2482) Add f8E4M3 and f8E3M4 types support (Merged) - ml_dtypes [PR-161](jax-ml/ml_dtypes#161) Add float8_e4m3 (Merged) - ml_dtypes [PR-171](jax-ml/ml_dtypes#171) Add float8_e3m4 (Merged) - XLA [PR-17075](#17075) [TSL] Bump ml_dtypes. 
Add float8_e4m3, float8_e3m4 (Approved) - XLA [PR-3200](#3200) Add support for float8_e4m3fnuz and float8_e5m2fnuz (Template) - JAX [PR-23585](jax-ml/jax#23585) Add float8_e4m3 type support (in Review) Copybara import of the project: -- ec1c723 by Alexander Pivovarov <[email protected]>: Add support for float8_e4m3 and float8_e3m4 types Merging this change closes #16585 FUTURE_COPYBARA_INTEGRATE_REVIEW=#16585 from apivovarov:float8_e4m3 ec1c723 PiperOrigin-RevId: 680651037

Imported from GitHub PR openxla/xla#16585 This PR adds f8E4M3 and f8E3M4 types support to XLA (mainly to cpu_compiler). ### `f8E4M3` type follows IEEE 754 convention. ```c f8E4M3 (IEEE 754) - Exponent bias: 7 - Maximum stored exponent value: 14 (binary 1110) - Maximum unbiased exponent value: 14 - 7 = 7 - Minimum stored exponent value: 1 (binary 0001) - Minimum unbiased exponent value: 1 − 7 = −6 - Precision specifies the total number of bits used for the significand (mantisa), including implicit leading integer bit = 3 + 1 = 4 - Follows IEEE 754 conventions for representation of special values - Has Positive and Negative zero - Has Positive and Negative infinity - Has NaNs Additional details: - Max exp (unbiased): 7 - Min exp (unbiased): -6 - Infinities (+/-): S.1111.000 - Zeros (+/-): S.0000.000 - NaNs: S.1111.{001, 010, 011, 100, 101, 110, 111} - Max normal number: S.1110.111 = +/-2^(7) x (1 + 0.875) = +/-240 - Min normal number: S.0001.000 = +/-2^(-6) - Max subnormal number: S.0000.111 = +/-2^(-6) x 0.875 = +/-2^(-9) x 7 - Min subnormal number: S.0000.001 = +/-2^(-6) x 0.125 = +/-2^(-9) ``` ### `f8E3M4` type follows IEEE 754 convention ```c f8E3M4 (IEEE 754) - Exponent bias: 3 - Maximum stored exponent value: 6 (binary 110) - Maximum unbiased exponent value: 6 - 3 = 3 - Minimum stored exponent value: 1 (binary 001) - Minimum unbiased exponent value: 1 − 3 = −2 - Precision specifies the total number of bits used for the significand (mantissa), including implicit leading integer bit = 4 + 1 = 5 - Follows IEEE 754 conventions for representation of special values - Has Positive and Negative zero - Has Positive and Negative infinity - Has NaNs Additional details: - Max exp (unbiased): 3 - Min exp (unbiased): -2 - Infinities (+/-): S.111.0000 - Zeros (+/-): S.000.0000 - NaNs: S.111.{0,1}⁴ except S.111.0000 - Max normal number: S.110.1111 = +/-2^(6-3) x (1 + 15/16) = +/-2^3 x 31 x 2^(-4) = +/-15.5 - Min normal number: S.001.0000 = +/-2^(1-3) x (1 + 0) = +/-2^(-2) - 
Max subnormal number: S.000.1111 = +/-2^(-2) x 15/16 = +/-2^(-2) x 15 x 2^(-4) = +/-15 x 2^(-6) - Min subnormal number: S.000.0001 = +/-2^(-2) x 1/16 = +/-2^(-2) x 2^(-4) = +/-2^(-6) ``` ### Testing: ``` bazel test \ //xla:array2d_test \ //xla:fp_util_test \ //xla:literal_comparison_test \ //xla:literal_test \ //xla/mlir/utils:type_util_test \ //xla:primitive_util_test \ //xla/python/ifrt:dtype_test \ //xla/python:xla_client_test \ //xla/service:elemental_ir_emitter_test \ //xla/service:float_normalization_test \ //xla/service/gpu/tests:float_conversions_test \ //xla/tests:array_elementwise_ops_test \ //xla/tests:constants_test \ //xla/tests:convert_test \ //xla/tests:float8_test \ //xla:util_test bazel test \ //xla/hlo/translate/hlo_to_mhlo/tests:import.hlo.test \ //xla/hlo/translate/mhlo_to_hlo/tests:export.mlir.test \ //xla/mlir_hlo/tests:Dialect/mhlo/hlo-legalize-to-stablehlo.mlir.test \ //xla/mlir_hlo/tests:Dialect/mhlo/ops.mlir.test \ //xla/mlir_hlo/tests:Dialect/mhlo/stablehlo-legalize-to-hlo.mlir.test ``` ### Related PRs: - LLVM [PR-97179](llvm/llvm-project#97179) [APFloat] Add support for f8E4M3 IEEE 754 type (Merged) - LLVM [PR-97118](llvm/llvm-project#97118) [MLIR] Add f8E4M3 IEEE 754 type (Merged) - LLVM [PR-99698](llvm/llvm-project#99698) [APFloat] Add support for f8E3M4 IEEE 754 type (Merged) - LLVM [PR-101230](llvm/llvm-project#101230) [MLIR] Add f8E3M4 IEEE 754 type (Merged) - StableHLO [PR-2486](openxla/stablehlo#2486) [RFC] Add f8E4M3 and f8E3M4 types support (Merged) - StableHLO [PR-2482](openxla/stablehlo#2482) Add f8E4M3 and f8E3M4 types support (Merged) - ml_dtypes [PR-161](jax-ml/ml_dtypes#161) Add float8_e4m3 (Merged) - ml_dtypes [PR-171](jax-ml/ml_dtypes#171) Add float8_e3m4 (Merged) - XLA [PR-17075](openxla/xla#17075) [TSL] Bump ml_dtypes. 
Add float8_e4m3, float8_e3m4 (Approved) - XLA [PR-3200](openxla/xla#3200) Add support for float8_e4m3fnuz and float8_e5m2fnuz (Template) - JAX [PR-23585](jax-ml/jax#23585) Add float8_e4m3 type support (in Review) Copybara import of the project: -- ec1c723027012a816d7e17f268c5f034863696e6 by Alexander Pivovarov <[email protected]>: Add support for float8_e4m3 and float8_e3m4 types Merging this change closes #16585 FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#16585 from apivovarov:float8_e4m3 ec1c723027012a816d7e17f268c5f034863696e6 PiperOrigin-RevId: 680651037

Imported from GitHub PR #16585 This PR adds f8E4M3 and f8E3M4 types support to XLA (mainly to cpu_compiler). ### `f8E4M3` type follows IEEE 754 convention. ```c f8E4M3 (IEEE 754) - Exponent bias: 7 - Maximum stored exponent value: 14 (binary 1110) - Maximum unbiased exponent value: 14 - 7 = 7 - Minimum stored exponent value: 1 (binary 0001) - Minimum unbiased exponent value: 1 − 7 = −6 - Precision specifies the total number of bits used for the significand (mantisa), including implicit leading integer bit = 3 + 1 = 4 - Follows IEEE 754 conventions for representation of special values - Has Positive and Negative zero - Has Positive and Negative infinity - Has NaNs Additional details: - Max exp (unbiased): 7 - Min exp (unbiased): -6 - Infinities (+/-): S.1111.000 - Zeros (+/-): S.0000.000 - NaNs: S.1111.{001, 010, 011, 100, 101, 110, 111} - Max normal number: S.1110.111 = +/-2^(7) x (1 + 0.875) = +/-240 - Min normal number: S.0001.000 = +/-2^(-6) - Max subnormal number: S.0000.111 = +/-2^(-6) x 0.875 = +/-2^(-9) x 7 - Min subnormal number: S.0000.001 = +/-2^(-6) x 0.125 = +/-2^(-9) ``` ### `f8E3M4` type follows IEEE 754 convention ```c f8E3M4 (IEEE 754) - Exponent bias: 3 - Maximum stored exponent value: 6 (binary 110) - Maximum unbiased exponent value: 6 - 3 = 3 - Minimum stored exponent value: 1 (binary 001) - Minimum unbiased exponent value: 1 − 3 = −2 - Precision specifies the total number of bits used for the significand (mantissa), including implicit leading integer bit = 4 + 1 = 5 - Follows IEEE 754 conventions for representation of special values - Has Positive and Negative zero - Has Positive and Negative infinity - Has NaNs Additional details: - Max exp (unbiased): 3 - Min exp (unbiased): -2 - Infinities (+/-): S.111.0000 - Zeros (+/-): S.000.0000 - NaNs: S.111.{0,1}⁴ except S.111.0000 - Max normal number: S.110.1111 = +/-2^(6-3) x (1 + 15/16) = +/-2^3 x 31 x 2^(-4) = +/-15.5 - Min normal number: S.001.0000 = +/-2^(1-3) x (1 + 0) = +/-2^(-2) - Max subnormal 
number: S.000.1111 = +/-2^(-2) x 15/16 = +/-2^(-2) x 15 x 2^(-4) = +/-15 x 2^(-6) - Min subnormal number: S.000.0001 = +/-2^(-2) x 1/16 = +/-2^(-2) x 2^(-4) = +/-2^(-6) ``` ### Testing: ``` bazel test \ //xla:array2d_test \ //xla:fp_util_test \ //xla:literal_comparison_test \ //xla:literal_test \ //xla/mlir/utils:type_util_test \ //xla:primitive_util_test \ //xla/python/ifrt:dtype_test \ //xla/python:xla_client_test \ //xla/service:elemental_ir_emitter_test \ //xla/service:float_normalization_test \ //xla/service/gpu/tests:float_conversions_test \ //xla/tests:array_elementwise_ops_test \ //xla/tests:constants_test \ //xla/tests:convert_test \ //xla/tests:float8_test \ //xla:util_test bazel test \ //xla/hlo/translate/hlo_to_mhlo/tests:import.hlo.test \ //xla/hlo/translate/mhlo_to_hlo/tests:export.mlir.test \ //xla/mlir_hlo/tests:Dialect/mhlo/hlo-legalize-to-stablehlo.mlir.test \ //xla/mlir_hlo/tests:Dialect/mhlo/ops.mlir.test \ //xla/mlir_hlo/tests:Dialect/mhlo/stablehlo-legalize-to-hlo.mlir.test ``` ### Related PRs: - LLVM [PR-97179](llvm/llvm-project#97179) [APFloat] Add support for f8E4M3 IEEE 754 type (Merged) - LLVM [PR-97118](llvm/llvm-project#97118) [MLIR] Add f8E4M3 IEEE 754 type (Merged) - LLVM [PR-99698](llvm/llvm-project#99698) [APFloat] Add support for f8E3M4 IEEE 754 type (Merged) - LLVM [PR-101230](llvm/llvm-project#101230) [MLIR] Add f8E3M4 IEEE 754 type (Merged) - StableHLO [PR-2486](openxla/stablehlo#2486) [RFC] Add f8E4M3 and f8E3M4 types support (Merged) - StableHLO [PR-2482](openxla/stablehlo#2482) Add f8E4M3 and f8E3M4 types support (Merged) - ml_dtypes [PR-161](jax-ml/ml_dtypes#161) Add float8_e4m3 (Merged) - ml_dtypes [PR-171](jax-ml/ml_dtypes#171) Add float8_e3m4 (Merged) - XLA [PR-17075](#17075) [TSL] Bump ml_dtypes. 
Add float8_e4m3, float8_e3m4 (Approved) - XLA [PR-3200](#3200) Add support for float8_e4m3fnuz and float8_e5m2fnuz (Template) - JAX [PR-23585](jax-ml/jax#23585) Add float8_e4m3 type support (in Review) Copybara import of the project: -- ec1c723 by Alexander Pivovarov <[email protected]>: Add support for float8_e4m3 and float8_e3m4 types Merging this change closes #16585 FUTURE_COPYBARA_INTEGRATE_REVIEW=#16585 from apivovarov:float8_e4m3 ec1c723 PiperOrigin-RevId: 680651037

Imported from GitHub PR openxla/xla#16585 This PR adds f8E4M3 and f8E3M4 types support to XLA (mainly to cpu_compiler). ### `f8E4M3` type follows IEEE 754 convention. ```c f8E4M3 (IEEE 754) - Exponent bias: 7 - Maximum stored exponent value: 14 (binary 1110) - Maximum unbiased exponent value: 14 - 7 = 7 - Minimum stored exponent value: 1 (binary 0001) - Minimum unbiased exponent value: 1 − 7 = −6 - Precision specifies the total number of bits used for the significand (mantisa), including implicit leading integer bit = 3 + 1 = 4 - Follows IEEE 754 conventions for representation of special values - Has Positive and Negative zero - Has Positive and Negative infinity - Has NaNs Additional details: - Max exp (unbiased): 7 - Min exp (unbiased): -6 - Infinities (+/-): S.1111.000 - Zeros (+/-): S.0000.000 - NaNs: S.1111.{001, 010, 011, 100, 101, 110, 111} - Max normal number: S.1110.111 = +/-2^(7) x (1 + 0.875) = +/-240 - Min normal number: S.0001.000 = +/-2^(-6) - Max subnormal number: S.0000.111 = +/-2^(-6) x 0.875 = +/-2^(-9) x 7 - Min subnormal number: S.0000.001 = +/-2^(-6) x 0.125 = +/-2^(-9) ``` ### `f8E3M4` type follows IEEE 754 convention ```c f8E3M4 (IEEE 754) - Exponent bias: 3 - Maximum stored exponent value: 6 (binary 110) - Maximum unbiased exponent value: 6 - 3 = 3 - Minimum stored exponent value: 1 (binary 001) - Minimum unbiased exponent value: 1 − 3 = −2 - Precision specifies the total number of bits used for the significand (mantissa), including implicit leading integer bit = 4 + 1 = 5 - Follows IEEE 754 conventions for representation of special values - Has Positive and Negative zero - Has Positive and Negative infinity - Has NaNs Additional details: - Max exp (unbiased): 3 - Min exp (unbiased): -2 - Infinities (+/-): S.111.0000 - Zeros (+/-): S.000.0000 - NaNs: S.111.{0,1}⁴ except S.111.0000 - Max normal number: S.110.1111 = +/-2^(6-3) x (1 + 15/16) = +/-2^3 x 31 x 2^(-4) = +/-15.5 - Min normal number: S.001.0000 = +/-2^(1-3) x (1 + 0) = +/-2^(-2) - 
Max subnormal number: S.000.1111 = +/-2^(-2) x 15/16 = +/-2^(-2) x 15 x 2^(-4) = +/-15 x 2^(-6) - Min subnormal number: S.000.0001 = +/-2^(-2) x 1/16 = +/-2^(-2) x 2^(-4) = +/-2^(-6) ``` ### Testing: ``` bazel test \ //xla:array2d_test \ //xla:fp_util_test \ //xla:literal_comparison_test \ //xla:literal_test \ //xla/mlir/utils:type_util_test \ //xla:primitive_util_test \ //xla/python/ifrt:dtype_test \ //xla/python:xla_client_test \ //xla/service:elemental_ir_emitter_test \ //xla/service:float_normalization_test \ //xla/service/gpu/tests:float_conversions_test \ //xla/tests:array_elementwise_ops_test \ //xla/tests:constants_test \ //xla/tests:convert_test \ //xla/tests:float8_test \ //xla:util_test bazel test \ //xla/hlo/translate/hlo_to_mhlo/tests:import.hlo.test \ //xla/hlo/translate/mhlo_to_hlo/tests:export.mlir.test \ //xla/mlir_hlo/tests:Dialect/mhlo/hlo-legalize-to-stablehlo.mlir.test \ //xla/mlir_hlo/tests:Dialect/mhlo/ops.mlir.test \ //xla/mlir_hlo/tests:Dialect/mhlo/stablehlo-legalize-to-hlo.mlir.test ``` ### Related PRs: - LLVM [PR-97179](llvm/llvm-project#97179) [APFloat] Add support for f8E4M3 IEEE 754 type (Merged) - LLVM [PR-97118](llvm/llvm-project#97118) [MLIR] Add f8E4M3 IEEE 754 type (Merged) - LLVM [PR-99698](llvm/llvm-project#99698) [APFloat] Add support for f8E3M4 IEEE 754 type (Merged) - LLVM [PR-101230](llvm/llvm-project#101230) [MLIR] Add f8E3M4 IEEE 754 type (Merged) - StableHLO [PR-2486](openxla/stablehlo#2486) [RFC] Add f8E4M3 and f8E3M4 types support (Merged) - StableHLO [PR-2482](openxla/stablehlo#2482) Add f8E4M3 and f8E3M4 types support (Merged) - ml_dtypes [PR-161](jax-ml/ml_dtypes#161) Add float8_e4m3 (Merged) - ml_dtypes [PR-171](jax-ml/ml_dtypes#171) Add float8_e3m4 (Merged) - XLA [PR-17075](openxla/xla#17075) [TSL] Bump ml_dtypes. 
Add float8_e4m3, float8_e3m4 (Approved) - XLA [PR-3200](openxla/xla#3200) Add support for float8_e4m3fnuz and float8_e5m2fnuz (Template) - JAX [PR-23585](jax-ml/jax#23585) Add float8_e4m3 type support (in Review) Copybara import of the project: -- ec1c723027012a816d7e17f268c5f034863696e6 by Alexander Pivovarov <[email protected]>: Add support for float8_e4m3 and float8_e3m4 types Merging this change closes #16585 FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#16585 from apivovarov:float8_e4m3 ec1c723027012a816d7e17f268c5f034863696e6 PiperOrigin-RevId: 680651037

Imported from GitHub PR #16585

This PR adds f8E4M3 and f8E3M4 types support to XLA (mainly to cpu_compiler).

### `f8E4M3` type follows IEEE 754 convention.

```c
f8E4M3 (IEEE 754)
- Exponent bias: 7
- Maximum stored exponent value: 14 (binary 1110)
- Maximum unbiased exponent value: 14 - 7 = 7
- Minimum stored exponent value: 1 (binary 0001)
- Minimum unbiased exponent value: 1 − 7 = −6
- Precision specifies the total number of bits used for the significand (mantissa),
    including implicit leading integer bit = 3 + 1 = 4
- Follows IEEE 754 conventions for representation of special values
- Has Positive and Negative zero
- Has Positive and Negative infinity
- Has NaNs

Additional details:
- Max exp (unbiased): 7
- Min exp (unbiased): -6
- Infinities (+/-): S.1111.000
- Zeros (+/-): S.0000.000
- NaNs: S.1111.{001, 010, 011, 100, 101, 110, 111}
- Max normal number: S.1110.111 = +/-2^(7) x (1 + 0.875) = +/-240
- Min normal number: S.0001.000 = +/-2^(-6)
- Max subnormal number: S.0000.111 = +/-2^(-6) x 0.875 = +/-2^(-9) x 7
- Min subnormal number: S.0000.001 = +/-2^(-6) x 0.125 = +/-2^(-9)
```

### `f8E3M4` type follows IEEE 754 convention

```c
f8E3M4 (IEEE 754)
- Exponent bias: 3
- Maximum stored exponent value: 6 (binary 110)
- Maximum unbiased exponent value: 6 - 3 = 3
- Minimum stored exponent value: 1 (binary 001)
- Minimum unbiased exponent value: 1 − 3 = −2
- Precision specifies the total number of bits used for the significand (mantissa),
    including implicit leading integer bit = 4 + 1 = 5
- Follows IEEE 754 conventions for representation of special values
- Has Positive and Negative zero
- Has Positive and Negative infinity
- Has NaNs

Additional details:
- Max exp (unbiased): 3
- Min exp (unbiased): -2
- Infinities (+/-): S.111.0000
- Zeros (+/-): S.000.0000
- NaNs: S.111.{0,1}⁴ except S.111.0000
- Max normal number: S.110.1111 = +/-2^(6-3) x (1 + 15/16) = +/-2^3 x 31 x 2^(-4) = +/-15.5
- Min normal number: S.001.0000 = +/-2^(1-3) x (1 + 0) = +/-2^(-2)
- Max subnormal number: S.000.1111 = +/-2^(-2) x 15/16 = +/-2^(-2) x 15 x 2^(-4) = +/-15 x 2^(-6)
- Min subnormal number: S.000.0001 = +/-2^(-2) x 1/16 = +/-2^(-2) x 2^(-4) = +/-2^(-6)
```

### Testing:

```
bazel test \
//xla:array2d_test \
//xla:fp_util_test \
//xla:literal_comparison_test \
//xla:literal_test \
//xla/mlir/utils:type_util_test \
//xla:primitive_util_test \
//xla/python/ifrt:dtype_test \
//xla/python:xla_client_test \
//xla/service:elemental_ir_emitter_test \
//xla/service:float_normalization_test \
//xla/service/gpu/tests:float_conversions_test \
//xla/tests:array_elementwise_ops_test \
//xla/tests:constants_test \
//xla/tests:convert_test \
//xla/tests:float8_test \
//xla:util_test

bazel test \
//xla/hlo/translate/hlo_to_mhlo/tests:import.hlo.test \
//xla/hlo/translate/mhlo_to_hlo/tests:export.mlir.test \
//xla/mlir_hlo/tests:Dialect/mhlo/hlo-legalize-to-stablehlo.mlir.test \
//xla/mlir_hlo/tests:Dialect/mhlo/ops.mlir.test \
//xla/mlir_hlo/tests:Dialect/mhlo/stablehlo-legalize-to-hlo.mlir.test
```

### Related PRs:

- LLVM [PR-97179](llvm/llvm-project#97179) [APFloat] Add support for f8E4M3 IEEE 754 type (Merged)
- LLVM [PR-97118](llvm/llvm-project#97118) [MLIR] Add f8E4M3 IEEE 754 type (Merged)
- LLVM [PR-99698](llvm/llvm-project#99698) [APFloat] Add support for f8E3M4 IEEE 754 type (Merged)
- LLVM [PR-101230](llvm/llvm-project#101230) [MLIR] Add f8E3M4 IEEE 754 type (Merged)
- StableHLO [PR-2486](openxla/stablehlo#2486) [RFC] Add f8E4M3 and f8E3M4 types support (Merged)
- StableHLO [PR-2482](openxla/stablehlo#2482) Add f8E4M3 and f8E3M4 types support (Merged)
- ml_dtypes [PR-161](jax-ml/ml_dtypes#161) Add float8_e4m3 (Merged)
- ml_dtypes [PR-171](jax-ml/ml_dtypes#171) Add float8_e3m4 (Merged)
- XLA [PR-17075](#17075) [TSL] Bump ml_dtypes. Add float8_e4m3, float8_e3m4 (Approved)
- XLA [PR-3200](#3200) Add support for float8_e4m3fnuz and float8_e5m2fnuz (Template)
- JAX [PR-23585](jax-ml/jax#23585) Add float8_e4m3 type support (in Review)

Copybara import of the project:

--
ec1c723 by Alexander Pivovarov <[email protected]>:

Add support for float8_e4m3 and float8_e3m4 types

Merging this change closes #16585

FUTURE_COPYBARA_INTEGRATE_REVIEW=#16585 from apivovarov:float8_e4m3 ec1c723
PiperOrigin-RevId: 680651037
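As a cross-check of the bit layouts listed above, here is a minimal pure-Python sketch that decodes an 8-bit pattern using the IEEE 754 style rules described for both types. `decode_fp8` is a hypothetical helper written for illustration; it is not part of XLA, JAX, or ml_dtypes, and it assumes the layout is sign bit, then exponent bits, then mantissa bits.

```python
import math


def decode_fp8(bits: int, exp_bits: int, man_bits: int) -> float:
    """Decode an 8-bit pattern laid out as S.E...E.M...M (illustrative helper)."""
    assert 0 <= bits <= 0xFF and 1 + exp_bits + man_bits == 8
    sign = -1.0 if (bits >> 7) & 1 else 1.0
    exp = (bits >> man_bits) & ((1 << exp_bits) - 1)
    man = bits & ((1 << man_bits) - 1)
    bias = (1 << (exp_bits - 1)) - 1  # 7 for E4M3, 3 for E3M4

    if exp == (1 << exp_bits) - 1:
        # All-ones exponent: infinity if the mantissa is zero, NaN otherwise.
        return sign * math.inf if man == 0 else math.nan
    if exp == 0:
        # Zero or subnormal: no implicit leading 1, exponent fixed at 1 - bias.
        return sign * man * 2.0 ** (1 - bias - man_bits)
    # Normal number: implicit leading 1.
    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)


# f8E4M3: max normal S.1110.111 and min subnormal S.0000.001
print(decode_fp8(0b0_1110_111, 4, 3))  # 240.0
print(decode_fp8(0b0_0000_001, 4, 3) == 2.0 ** -9)  # True

# f8E3M4: max normal S.110.1111 and min subnormal S.000.0001
print(decode_fp8(0b0_110_1111, 3, 4))  # 15.5
print(decode_fp8(0b0_000_0001, 3, 4) == 2.0 ** -6)  # True
```

The same function handles both types because only the exponent and mantissa widths differ; the bias follows from the exponent width.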

jakevdp · 2024-10-08T20:58:40Z

We're seeing some internal failures on GPU and TPU backends. I'll try to debug.

apivovarov · 2024-10-08T21:22:47Z

I will try to build and test on a GPU instance.

jakevdp · 2024-10-08T21:37:52Z

The error is this:

APITest.test_jit_custom_floats_float8_e4m3:
...
"/build/.../jax/_src/array.py", line 624, in _value
    self._npy_value = self._single_device_array_to_np_array()
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
xla.python.xla_extension.XlaRuntimeError: INVALID_ARGUMENT: Unsupported type in PrimitiveTypeToDataType 28

Something to do with one of the lower-level passes not knowing how to consume the new serialized types
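One way to picture this kind of failure is a type-conversion table that was written before the new types existed. The sketch below is purely illustrative: the function, table contents, and type names are invented, and only the numeric code 28 is taken from the error message above. It does not reproduce any real PJRT plugin code.

```python
class UnsupportedTypeError(ValueError):
    """Raised when a type code has no entry in the conversion table."""


# Placeholder table standing in for a converter that predates the new types.
# The codes 1 and 2 and all names are made up for this example.
OLD_TABLE = {1: "type_a", 2: "type_b"}

# The same table after the converter learns about the new type code.
NEW_TABLE = {**OLD_TABLE, 28: "new_fp8_type"}


def convert(type_code: int, table: dict) -> str:
    """Look up a type code, failing loudly when the table has no entry."""
    try:
        return table[type_code]
    except KeyError:
        raise UnsupportedTypeError(
            f"Unsupported type in conversion: {type_code}"
        ) from None


print(convert(28, NEW_TABLE))  # new_fp8_type
try:
    convert(28, OLD_TABLE)
except UnsupportedTypeError as err:
    print(err)  # Unsupported type in conversion: 28
```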

apivovarov · 2024-10-09T21:45:45Z

I tested the PR on a GPU instance (NVIDIA A10G, 23 GB) and found that dtypes_test.py, export_test.py and api_test.py failed for the f8e3m4 and f8e4m3 types. Error: Failed to serialize StableHLO.

The issue occurs because PjRtCApiCompiler::Compile() calls xla::GetDefaultStablehloVersion(), which uses WEEK_12 and returns StableHLO version 1.1.0. However, these new types require version 1.7.0.

When I replace WEEK_12 with WEEK_4, the tests pass successfully on the GPU.
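The version mismatch described above comes down to comparing the serialization target version with the minimum version that knows the new types. A minimal sketch, using the two version numbers quoted in this comment; `parse_version` and `can_serialize` are hypothetical helpers, not StableHLO or XLA APIs.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn '1.7.0' into (1, 7, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def can_serialize(target_version: str, required_version: str) -> bool:
    """True when the target version is at least the required version."""
    return parse_version(target_version) >= parse_version(required_version)


F8_REQUIRED = "1.7.0"  # minimum version for the new types, per the comment above

print(can_serialize("1.1.0", F8_REQUIRED))  # False: the WEEK_12 default reported above
print(can_serialize("1.7.0", F8_REQUIRED))  # True
```

Comparing integer tuples rather than raw strings avoids the pitfall where "1.10.0" sorts before "1.7.0" lexicographically.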

As a temporary measure, I limited the f8e3m4 and f8e4m3 tests to run on "cpu" only.

    # TODO: Remove "cpu" check once xla::GetDefaultStablehloVersion() is 1.7.0+
    if device_under_test() == "cpu" and jax._src.lib.version >= (0, 4, 35):
        ...

I also opened an XLA PR to consider using WEEK_4 in xla::GetDefaultStablehloVersion(): openxla/xla#18117

hawkinsp · 2024-10-09T21:48:36Z

> I tested the PR on GPU instance (nvidia A10G 23GB) and found that dtypes_test.py, export_test.py and api_test.py failed for f8e3m4 and f8e4m3 types. Error: Failed to serialize StableHLO.
>
> The issue occurs because PjRtCApiCompiler::Compile() calls xla::GetDefaultStablehloVersion(), which uses WEEK_12 and returns StableHLO version 1.1.0. However, these new types require version 1.7.0.

Yeah, we're aware of this. We need to plumb the plugin's stablehlo version to JAX, which we haven't done yet. We should always be able to produce the newest stablehlo the consumer can consume, but currently we are overly conservative, I think. @dfm

apivovarov · 2024-10-09T21:56:19Z

The message "Unsupported type in PrimitiveTypeToDataType" from the error report above does not appear to exist in XLA, ml_dtypes, JAX, or StableHLO. I found it only in TensorFlow, specifically in the file tensorflow/compiler/tf2xla/type_util.cc.

It is likely that the same code is utilized in some closed-source PJRT plugins/compilers.

The device_under_test() == "cpu" check should also avoid the "Unsupported type in PrimitiveTypeToDataType" failure in APITest.test_jit_custom_floats_float8_e4m3.

The tests work fine on GPU if WEEK_4 is used in xla::GetDefaultStablehloVersion()

=== jtu.device_under_test(): gpu

tests/api_test.py::APITest::test_jit_custom_floats_float8_e3m4 PASSED
tests/api_test.py::APITest::test_jit_custom_floats_float8_e4m3 PASSED

tests/dtypes_test.py::TestPromotionTables::testJaxTypeFromType_jaxtype=dtype(float8_e3m4) PASSED
tests/dtypes_test.py::TestPromotionTables::testJaxTypeFromType_jaxtype=dtype(float8_e4m3) PASSED
tests/dtypes_test.py::TestPromotionTables::testJaxTypeFromVal_jaxtype=dtype(float8_e3m4) PASSED
tests/dtypes_test.py::TestPromotionTables::testJaxTypeFromVal_jaxtype=dtype(float8_e4m3) PASSED
tests/dtypes_test.py::TestPromotionTables::testJaxTypeWeak_dtype=dtype(float8_e3m4) PASSED
tests/dtypes_test.py::TestPromotionTables::testJaxTypeWeak_dtype=dtype(float8_e4m3) PASSED

tests/export_test.py::JaxExportTest::test_poly_numeric_dtypes_dtype_float8_e3m4 PASSED
tests/export_test.py::JaxExportTest::test_poly_numeric_dtypes_dtype_float8_e4m3 PASSED

apivovarov · 2024-10-10T16:33:13Z

Jake, can you help with re-running the tests? I temporarily limited the f8e3m4 and f8e4m3 tests to run on "cpu" only.
@jakevdp
The tests work fine on GPU if WEEK_4 is used in xla::GetDefaultStablehloVersion()

apivovarov · 2024-10-11T20:50:58Z

Summary of PR Testing on GPU:

Installed packages:

jax-0.4.35.dev
jaxlib-0.4.35.dev
ml_dtypes-0.5.0
get_version_from_compatibility_requirement(WEEK_12) => StableHLO-1.1.0

Results:

The smoke test script (provided in the PR description) that uses the dot operation successfully passed on the GPU, including tests using jax.jit
All pytest tests passed on the GPU instance.*

*Note: Initial test failures occurred for f8e3m4 and f8e4m3 on the GPU due to the use of StableHLO version 1.1.0 (WEEK_12).
Temporary workaround: These tests are currently restricted to run on "CPU" only. I plan to re-enable GPU testing for these types after November 28, 2024, when WEEK_12 will be updated to use StableHLO version 1.7.0.

Jake, Peter, Dan,
Could you advise on the next steps for this PR?

@jakevdp @hawkinsp @dfm

jakevdp · 2024-10-14T14:52:46Z

We're still seeing some new failures in the PJRT runtime – I'm not sure how to address those. @hawkinsp do you have thoughts on how to proceed here?

apivovarov · 2024-10-24T19:47:01Z

StableHLO WEEK_12 is in the process of updating to version 1.5.0 (openxla/stablehlo#2599)

WEEK_12 is scheduled to switch to version 1.7.0 (where f8e4m3 was added) in about 36 days (on or after November 28, 2024).
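The timeline above can be checked with simple date arithmetic. This sketch assumes that WEEK_12 denotes a 12-week compatibility window, which is an interpretation of the name rather than something stated in this thread; the two dates are taken from this comment.

```python
from datetime import date, timedelta

comment_date = date(2024, 10, 24)  # date of this comment
switch_date = date(2024, 11, 28)   # when WEEK_12 is expected to reach 1.7.0

# Days remaining between the comment and the expected switch.
days_left = (switch_date - comment_date).days
print(days_left)  # 35

# Assumption: WEEK_12 means a 12-week window. Under that assumption, the
# switch date implies the version became available about 12 weeks earlier.
window = timedelta(weeks=12)
implied_availability = switch_date - window
print(implied_availability.isoformat())  # 2024-09-05
```

The 35-day result is close to the "about 36 days" estimate given above.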

Are there any potential workarounds for the "failed internal tests for GPU and TPU backends" (reported here) that could allow this PR to be shipped earlier?

@hawkinsp @jakevdp @dfm

apivovarov · 2024-11-08T19:16:29Z

I retested this PR using jaxlib 0.4.35 (released Oct 22) on a server with 4 GPU devices.

The checks and workarounds I previously added to skip certain tests for f8e4m3/f8e3m4 types are no longer needed, so I removed them and rebased this PR.

All tests passed on the 4-GPU setup.

Jake, Peter, do you think we should attempt merging this PR again now that jaxlib 0.4.35 was released two weeks ago and all tests passed on GPU without any workarounds or skips?
@jakevdp @hawkinsp

The other methods of `_LazyDtypes` filter by the supported dtypes, so it's strange that this property does not. Change in preparation for landing #23585 without breaking existing tests. PiperOrigin-RevId: 697727110

hawkinsp · 2024-11-18T21:22:12Z

I think #24956 will unblock merging this.

The problem is that we shouldn't be running for every custom dtype in the tests, if any given backend only supports a subset of them.
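The idea of filtering custom dtypes by what a backend supports can be sketched as follows. This is a hypothetical stand-in, not the actual `_LazyDtypes` implementation in JAX's test utilities: the class name, the property name, and the candidate list are assumptions made for illustration, and dtypes are represented as plain strings.

```python
class LazyDtypesSketch:
    """Illustrative dtype registry that only reports supported dtypes."""

    # Candidate custom float types a test suite might want to exercise.
    _CUSTOM_FLOAT_CANDIDATES = ["bfloat16", "float8_e4m3", "float8_e3m4"]

    def __init__(self, supported_dtypes):
        # Dtype names the backend under test is known to support.
        self._supported = set(supported_dtypes)

    def _filter(self, names):
        """Keep only names the backend supports, preserving order."""
        return [name for name in names if name in self._supported]

    @property
    def custom_floats(self):
        # Filtered like the other accessors, so tests parameterized over
        # this list never run on a dtype the backend cannot handle.
        return self._filter(self._CUSTOM_FLOAT_CANDIDATES)


cpu_like = LazyDtypesSketch(["bfloat16", "float8_e4m3", "float8_e3m4"])
older_backend = LazyDtypesSketch(["bfloat16"])

print(cpu_like.custom_floats)       # ['bfloat16', 'float8_e4m3', 'float8_e3m4']
print(older_backend.custom_floats)  # ['bfloat16']
```

With this shape, adding a new dtype to the candidate list does not break test runs on backends that have not yet adopted it.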

The other methods of `_LazyDtypes` filter by the supported dtypes, so it's strange that this property does not. Change in preparation for landing #23585 without breaking existing tests. PiperOrigin-RevId: 697752034

This was referenced Sep 12, 2024

Add support for float8_e4m3 and float8_e3m4 types openxla/xla#16585

Closed

[Question] Can we release a new version of ml_dtypes? jax-ml/ml_dtypes#185

Closed

apivovarov force-pushed the float8_e4m3 branch from 0336705 to 5553be7 Compare September 26, 2024 04:22

apivovarov changed the title ~~Add float8_e4m3 type support~~ Add float8_e4m3 and float8_e3m4 types support Sep 26, 2024

apivovarov force-pushed the float8_e4m3 branch from 5553be7 to 090ff3e Compare September 26, 2024 04:49

jakevdp requested changes Sep 26, 2024

jakevdp self-assigned this Sep 26, 2024

apivovarov force-pushed the float8_e4m3 branch 3 times, most recently from a6218f2 to 0b862ed Compare September 27, 2024 03:41

jakevdp approved these changes Sep 27, 2024

apivovarov force-pushed the float8_e4m3 branch from 0b862ed to be89ca7 Compare September 27, 2024 21:42

copybara-service bot mentioned this pull request Sep 30, 2024

PR #16585: Add support for float8_e4m3 and float8_e3m4 types google/tsl#2762

Merged

copybara-service bot mentioned this pull request Sep 30, 2024

PR #16585: Add support for float8_e4m3 and float8_e3m4 types openxla/xla#17774

Merged

copybara-service bot mentioned this pull request Sep 30, 2024

PR #16585: Add support for float8_e4m3 and float8_e3m4 types tensorflow/tensorflow#76821

Merged

jakevdp added the kokoro:force-run label Oct 8, 2024

kokoro-team removed the kokoro:force-run label Oct 8, 2024

apivovarov mentioned this pull request Oct 9, 2024

Use StableHLO WEEK_4 in GetDefaultStablehloVersion openxla/xla#18117

Closed

apivovarov force-pushed the float8_e4m3 branch from 7937051 to f611112 Compare October 9, 2024 21:36

apivovarov force-pushed the float8_e4m3 branch from f611112 to c95f301 Compare October 10, 2024 16:32

apivovarov force-pushed the float8_e4m3 branch from c95f301 to 1136024 Compare October 11, 2024 19:42

jakevdp approved these changes Oct 14, 2024

apivovarov mentioned this pull request Oct 24, 2024

[fp8] Support fp8e4m3 in torch_xla pytorch/xla#8005

Open

Add float8_e4m3 and float8_e3m4 types support

78da9fa

apivovarov force-pushed the float8_e4m3 branch from 1136024 to 78da9fa Compare November 8, 2024 19:04

hawkinsp added and then removed the `pull ready` label (Ready for copybara import and testing) Nov 18, 2024

copybara-service bot mentioned this pull request Nov 18, 2024

Filter custom dtypes by supported_dtypes in _LazyDtypes. #24956

Merged

hawkinsp self-assigned this Nov 18, 2024

copybara-service bot merged commit 91891cb into jax-ml:main Nov 18, 2024
12 of 13 checks passed
