[RFC] Change float8_e5m2 dtype descr from <f1 to <V1 #216

Open
apivovarov opened this issue Oct 1, 2024 · 1 comment
@apivovarov
Contributor

All data types defined in ml_dtypes/_src/dtypes.cc are assigned kNpyDescrKind = 'V', except for float8_e5m2.
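One way to check this from Python is to inspect each dtype's `kind` and `str` attributes, which are standard NumPy dtype attributes. The sketch below is illustrative only: `describe` is a hypothetical helper, not part of ml_dtypes, and it is exercised here with built-in NumPy dtypes so that it runs without ml_dtypes installed.

```python
import numpy as np


def describe(dtype_like):
    """Return the name, kind character, and descr string of a dtype.

    Hypothetical helper for illustration; it only reads standard
    attributes of numpy.dtype.
    """
    dt = np.dtype(dtype_like)
    return {"name": dt.name, "kind": dt.kind, "descr": dt.str}


# Built-in reference points: a real float type and a 1-byte void type.
for candidate in (np.float16, "V1"):
    print(describe(candidate))
```

With ml_dtypes installed, calling `describe(ml_dtypes.float8_e5m2)` and `describe(ml_dtypes.bfloat16)` should, according to the description in this issue, show kind 'f' for the former and kind 'V' for the latter. That outcome is taken from the report above and is not verified here.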

Why is this an issue?

This discrepancy affects the saving and loading of arrays using .npy or .npz file formats. Specifically, numpy.load() can successfully load an .npy file if its header contains 'descr': '<V1'. However, it fails when the header contains 'descr': '<f1', resulting in the following error:

  File "/home/user/.local/lib/python3.9/site-packages/numpy/lib/format.py", line 655, in _read_array_header
    raise ValueError(msg.format(d['descr'])) from e
ValueError: descr is not a valid dtype descriptor: '<f1'
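The failure can be reproduced without ml_dtypes by writing .npy headers by hand. This is a minimal sketch: `make_npy` and `try_load` are hypothetical helper names, the three payload bytes are arbitrary, and `numpy.lib.format.write_array_header_1_0` is used only to emit a version 1.0 header with a chosen `descr`.

```python
import io

import numpy as np
from numpy.lib import format as npy_format


def make_npy(descr, payload=b"\x3c\x40\x44"):
    """Build an in-memory .npy file with the given descr (hypothetical helper)."""
    buf = io.BytesIO()
    header = {"descr": descr, "fortran_order": False, "shape": (len(payload),)}
    # Writes the magic string, the header length, and the header dict.
    npy_format.write_array_header_1_0(buf, header)
    buf.write(payload)  # one byte per element
    buf.seek(0)
    return buf


def try_load(descr):
    """Return the loaded dtype as a string, or the error message on failure."""
    try:
        return str(np.load(make_npy(descr)).dtype)
    except ValueError as exc:
        return "ValueError: {}".format(exc)


print(try_load("<V1"))  # loads as a 1-byte void array
print(try_load("<f1"))  # rejected: NumPy has no 1-byte float type
```

The '<f1' header is rejected because `numpy.dtype('<f1')` is not a valid built-in dtype string, which is the check that `_read_array_header` performs on the header's descr.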

If an array serialization solution relies on the numpy.save() / numpy.load() APIs, it will experience inconsistent behavior for types defined in ml_dtypes. All ml_dtypes types with kind 'V' (Void) can be saved and loaded, albeit with a loss of type information. However, float8_e5m2 requires special handling, as numpy.load() fails when encountering a header with 'descr': '<f1'.
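As an illustration of the kind of special handling a serializer might adopt, the sketch below stores the raw bytes together with the dtype name and shape in an .npz archive, then rebuilds the array on load. This is an assumed workaround, not something ml_dtypes provides: `save_with_dtype_name`, `load_with_dtype_name`, and the `resolve_dtype` parameter are hypothetical names, and the round trip is shown with `np.float16` so the example runs without ml_dtypes.

```python
import io

import numpy as np


def save_with_dtype_name(file, arr):
    """Save raw bytes plus dtype name and shape (hypothetical helper)."""
    arr = np.asarray(arr)
    np.savez(
        file,
        raw=np.frombuffer(arr.tobytes(), dtype=np.uint8),  # plain uint8 payload
        shape=np.asarray(arr.shape, dtype=np.int64),
        dtype_name=np.asarray(arr.dtype.name),  # 0-d unicode array, no pickle needed
    )


def load_with_dtype_name(file, resolve_dtype=np.dtype):
    """Rebuild the array; resolve_dtype maps the stored name to a dtype."""
    with np.load(file) as data:
        dtype = resolve_dtype(data["dtype_name"].item())
        shape = tuple(int(n) for n in data["shape"])
        flat = np.frombuffer(data["raw"].tobytes(), dtype=dtype)
        return flat.reshape(shape).copy()  # copy so the result is writable


# Round trip with a built-in dtype.
buf = io.BytesIO()
original = np.arange(6, dtype=np.float16).reshape(2, 3)
save_with_dtype_name(buf, original)
buf.seek(0)
restored = load_with_dtype_name(buf)
```

For ml_dtypes types, a caller would presumably pass something like `resolve_dtype=lambda name: np.dtype(getattr(ml_dtypes, name))`, assuming the stored dtype name matches the attribute name in the ml_dtypes module. Because the payload is always written as uint8, this approach does not depend on whether a type's kind is 'f' or 'V'.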

To ensure consistency and robustness, I propose that all "custom" NumPy types in ml_dtypes should be assigned kind 'V' (Void). This would align all types with the existing convention and avoid issues with serialization.

Risk/Pain Assessment of the Transition

The transition should have minimal impact on platform-independent formats such as .npy and .npz, because float8_e5m2 arrays cannot currently be round-tripped through them: np.load() rejects a header containing 'descr': '<f1'.

Binary serialization via pickle (e.g. pickle.dump) would be affected by this change. However, binary incompatibility is an expected risk for such formats, as they are not intended to serve as reliable interchange formats.

@jakevdp
Collaborator

jakevdp commented Oct 1, 2024

I think this may have been done deliberately? I'm not sure. Pinging @hawkinsp who may remember.


3 participants