ACP: hex formatting and parsing for floats #536

tgross35 · 2025-02-08T00:48:23Z

Proposal

Problem statement

Rust's floats do not have IEEE hex float formatting (C's %a), e.g. "-0x1.921fb6p+1".

Motivating examples or use cases

Rust's decimals do roundtrip exactly, unlike C. However, there are a few other reasons to support the hex format:

This format is specified in IEEE-754, so it provides exact interchange with C/C++
Better representation of the underlying format (convenience for inspecting floats)
The representation shows exact values; excess precision is rejected when parsing
Significantly more lightweight formatting and parsing than decimal, which is useful for binary size
Parsing can be const, which allows let x = const { f64::from_hex_str("0x1.23p+42") } in lieu of const literals.

Solution sketch

Implement LowerHex and UpperHex for f16, f32, f64, and f128 to provide the output. Additionally, introduce a method to parse the above:

impl {f16, f32, f64, f128} {
    // Accepts anything from the IEEE specification, even if it does not exactly match what we produce
    fn from_hex_str<S: AsRef<[u8]>>(src: &S) -> Result<Self, ParseFloatError>;

    // Alternative:
    const fn from_hex_str(src: &str) -> Result<Self, ParseFloatError>;
}

The format is specified by IEEE:

Language standards should provide conversions between all supported binary formats and external hexadecimal-significand character sequences. External hexadecimal-significand character sequences for finite numbers shall be described by the following grammar, which defines a hexSequence:
sign [+ −]
digit [0123456789]
hexDigit [0123456789abcdefABCDEF]
hexExpIndicator [Pp]
hexIndicator "0" [Xx]
hexSignificand ( {hexDigit} * "." {hexDigit}+ | {hexDigit}+ "." | {hexDigit}+ )
decExponent {hexExpIndicator} {sign}? {digit}+
hexSequence {sign}? {hexIndicator} {hexSignificand} {decExponent}
[...]

The value of a hexSequence is the value of the hexSignificand multiplied by two raised to the power of the value of the decExponent, negated if there is a leading ‘−’ sign. The hexIndicator and the hexExpIndicator have no effect on the value.
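The grammar and value rule above can be sketched as a small parser. This is only an illustration under stated assumptions: `parse_hex_f64` is a hypothetical name, it returns `Option` rather than `ParseFloatError`, it does not handle "inf" or "NaN", and it deliberately rejects (rather than rounds) any input whose significand needs more than 53 bits or whose result would be subnormal or overflow, so that the conversion it does perform is exact.

```rust
/// Hypothetical helper (not the proposed std API): parses a hexSequence into an f64.
/// Returns None for malformed input and for valid input outside the restricted
/// range that this sketch can convert exactly.
fn parse_hex_f64(src: &str) -> Option<f64> {
    let s = src.as_bytes();
    let mut i = 0;

    // {sign}?
    let negative = s.first() == Some(&b'-');
    if matches!(s.first(), Some(b'+' | b'-')) {
        i += 1;
    }

    // {hexIndicator}: "0" followed by 'x' or 'X'
    if s.get(i) != Some(&b'0') || !matches!(s.get(i + 1), Some(b'x' | b'X')) {
        return None;
    }
    i += 2;

    // {hexSignificand}: hex digits with at most one '.', and at least one digit
    let (mut sig, mut digits, mut frac_digits, mut seen_dot) = (0u64, 0u32, 0i32, false);
    while let Some(&b) = s.get(i) {
        if b == b'.' && !seen_dot {
            seen_dot = true;
        } else if let Some(d) = (b as char).to_digit(16) {
            if sig >> 49 != 0 {
                return None; // sketch limit: the significand must stay below 2^53
            }
            sig = (sig << 4) | u64::from(d);
            digits += 1;
            if seen_dot {
                frac_digits += 1;
            }
        } else {
            break;
        }
        i += 1;
    }
    if digits == 0 {
        return None;
    }

    // {decExponent}: 'p' or 'P', optional sign, one or more decimal digits
    if !matches!(s.get(i), Some(b'p' | b'P')) {
        return None;
    }
    i += 1;
    let exp_negative = s.get(i) == Some(&b'-');
    if matches!(s.get(i), Some(b'+' | b'-')) {
        i += 1;
    }
    let exp_digits = &s[i..];
    if exp_digits.is_empty() || !exp_digits.iter().all(u8::is_ascii_digit) {
        return None;
    }
    let mut exp: i32 = 0;
    for &b in exp_digits {
        exp = exp.checked_mul(10)?.checked_add(i32::from(b - b'0'))?;
    }
    if exp_negative {
        exp = -exp;
    }

    if sig == 0 {
        return Some(if negative { -0.0 } else { 0.0 });
    }

    // value = significand * 2^exponent; each fractional hex digit contributes 4 bits.
    let e = exp.checked_sub(frac_digits.checked_mul(4)?)?;
    // With 1 <= sig < 2^53 and e in this range, the product is normal and exact.
    if !(-1022..=970).contains(&e) {
        return None;
    }
    let pow2 = f64::from_bits(((e + 1023) as u64) << 52); // exactly 2^e
    let value = sig as f64 * pow2;
    Some(if negative { -value } else { value })
}
```

For example, `parse_hex_f64("0x1.8p+1")` yields `Some(3.0)`, since 0x1.8 is 1.5 and the exponent doubles it. A real implementation would need correct rounding and subnormal handling, which this sketch sidesteps by returning `None`.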

Proposed Rust-specific rules:

The leading digit is always "1." if the value is nonzero. 754 allows any hex sequence before the . (e.g. Julia uses a single hex digit of any value, some implementations use 0. to indicate subnormals); we should accept this but not produce it.
abcdef/ABCDEF, p/P is determined by :x or :X
0.0 reproduces as "0x0p+0", infinity as "inf" or "INF", NaN as "NaN" or "NAN"
0x is always reproduced regardless of the :# format parameter
The :+ format should work
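A rough sketch of how the output rules above could look for f64, lowercase form only. `to_hex_string` is a hypothetical free function, not the proposed `LowerHex` impl. Trimming trailing zero digits, printing "-0x0p+0" for negative zero, and putting a sign on "inf" are assumptions that the list above does not spell out.

```rust
/// Hypothetical helper: formats an f64 roughly following the proposed rules
/// (lowercase only; the `:X` and `:+` variants are omitted).
fn to_hex_string(x: f64) -> String {
    if x.is_nan() {
        return "NaN".to_string();
    }
    let sign = if x.is_sign_negative() { "-" } else { "" };
    if x.is_infinite() {
        return format!("{sign}inf");
    }
    if x == 0.0 {
        return format!("{sign}0x0p+0");
    }

    const FRAC_MASK: u64 = (1u64 << 52) - 1;
    let bits = x.to_bits();
    let biased_exp = ((bits >> 52) & 0x7ff) as i32;
    let raw_frac = bits & FRAC_MASK;

    let (frac, exp) = if biased_exp == 0 {
        // Subnormal: shift the highest set bit into the implicit-one position
        // (bit 52), drop it, and lower the exponent by the same amount.
        let shift = raw_frac.leading_zeros() - 11;
        ((raw_frac << shift) & FRAC_MASK, -1022 - shift as i32)
    } else {
        // Normal: the leading 1 is implicit, so only the bias is removed.
        (raw_frac, biased_exp - 1023)
    };

    // 52 fraction bits are exactly 13 hex digits; trailing zeros are dropped (assumption).
    let digits = format!("{frac:013x}");
    let digits = digits.trim_end_matches('0');
    if digits.is_empty() {
        format!("{sign}0x1p{exp:+}")
    } else {
        format!("{sign}0x1.{digits}p{exp:+}")
    }
}
```

With this sketch, the minimum subnormal `f64::from_bits(1)` formats as "0x1p-1074", and an f32 π widened to f64 and negated formats as "-0x1.921fb6p+1", matching the example in the problem statement.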

Alternatives

Provide a method that parses &[u8] instead, to avoid needless conversion to &str (this is an existing problem for int::from_str_radix). Edit: changed to AsRef based on Josh's comment in this thread (#536).

Links and related work

Implementation of parsing & printing in libm https://github.com/rust-lang/libm/blob/670f8a87373aa00a5e341cc167d6ad181b15ba38/src/math/support/hex_float.rs
hexf provides parsing
hexfloat2 provides printing
Pre-1.0 issue for hex float literals: "Add support for hexadecimal float literals", rust#1433

What happens now?

This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.

Possible responses

The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):

We think this problem seems worth solving, and the standard library might be the right place to solve it.
We think that this probably doesn't belong in the standard library.

Second, if there's a concrete solution:

We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.


programmerjake · 2025-02-08T00:59:31Z

in order to match {:#X} for integers, I think the leading 0x should always be lowercase.

tgross35 · 2025-02-08T01:56:40Z

in order to match {:#X} for integers, I think the leading 0x should always be lowercase.

Good point, I removed that bit. We should probably still parse it since 754 says 0X is valid.

On that note we only print 0x for integers if # is there, and from_str_radix rejects the prefix. But I think it is reasonable to always print/parse the prefix here since this is more of a defined format.
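For reference, the integer behavior being compared against can be checked on stable Rust. The wrapper function name `integer_hex_behavior` is only for illustration; the formatting and parsing calls are the existing std ones.

```rust
/// Demonstrates the current integer behavior the comments above refer to.
fn integer_hex_behavior() {
    // Without `#`, no prefix is printed.
    assert_eq!(format!("{:x}", 255), "ff");
    // With `#`, the prefix is printed, and it stays lowercase even for `X`.
    assert_eq!(format!("{:#x}", 255), "0xff");
    assert_eq!(format!("{:#X}", 255), "0xFF");
    // `from_str_radix` accepts bare digits but rejects the "0x" prefix.
    assert_eq!(u32::from_str_radix("ff", 16), Ok(255));
    assert!(u32::from_str_radix("0xff", 16).is_err());
}
```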

RalfJung · 2025-02-08T07:43:58Z

How will subnormals be printed, if not via a leading 0.?

joshtriplett · 2025-02-08T08:35:19Z

+1 for adding this. This seems like a good solution for handling hex floats in general.

Provide a method that parses &[u8] instead, to avoid needless conversion to &str (this is an existing problem for int::from_str_radix).

We could accept AsRef<[u8]>, which would allow str, String, Vec, arrays, and various other things: https://doc.rust-lang.org/std/convert/trait.AsRef.html#implementors
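To illustrate what an `AsRef<[u8]>` bound accepts, here is a tiny stand-in. `has_hex_prefix` is a hypothetical function that only checks for the "0x"/"0X" prefix, and it takes `S` by value for brevity, whereas the signature in the proposal takes `&S`.

```rust
/// Hypothetical stand-in for a parser entry point: reports whether the input
/// starts with the "0x" or "0X" prefix.
fn has_hex_prefix<S: AsRef<[u8]>>(src: S) -> bool {
    let bytes: &[u8] = src.as_ref();
    matches!(bytes, [b'0', b'x' | b'X', ..])
}
```

The same function can then be called with `"0x1p0"` (a `&str`), `String::from("0X1p0")`, `vec![b'0', b'x']`, a `[u8; 2]` array, or a byte string literal such as `b"0x1p+0"`, without any explicit conversion at the call site.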

tgross35 · 2025-02-08T08:53:35Z

How will subnormals be printed, if not via a leading 0.?

The input is always shifted to normalize and the exponent is just adjusted, so e.g. the minimum f64 subnormal will print as 0x1p-1074 rather than 0x0.0000000000001p-1022. C does produce the latter form, but it can parse either https://gcc.godbolt.org/z/rseMx4M3x.

Both seem to be in use in other places. I proposed the normalized form since at a quick glance I can still read ~2^-1074 (vs. needing to count the zeros in the C version) and, if it matters, know it's subnormal based on the exponent. But I'm not tied to it; more literally matching the bitwise representation can be nice at other times. 🤷‍♂
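As a quick check that the two spellings denote the same number: 0x0.0000000000001 is 16^-13 = 2^-52, so the C-style form is 2^-52 × 2^-1022 = 2^-1074, which is exactly the normalized form 0x1p-1074. The sketch below confirms this with existing std constants; the function name `min_subnormal_two_ways` is just for illustration.

```rust
/// Returns the smallest positive subnormal f64 computed two ways.
fn min_subnormal_two_ways() -> (f64, f64) {
    // Bit pattern 0x0000_0000_0000_0001: exponent field zero, lowest fraction bit set.
    let from_bits = f64::from_bits(1);
    // C-style reading: 0x0.0000000000001 is 2^-52 (f64::EPSILON),
    // scaled by 2^-1022 (f64::MIN_POSITIVE, the smallest normal value).
    let c_style = f64::EPSILON * f64::MIN_POSITIVE;
    (from_bits, c_style)
}
```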

tgross35 · 2025-02-08T08:57:03Z

We could accept AsRef<[u8]>, which would allow str, String, Vec, arrays, and various other things: https://doc.rust-lang.org/std/convert/trait.AsRef.html#implementors

I didn't do this originally because of consistency with from_str_radix. But consistency aside, AsRef would definitely be nicer so I updated the description 👍

tgross35 · 2025-02-08T09:00:23Z

Argh, and AsRef would force parsing to be not const. Const traits can't come soon enough...

joshtriplett · 2025-02-08T10:21:21Z

@tgross35 Sigh. It seems like the right solution, though, so one question is whether we would have const traits before trying to stabilize this. (At least, enough of const traits that we can expose it, even if const traits themselves aren't stable yet.)

tgross35 · 2025-02-08T10:53:34Z

Agreed, AsRef seems like a reasonably early target for constification based on the relatively trivial implementation. Two cents: I'd prefer to have this unstably as const fn(&str) (everywhere I am currently using something like this requires const), with a note that we almost definitely want to change to AsRef before considering stabilization since that's trivial.

First thing the implementation does anyway is call .as_bytes() 🙂 https://github.com/rust-lang/libm/blob/670f8a87373aa00a5e341cc167d6ad181b15ba38/src/math/support/hex_float.rs#L41

joshtriplett · 2025-02-08T13:08:24Z

Suggestion: Let's have two functions, one with a more ideal name (e.g. from_hex) that takes AsRef and won't support const until we have const traits, and one with a less ideal name (e.g. from_hex_slice) that takes &[u8] and is const. We can decide at stabilization time whether we want to stabilize both or wait for const traits.
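A small sketch of why a `&[u8]` entry point can be const today while an `AsRef` one cannot: byte-slice indexing, loops, and matching all work in const fn without calling any trait method. `parse_hex_digits` and `PARSED` are hypothetical names, and the function only handles an unprefixed run of hex digits, not a full hex float.

```rust
/// Hypothetical const helper: parses an unprefixed run of up to 16 hex digits.
const fn parse_hex_digits(src: &[u8]) -> Option<u64> {
    if src.is_empty() || src.len() > 16 {
        return None;
    }
    let mut value = 0u64;
    let mut i = 0;
    // `for` loops are not usable in const fn, so index with `while`.
    while i < src.len() {
        let d = match src[i] {
            b @ b'0'..=b'9' => b - b'0',
            b @ b'a'..=b'f' => b - b'a' + 10,
            b @ b'A'..=b'F' => b - b'A' + 10,
            _ => return None,
        };
        value = (value << 4) | d as u64;
        i += 1;
    }
    Some(value)
}

// Evaluated at compile time; `str::as_bytes` is itself a const fn.
const PARSED: Option<u64> = parse_hex_digits("1921fb6".as_bytes());
```

A `&str` caller only needs `.as_bytes()`, which is why the const `&str` and const `&[u8]` variants discussed here are nearly interchangeable in practice.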

joshtriplett · 2025-02-11T18:53:28Z

We discussed this in today's @rust-lang/libs-api meeting.

We're going to go ahead and approve the from_hex function that accepts AsRef<[u8]>, and won't be able to be const until we have const traits.

We'll consider the from_hex_slice function in the future, depending on how const traits go.

joshtriplett · 2025-02-11T19:04:39Z

Also, a point someone raised in the libs-api meeting: if T-lang decides to accept float literals, the syntax would need to match. (If T-lang decides to not accept float literals, perhaps because this method suffices, then that won't be a concern.)

tgross35 · 2025-02-11T22:32:51Z

We're going to go ahead and approve the from_hex function that accepts AsRef<[u8]>, and won't be able to be const until we have const traits.

Formatting was accepted as well, right? I am assuming so, but that will need to go through a separate FCP path.

Also, a point someone raised in the libs-api meeting: if T-lang decides to accept float literals, the syntax would need to match. (If T-lang decides to not accept float literals, perhaps because this method suffices, then that won't be a concern.)

Hi lang team member :)

What is the best way to get a vibe check from the rest of the team as to whether this is worth pursuing? I would very much appreciate having them and there are a lot of threads scattered around discussing it, but I don't think there was ever any more concrete feedback. Feasibility of syntax was one possible blocker, I'll try to figure that out on Zulip. (edit: https://rust-lang.zulipchat.com/#narrow/channel/131828-t-compiler/topic/Parsing.20hex.20float.20literals)

quaternic · 2025-02-11T22:35:23Z

If we had hex float literals, it might be reasonable to warn by default for excess precision in those, but ...

excess precision is rejected when parsing

... it doesn't feel right for parsing. The value should be rounded to the target type as usual.

For one use case where that matters, you could be formatting f64s to a file, and possibly parsing them as f32. With hexadecimal floats, this would be equivalent to casting f64 as f32.

With decimal formatting that does not hold, since the printed value is rounded in a way that preserves the value only as long as it is parsed back to the same precision:

let x = 1.0 + 0.5 * f32::EPSILON as f64;
assert_eq!(1.0000001, format!("{x}").parse::<f32>().unwrap());
assert_eq!(1.0, x as f32);

tgross35 · 2025-02-11T23:00:04Z

If we had hex float literals, it might be reasonable to warn by default for excess precision in those, but ...

excess precision is rejected when parsing

... it doesn't feel right for parsing. The value should be rounded to the target type as usual.

This is something I meant to bring up as a discussion point. But that reasoning makes sense, +1 from me.

RalfJung · 2025-02-12T07:01:16Z

What is the best way to get a vibe check from the rest of the team as to whether this is worth pursuing?

Typically the answer would be: open an issue describing the question, and nominate it for t-lang.

tgross35 · 2025-02-21T18:18:03Z

I'll close this since the ACP was accepted. We'll probably smooth out our implementation in libm first since that's easy, then just copy the result over (see rust-lang/compiler-builtins#851).

tgross35 added api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api labels Feb 8, 2025

joshtriplett added the ACP-accepted API Change Proposal is accepted (seconded with no objections) label Feb 11, 2025

tgross35 mentioned this issue Feb 21, 2025

Seperate implementation of hex float parsing for performance rust-lang/compiler-builtins#851

Open

tgross35 closed this as completed Feb 21, 2025
