ACP: hex formatting and parsing for floats #536

Closed
tgross35 opened this issue Feb 8, 2025 · 17 comments
Labels: ACP-accepted, api-change-proposal, T-libs-api

Comments

@tgross35

tgross35 commented Feb 8, 2025

Proposal

Problem statement

Rust's floats do not have IEEE hex float formatting (C's %a), e.g. "-0x1.921fb6p+1".

Motivating examples or use cases

Rust's decimal float formatting does round-trip exactly, unlike C's. However, there are a few other reasons to support the hex format:

  1. This format is specified in IEEE-754, so it provides exact interchange with C/C++
  2. Better representation of the underlying format (convenience for inspecting floats)
  3. The representation shows exact values; excess precision is rejected when parsing
  4. Significantly more lightweight formatting and parsing than decimal, which is useful for binary size
  5. Parsing can be const, which allows let x = const { f64::from_hex_str("0x1.23p+42") } in lieu of const literals.

Solution sketch

Implement LowerHex and UpperHex for f16, f32, f64, and f128 to provide the output. Additionally, introduce a method to parse the above:

impl {f16, f32, f64, f128} {
    // Accepts anything from the IEEE specification, even if it does not exactly match what we produce
    fn from_hex_str<S: AsRef<[u8]>>(src: &S) -> Result<Self, ParseFloatError>;

    // Alternative:
    const fn from_hex_str(src: &str) -> Result<Self, ParseFloatError>;
}

The format is specified by IEEE:

Language standards should provide conversions between all supported binary formats and external hexadecimal-significand character sequences. External hexadecimal-significand character sequences for finite numbers shall be described by the following grammar, which defines a hexSequence:

sign [+ −]
digit [0123456789]
hexDigit [0123456789abcdefABCDEF]
hexExpIndicator [Pp]
hexIndicator "0" [Xx]
hexSignificand ( {hexDigit} * "." {hexDigit}+ | {hexDigit}+ "." | {hexDigit}+ )
decExponent {hexExpIndicator} {sign}? {digit}+
hexSequence {sign}? {hexIndicator} {hexSignificand} {decExponent}

[...]

The value of a hexSequence is the value of the hexSignificand multiplied by two raised to the power of the value of the decExponent, negated if there is a leading ‘−’ sign. The hexIndicator and the hexExpIndicator have no effect on the value.
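
The value rule above can be sketched as a small parser. This is only an illustration, not the accepted API: `parse_hex_float` is a hypothetical name, it handles only f64, it accepts only the ASCII '-' as the minus sign, it does not parse "inf" or "NaN", and it gives up (returns None) when the significand does not fit in 64 bits. It also does no careful rounding, so inputs with excess precision may not round the way a real implementation would.

```rust
/// Hypothetical sketch (not the accepted `from_hex` API): evaluates a
/// hexSequence as significand * 2^exponent, with no special rounding logic.
fn parse_hex_float(src: &str) -> Option<f64> {
    let b = src.as_bytes();
    let mut i = 0;
    // sign: optional ASCII '+' or '-'
    let negative = match b.first() {
        Some(b'-') => { i += 1; true }
        Some(b'+') => { i += 1; false }
        _ => false,
    };
    // hexIndicator: "0" followed by 'x' or 'X'
    if b.get(i) != Some(&b'0') || !matches!(b.get(i + 1), Some(b'x' | b'X')) {
        return None;
    }
    i += 2;
    // hexSignificand: hex digits with at most one '.'
    let (mut mant, mut digits, mut frac_digits, mut seen_dot) = (0u64, 0u32, 0i32, false);
    while let Some(&c) = b.get(i) {
        if c == b'.' && !seen_dot {
            seen_dot = true;
        } else if let Some(d) = (c as char).to_digit(16) {
            if mant >> 60 != 0 {
                return None; // sketch limitation: significand must fit in a u64
            }
            mant = (mant << 4) | u64::from(d);
            digits += 1;
            frac_digits += seen_dot as i32;
        } else {
            break;
        }
        i += 1;
    }
    if digits == 0 {
        return None;
    }
    // decExponent: 'p' or 'P', optional sign, one or more decimal digits
    if !matches!(b.get(i), Some(b'p' | b'P')) {
        return None;
    }
    let exp: i32 = src.get(i + 1..)?.parse().ok()?;
    // Each fractional hex digit scales the integer significand by 2^-4.
    let mut e = exp.saturating_sub(frac_digits.saturating_mul(4));
    let mut value = mant as f64; // rounds if more than 53 significant bits
    // Scaling by 2 or 0.5 is exact except when the result becomes subnormal.
    while e > 0 && value != 0.0 && value.is_finite() {
        value *= 2.0;
        e -= 1;
    }
    while e < 0 && value != 0.0 {
        value *= 0.5;
        e += 1;
    }
    Some(if negative { -value } else { value })
}
```

With this sketch, both "0x1p-1074" and "0x0.0000000000001p-1022" evaluate to the smallest positive f64, since the hexIndicator and the position of the point only affect how the same value is spelled.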

Proposed Rust-specific rules:

  1. The leading digit is always 1 for nonzero values. 754 allows any hex sequence before the . (e.g. Julia uses a single hex digit of any value, some implementations use 0. to indicate subnormals); we should accept this but not produce it.
  2. abcdef/ABCDEF and p/P are determined by :x or :X
  3. 0.0 reproduces as "0x0p+0", infinity as "inf" or "INF", NaN as "NaN" or "NAN"
  4. 0x is always reproduced regardless of the :# format parameter
  5. The :+ format should work
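
As a rough illustration of rules 1 and 3, here is a minimal sketch of the lowercase output for f64. `hex_float` is a hypothetical helper, not a std API. It assumes that trailing zero digits are dropped, that negative zero keeps its sign, and that NaN prints without a sign; the proposal does not spell out any of these. It ignores the :X and :+ variants.

```rust
/// Hypothetical helper (not a std API) sketching the proposed `{:x}` output for f64.
fn hex_float(x: f64) -> String {
    if x.is_nan() {
        return "NaN".to_string();
    }
    let sign = if x.is_sign_negative() { "-" } else { "" };
    if x.is_infinite() {
        return format!("{sign}inf");
    }
    if x == 0.0 {
        // Assumption: negative zero keeps its sign.
        return format!("{sign}0x0p+0");
    }
    let bits = x.to_bits();
    let mut exp = ((bits >> 52) & 0x7ff) as i32; // biased exponent field
    let mut frac = bits & ((1u64 << 52) - 1); // 52 explicit fraction bits
    if exp == 0 {
        // Subnormal: shift the leading set bit up to the implicit-one position
        // (bit 52), drop it, and lower the exponent by the shift amount.
        let shift = frac.leading_zeros() as i32 - 11;
        frac = (frac << shift) & ((1u64 << 52) - 1);
        exp = 1 - shift;
    }
    let exp = exp - 1023; // remove the exponent bias
    // 52 fraction bits are exactly 13 hex digits; trailing zeros are dropped (assumption).
    let digits = format!("{frac:013x}");
    let digits = digits.trim_end_matches('0');
    if digits.is_empty() {
        format!("{sign}0x1p{exp:+}")
    } else {
        format!("{sign}0x1.{digits}p{exp:+}")
    }
}
```

Under these assumptions, `hex_float(-(std::f32::consts::PI as f64))` gives the "-0x1.921fb6p+1" string from the problem statement, and the smallest subnormal prints with a leading 1 and an adjusted exponent.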

Alternatives

Links and related work

What happens now?

This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.

Possible responses

The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):

  • We think this problem seems worth solving, and the standard library might be the right place to solve it.
  • We think that this probably doesn't belong in the standard library.

Second, if there's a concrete solution:

  • We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
  • We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.
tgross35 added the api-change-proposal and T-libs-api labels on Feb 8, 2025
@programmerjake
Member

in order to match {:#X} for integers, I think the leading 0x should always be lowercase.

@tgross35
Author

tgross35 commented Feb 8, 2025

in order to match {:#X} for integers, I think the leading 0x should always be lowercase.

Good point, I removed that bit. We should probably still parse it since 754 says 0X is valid.

On that note we only print 0x for integers if # is there, and from_str_radix rejects the prefix. But I think it is reasonable to always print/parse the prefix here since this is more of a defined format.
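
For reference, the existing integer behavior mentioned here can be checked with a short snippet. The function name `integer_prefix_behavior` is made up for this illustration; the formatting and parsing calls are existing std APIs.

```rust
/// Collects how std currently treats the "0x" prefix for integers.
fn integer_prefix_behavior() -> (String, String, bool) {
    // Without '#', no prefix is printed.
    let plain = format!("{:x}", 255u32);
    // With '#', the prefix is printed, and it stays lowercase even for 'X'.
    let alternate = format!("{:#X}", 255u32);
    // from_str_radix does not accept a "0x" prefix.
    let rejected = u32::from_str_radix("0xff", 16).is_err();
    (plain, alternate, rejected)
}
```

This yields "ff", "0xFF", and `true`, which is the behavior the proposal deliberately departs from by always printing and parsing the prefix for floats.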

@RalfJung
Member

RalfJung commented Feb 8, 2025

How will subnormals be printed, if not via a leading 0.?

@joshtriplett
Member

+1 for adding this. This seems like a good solution for handling hex floats in general.

Provide a method that parses &[u8] instead, to avoid needless conversion to &str (this is an existing problem for int::from_str_radix).

We could accept AsRef<[u8]>, which would allow str, String, Vec, arrays, and various other things: https://doc.rust-lang.org/std/convert/trait.AsRef.html#implementors
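
To show what an AsRef<[u8]> bound buys at the call site, here is a minimal sketch. `has_hex_prefix` is a hypothetical stand-in that only checks for an optional sign followed by 0x or 0X; it is not the proposed parsing function, only the same shape of generic signature.

```rust
/// Hypothetical stand-in with the same generic shape as the suggested API.
/// It only checks for an optional sign followed by "0x" or "0X".
fn has_hex_prefix<S: AsRef<[u8]>>(src: S) -> bool {
    let bytes = src.as_ref();
    // Skip one optional leading sign byte.
    let rest = match bytes {
        [b'+' | b'-', rest @ ..] => rest,
        _ => bytes,
    };
    matches!(rest, [b'0', b'x' | b'X', ..])
}
```

The same function then accepts `"0x1p+0"`, `String::from("-0X1p+0")`, `b"0x1p+0"`, and `vec![b'0', b'x']` without any conversion by the caller. The cost is the call to `as_ref()`, a trait method, which is why this shape cannot currently be a const fn on stable Rust.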

@tgross35
Author

tgross35 commented Feb 8, 2025

How will subnormals be printed, if not via a leading 0.?

The input is always shifted to normalize and the exponent is just adjusted, so e.g. the minimum f64 subnormal will print as 0x1p-1074 rather than 0x0.0000000000001p-1022. C does produce the latter form, but it can parse either https://gcc.godbolt.org/z/rseMx4M3x.

Both seem to be in use in other places. I proposed the normalized form since at a quick glance I can still read ~2^-1074 (vs. needing to count the zeros in the C version) and, if it matters, know it's subnormal based on the exponent. But I'm not tied to it; more literally matching the bitwise representation can be nice other times. 🤷‍♂

@tgross35
Author

tgross35 commented Feb 8, 2025

We could accept AsRef<[u8]>, which would allow str, String, Vec, arrays, and various other things: https://doc.rust-lang.org/std/convert/trait.AsRef.html#implementors

I didn't do this originally because of consistency with from_str_radix. But consistency aside, AsRef would definitely be nicer so I updated the description 👍

@tgross35
Author

tgross35 commented Feb 8, 2025

Argh, and AsRef would force parsing to be not const. Const traits can't come soon enough...

@joshtriplett
Member

@tgross35 Sigh. It seems like the right solution, though, so one question is whether we would have const traits before trying to stabilize this. (At least, enough of const traits that we can expose it, even if const traits themselves aren't stable yet.)

@tgross35
Author

tgross35 commented Feb 8, 2025

Agreed, AsRef seems like a reasonably early target for constification based on the relatively trivial implementation. Two cents: I'd prefer to have this unstably as const fn(&str) (everywhere I am currently using something like this requires const), with a note that we almost definitely want to change to AsRef before considering stabilization since that's trivial.

First thing the implementation does anyway is call .as_bytes() 🙂 https://github.com/rust-lang/libm/blob/670f8a87373aa00a5e341cc167d6ad181b15ba38/src/math/support/hex_float.rs#L41

@joshtriplett
Member

Suggestion: Let's have two functions, one with a more ideal name (e.g. from_hex) that takes AsRef and won't support const until we have const traits, and one with a less ideal name (e.g. from_hex_slice) that takes &[u8] and is const. We can decide at stabilization time whether we want to stabilize both or wait for const traits.

@joshtriplett
Member

We discussed this in today's @rust-lang/libs-api meeting.

We're going to go ahead and approve the from_hex function that accepts AsRef<[u8]>, and won't be able to be const until we have const traits.

We'll consider the from_hex_slice function in the future, depending on how const traits go.

joshtriplett added the ACP-accepted label on Feb 11, 2025
@joshtriplett
Member

joshtriplett commented Feb 11, 2025

Also, a point someone raised in the libs-api meeting: if T-lang decides to accept float literals, the syntax would need to match. (If T-lang decides to not accept float literals, perhaps because this method suffices, then that won't be a concern.)

@tgross35
Author

tgross35 commented Feb 11, 2025

We're going to go ahead and approve the from_hex function that accepts AsRef<[u8]>, and won't be able to be const until we have const traits.

Formatting was accepted as well, right? I am assuming so, but that will need to go through a separate FCP path.

Also, a point someone raised in the libs-api meeting: if T-lang decides to accept float literals, the syntax would need to match. (If T-lang decides to not accept float literals, perhaps because this method suffices, then that won't be a concern.)

Hi lang team member :)

What is the best way to get a vibe check from the rest of the team as to whether this is worth pursuing? I would very much appreciate having them and there are a lot of threads scattered around discussing it, but I don't think there was ever any more concrete feedback. Feasibility of syntax was one possible blocker, I'll try to figure that out on Zulip. (edit: https://rust-lang.zulipchat.com/#narrow/channel/131828-t-compiler/topic/Parsing.20hex.20float.20literals)

@quaternic

If we had hex float literals, it might be reasonable to warn by default for excess precision in those, but ...

excess precision is rejected when parsing

... it doesn't feel right for parsing. The value should be rounded to the target type as usual.

For one use case where that matters, you could be formatting f64s to a file, and possibly parsing them as f32. With hexadecimal floats, this would be equivalent to casting f64 as f32.

With decimal formatting that does not hold, since the printed value is rounded in a way that preserves the value only as long as it is parsed back to the same precision:

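// x = 1 + 2^-24 lies exactly halfway between two adjacent f32 values.
let x = 1.0 + 0.5 * f32::EPSILON as f64;
// The decimal string printed for the f64 rounds up when reparsed as f32...
assert_eq!(1.0000001, format!("{x}").parse::<f32>().unwrap());
// ...while a direct cast rounds to nearest, ties to even.
assert_eq!(1.0, x as f32);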

@tgross35
Author

If we had hex float literals, it might be reasonable to warn by default for excess precision in those, but ...

excess precision is rejected when parsing

... it doesn't feel right for parsing. The value should be rounded to the target type as usual.

This is something I meant to bring up as a discussion point. But that reasoning makes sense, +1 from me.

@RalfJung
Member

What is the best way to get a vibe check from the rest of the team as to whether this is worth pursuing?

Typically the answer would be: open an issue describing the question, and nominate it for t-lang.

@tgross35
Author

I'll close this since the ACP was accepted. We'll probably smooth out our implementation in libm first since that's easy, then just copy the result over rust-lang/compiler-builtins#851.

5 participants