Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make number serialzation precision configurable #4591

Open
wants to merge 3 commits into
base: develop
Choose a base branch
from

Conversation

maddanio
Copy link

@maddanio maddanio commented Jan 9, 2025

This PR adds the possibility to control floating point serialization precision via std::setprecision. I also have a version with an explicit annotator instead if you absolutely want to avoid behaviour changes (as users may have inadvertedly modified the iostate in their code right now), but I find this way much more natural (I was quite suprised that std::setprecision had no effect).


Pull request checklist

Read the Contribution Guidelines for detailed information.

  • Changes are described in the pull request, or an existing issue is referenced.
  • The test suite compiles and runs without error.
  • Code coverage is 100%. Test cases can be added by editing the test suite.
  • The source code is amalgamated; that is, after making changes to the sources in the include/nlohmann directory, run make amalgamate to create the single-header files single_include/nlohmann/json.hpp and single_include/nlohmann/json_fwd.hpp. The whole process is described here.

@maddanio maddanio requested a review from nlohmann as a code owner January 9, 2025 14:16
@maddanio maddanio changed the title Feature/floating point precision iostate make number serialzation precision configurable Jan 9, 2025
@maddanio maddanio force-pushed the feature/floating_point_precision_iostate branch from 58cbed7 to 954b534 Compare January 9, 2025 14:19
Copy link

github-actions bot commented Jan 9, 2025

🔴 Amalgamation check failed! 🔴

The source code has not been amalgamated. @maddanio
Please read and follow the Contribution Guidelines.

1 similar comment
Copy link

github-actions bot commented Jan 9, 2025

🔴 Amalgamation check failed! 🔴

The source code has not been amalgamated. @maddanio
Please read and follow the Contribution Guidelines.

@maddanio maddanio force-pushed the feature/floating_point_precision_iostate branch 2 times, most recently from b7fbcdd to 42c5426 Compare January 9, 2025 15:11
Copy link

github-actions bot commented Jan 9, 2025

🔴 Amalgamation check failed! 🔴

The source code has not been amalgamated. @maddanio
Please read and follow the Contribution Guidelines.

include/nlohmann/detail/conversions/to_chars.hpp Outdated Show resolved Hide resolved
include/nlohmann/detail/conversions/to_chars.hpp Outdated Show resolved Hide resolved
@@ -1272,10 +1272,11 @@ class basic_json // NOLINT(cppcoreguidelines-special-member-functions,hicpp-spec
string_t dump(const int indent = -1,
const char indent_char = ' ',
const bool ensure_ascii = false,
const error_handler_t error_handler = error_handler_t::strict) const
const error_handler_t error_handler = error_handler_t::strict,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to #4513 (comment) where a bjdata_config_t type was discussed to control the BJData serialization, I think it is also long time to have such a time for the JSON serialization as the parameters start piling up. I'm hesitant to say if this is the last parameter to add before such a config type is introduced, or the first one where we stop the madness...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so mine is one too much? :) but sure, I understand, will look into it

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, hang on, isnt that a public api breaking change? so you want me to change the public api? or add an overload?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so mine is one too much? :) but sure, I understand, will look into it

Many others before you have been "one too much" and thus haven't been done. :)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There has been so many discussions about tweaking every last aspect of serializations that adding parameters is getting out of hand.

include/nlohmann/json.hpp Show resolved Hide resolved
include/nlohmann/json.hpp Outdated Show resolved Hide resolved
include/nlohmann/detail/output/serializer.hpp Outdated Show resolved Hide resolved
include/nlohmann/detail/output/serializer.hpp Outdated Show resolved Hide resolved
@@ -832,6 +833,8 @@ class serializer

// the actual conversion
// NOLINTNEXTLINE(cppcoreguidelines-pro-type-vararg,hicpp-vararg)
char format[100];
snprintf(format, 100, "%%.%dg", precision);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is format used? Is snprintf really necessary? Maybe check out details::concat from detail/string_concat.hpp.

Copy link
Contributor

@gregmarr gregmarr Jan 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume it was supposed to replace "%.*g" below.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should instead just do ..."%*.*g", precision, d, x)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also means that precision should be int instead of size_t. There is no reason for it to be larger than int.

Copy link
Contributor

@gregmarr gregmarr Jan 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that's the wrong position, that's width. We already specify precision as d, so a defined precision override just replaces the existing d.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, on it

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized i did not test this path. can you point me to another test that actually provides a non-ieee float?

include/nlohmann/detail/conversions/to_chars.hpp Outdated Show resolved Hide resolved
include/nlohmann/detail/conversions/to_chars.hpp Outdated Show resolved Hide resolved
@nlohmann nlohmann added the state: please discuss please discuss the issue or vote for your favorite option label Jan 9, 2025
@maddanio
Copy link
Author

maddanio commented Jan 9, 2025

thanks for looking at this. so does that mean this PR will get merged when I address all of the requests? I was submitting it a bit speculatively and dont want to sink work if thats not the case or more principal discussion is needed first

@maddanio maddanio force-pushed the feature/floating_point_precision_iostate branch from 42c5426 to 6add13a Compare January 9, 2025 20:54
@gregmarr
Copy link
Contributor

gregmarr commented Jan 9, 2025

@nlohmann has the final say here on whether this can go forward.

Prior to that, I think it would be good to define what exactly precision means here. Is it the maximum number of digits after the decimal point, or the exact number? Having it mean exact is more involved, as it can change a current 0.0 to 0.000000. This should also define if it is going to use truncation, or rounding. That is, is PI with precision 3 3.142 or 3.141. Similarly, is 1.999999999 to any precision less than 9 just 2.0?

One reason I recommended using -1 as an override is that there is more here than just shortening the number. If a precision isn't defined, everything just behave exactly as it is now. If it is defined, then you have to look at rounding, and thus you may not be able to use Grisu2. I haven't looked into it enough to know if you can implement a rounding precision parameter in Grisu2.

Then the next question is whether std::setprecision affects this, or if it's only configured via parameters to dump and if that means that it's time to create a configuration method for dump. That is related to the question of whether or not it would affect the ostream << json operators. As mentioned in the initial comment, having std::setprecision affect either of these things would be a breaking change.

const bool pretty_print = o.width() > 0;
const auto indentation = pretty_print ? o.width() : 0;
const bool pretty_print = o.width > 0;
const auto indentation = pretty_print ? o.width : 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't quite revert this properly, missing the ().

@gregmarr
Copy link
Contributor

gregmarr commented Jan 9, 2025

About the maximum or exact and truncation vs rounding:

    std::cout << std::setprecision(3) << M_PI << "\n" << 1.9999 << "\n";

This prints

3.142
2

https://www.godbolt.org/z/8Mne8Y59M

Copy link

github-actions bot commented Jan 9, 2025

🔴 Amalgamation check failed! 🔴

The source code has not been amalgamated. @maddanio
Please read and follow the Contribution Guidelines.

@nlohmann
Copy link
Owner

nlohmann commented Jan 9, 2025

I am very hesitant, and these are just my thoughts right now:

  • I personally don't think the serialization is broken. It does what it should, and the reason we chose the current precision is to enable roundtripping.
  • A lot of people are offended by the way the library does serialization. Partly because they don't understand how floating-point numbers work. How can a library change their 1.9 to 1.89999999999999991118?!
  • The other part are concerned about bandwidth and want to limit the number of bytes used to write π to the wire.
  • Again others want to control more aspects of the output, like where \n are added, how content is aligned, etc.
  • All this can be implemented somehow, and it's never about "What C++ code is needed to realize this problem?" but more "How can we put this into an API that people want to use, and other people want to maintain in their free time."
  • The questions @gregmarr raises are very valid, and I think there is no single right answer to them. If we define precision = "maximal number of digits after the . using truncation rather than rounding", then there will be a feature request later this year asking to change any other point on the list.
  • Also, the library is frequently criticized for its performance. Adding all these extension hooks does not really improve this. Also, the other JSON libraries seem not to care about this aspect (or their users don't ask for it). Performance should not be the "no" to all feature requests, but should definitely be on the list of concerns.
  • It's hard to find the right answer to all this, because in the end, the library is targeting developers, not just myself.

Sorry for this rough brain-dump. I will not be able to make any decision right now. But I think I will not add this without having a clear plan how serialization can be improved conceptually before adding some more ifs here and there.

@nlohmann
Copy link
Owner

nlohmann commented Jan 9, 2025

I forgot one thing that other libraries do that avoids some of the confusion I see in issues here: they store the original input string and only convert it to a number on demand. As a result, parsing is cheaper and you can create the exact serialization of the input. Not sure if we want to do this here, but a "precision = keep whatever precision I put in" requirement is also thinkable.

@maddanio
Copy link
Author

I dont think performance difference is measurable, the only action is changing some pointers and doing some comparisons. if any I would surmise performance improves with lower precision.

yes, it is truncating, so if you want rounding that would be more involved. but if desired I would be willing to dig in.

all in all we in my company really want this, because we have some rather large sets of numbers in json, and this saves us a rather large amount of space and noise. but if you don't think this is what the library needs we will just continue on our fork. and I fully understand how float works, but I also know in many cases the number of digits I really need, and what is just noise (in physics when we calculated stuff we rarely kept more than 3 significant digits :))

Signed-off-by: Daniel Oberhoff <[email protected]>
Signed-off-by: Daniel Oberhoff <[email protected]>
@maddanio maddanio force-pushed the feature/floating_point_precision_iostate branch from 6add13a to 0132248 Compare January 10, 2025 08:18
Copy link

🔴 Amalgamation check failed! 🔴

The source code has not been amalgamated. @maddanio
Please read and follow the Contribution Guidelines.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
L state: please discuss please discuss the issue or vote for your favorite option tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants