
Incorrect representation with Schubfach? #13

Open
blueglyph opened this issue Mar 9, 2023 · 4 comments

Comments

@blueglyph

With a double-precision value corresponding to the minimum value (hex representation 0x0000000000000001),

  • I'm expecting this string: "4.9E-324"
  • I get this string: "5e-324"

The expected string comes from the algorithm author Raffaello Giulietti's own Java implementation.

I haven't run a more thorough test to check for other discrepancies. Most of the results are definitely fine, but finding differing values, and seeing that the algorithm has been adapted, is not reassuring. Perhaps small adaptations were made to the original algorithm? There are several versions of the article.

Quick and dirty test file (requires schubfach_64.cpp and schubfach_64.h)

#include <cstdint>
#include <cstdio>
#include <cstring>
#include "schubfach_64.h"

using namespace std;
using namespace schubfach;

char BUFFER[DtoaMinBufferLength];

char *dtoa(double value)
{
    char *end = Dtoa(BUFFER, value);
    *end = 0;
    return BUFFER;
}

template <typename Dest, typename Source>
static inline Dest ReinterpretBits(Source source)
{
    static_assert(sizeof(Dest) == sizeof(Source), "size mismatch");

    Dest dest;
    std::memcpy(&dest, &source, sizeof(Source));
    return dest;
}

int main()
{
    uint64_t min_value_bits = 0x0000000000000001;
    double min_value = ReinterpretBits<double>(min_value_bits);
    double values[] = { 1.0, 10.0, 0.5, min_value };
    for (double i: values) {
        printf("%g: %s\n", i, dtoa(i));
    }
}
@abolz
Owner

abolz commented Mar 9, 2023

Java, C++ (std::to_chars), and JavaScript all have slightly different formatting specifications. The Java implementation requires printing two significant digits for the minimum double-precision value, although one significant digit is sufficient (i.e. 4.9e-324 and 5e-324 actually represent the same number). The algorithm implemented here always uses the shorter representation, so it does not conform to Java's specification. (C++ and JavaScript both require always printing the minimum number of digits.)

You can customize the current formatting procedure slightly by tweaking these constants

static constexpr int32_t MinFixedDecimalPoint = -6;
static constexpr int32_t MaxFixedDecimalPoint = 17;

IIRC JavaScript uses (-6, 21) and std::to_chars uses (-4, 6) here. I'm actually not sure why I chose 17 for the upper limit... but it might have something to do with the performance of the (partial) std::from_chars implementation here.

These constants do not affect the correctness of the algorithm. You can choose (almost) any pair of values here. Or you could even write your own formatting procedure if you like: The core algorithm gives you a number of the form N * 10^E.

@blueglyph
Author

I know about these values, but that's in FormatDigits(). The difference I see is already in the value returned by ToDecimal64().

In the Java implementation, toDecimal(int q, long c, int dk) yields s = 49, k+dk = -325 (the rounding to 50/-325 is not performed), and that is what is used to generate the decimal representation.

The C++ version does perform the rounding: with s = 4, a boolean (?) is added to yield s + roundup = 5, k = -324.
So the value dec = ToDecimal64(significand, exponent) is { digits: 5, exponent: -324 }.

I don't know the algorithm well enough to see why it differs, but I can probably trace where it starts to diverge. I'm not sure whether it's important or not.

@blueglyph
Author

I doubt that it changes anything about the conclusion for your project, though; it's only a small difference in output for the smallest value. 😉

@blueglyph
Author

Actually, I think it's because of this particular mode in the Java implementation, so maybe it's not an issue, or it's something that could be added if someone requires the extra digit:

In DoubleDecimal.java, toDecimal(double v), line 341

                // subnormal value
                return t < C_TINY
                       ? toDecimal(Q_MIN, 10 * t, -1)
                       : toDecimal(Q_MIN, t, 0);

with C_TINY = 3, and for this value, t = 1
