Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crashes due to undefined behaviour from signed integer overflow in rev_fwd_lift_int32 #241

Closed
mjwillson opened this issue Aug 16, 2024 · 8 comments

Comments

@mjwillson
Copy link

Hello,

This bug seems to consistently cause SIGILL crashes for me when compressing float data using reversible encoding.
When running under UBSAN (https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html ) the issue is revealed to be a signed integer overflow:

third_party/zfp/src/template/revencode.c:24:5: runtime error: signed integer overflow: 1547771904 - -1409359872 cannot be represented in type 'int32' (aka 'int')                                                 
    #0 0x55d22fcec482 in rev_fwd_lift_int32 third_party/zfp/src/template/revencode.c:24:5                
    #2 0x55d22fceb26d in rev_encode_block_int32_2 third_party/zfp/src/template/revencode.c:60:3          
    #3 0x55d22fceb26d in rev_encode_block_float_2 third_party/zfp/src/template/revencodef.c:78:11        
    #4 0x55d22fceb26d in zfp_encode_block_float_2 third_party/zfp/src/template/encodef.c:98:28           
    #5 0x55d22fceb72a in zfp_encode_block_strided_float_2 third_party/zfp/src/template/encode2.c:50:10   
    #6 0x55d22fd0233f in compress_strided_float_2 third_party/zfp/src/template/compress.c:54:9           
    #7 0x55d22fd01bf2 in zfp_compress third_party/zfp/src/zfp.c:1116:3
...

Is this a bug, or were you relying on wraparound overflow happening silently here? Unfortunately it's undefined behaviour and at least some compilers are going to generate crashing code.

@lindstro
Copy link
Member

Indeed, that looks like a faulty assumption that integer overflow wraps. One obvious fix would be to use unsigned arithmetic here, though that requires some changes that propagate across a few functions. I'll see if I can get a fix done in the next few days.

Out of curiosity, what compiler and hardware are you using?

@mjwillson
Copy link
Author

Thanks!
Hardware nothing unusual (amd64), the compiler is clang, but in google's build environment which is passing in a lot of flags by default and from my experience tends to be quite cautious around undefined behaviour (i.e. err on the side of crashing).

@lindstro
Copy link
Member

It seems we're about to open up Pandora's box. These UB issues are not confined just to signed integer overflow in the reversible zfp algorithm, but the same can evidently happen in the far more important non-reversible algorithm. For example, a 1D float block { 7.f, 7.f, 7.f, 7.f } encoded with maxprec = 2 results in decoded input x = 230, y = z = w = 0 to inv_lift():

static void
_t1(inv_lift, Int)(Int* p, ptrdiff_t s)
{
Int x, y, z, w;
x = *p; p += s;
y = *p; p += s;
z = *p; p += s;
w = *p; p += s;
/*
** non-orthogonal transform
** ( 4 6 -4 -1) (x)
** 1/4 * ( 4 2 4 5) (y)
** ( 4 -2 4 -5) (z)
** ( 4 -6 -4 1) (w)
*/
y += w >> 1; w -= y >> 1;
y += w; w <<= 1; w -= y;
z += x; x <<= 1; x -= z;
y += z; z <<= 1; z -= y;
w += x; x <<= 1; x -= w;
p -= s; *p = w;
p -= s; *p = z;
p -= s; *p = y;
p -= s; *p = x;
}

On line 26, x <<= 1 results in x = 231 (regardless of how we implement this doubling of x; see below), which overflows this 32-bit signed integer. Of course, this is temporary since x -= z would restore x to 230, but by then it's too late.

This is a larger challenge as the non-reversible algorithm relies on left and right shifts of signed quantities, with the assumption that right shifts are arithmetic (which is implementation-defined rather than undefined behavior). Worse yet, there are left shifts by one (to invert right shifts by one), and those are UB on signed integers (who knew?!). We can address those by replacing x <<= 1 with x *= 2 or x += x, but we still need to avoid integer overflow.

One potential solution might be to consolidate x += x; x -= z; as x += x - z, though additional work would be needed to determine whether that is safe. Clearly, if neither z += x nor x -= z overflows, then this solution ought to be valid since x - z is simply -z before z += x was executed. I believe we should never have z = -231 as input, which (using two's complement) would overflow when negated, but if that were a concern, then x -= z - x might do the trick. One drawback of this approach is that we have now left the realm of lifting, but so be it.

We could also implement this with additional variables:

  Int x2 = x1 - z1; Int z2 = z1 + x1;

Or using a temporary:

  Int t = x; x -= z; z += t;

If the above still does not prevent overflow, then the only solution I can think of is to perform all arithmetic using unsigned integers, but that introduces complications with implementing arithmetic right shifts. Instead of x >>= 1, we'd have to use some contrived code like x = x & 0x80000000u ? ~(~x >> 1) : x >> 1, or perhaps x = (x & 0x80000000u) | (x >> 1). The cleanest would be to use sal() and sar() macros or functions for arithmetic shifting, but I am concerned about how this would impact performance and code readability. Of course, that is to be preferred over UB, though in practice we've gone 10 years without anyone noticing this in C and C++ implementations (and more recently in CUDA, HIP, and SYCL) using many different compilers and platforms. I guess it's hard to find an architecture that does not use two's complement and arithmetic right shifts with signed integers these days.

Let me think about how to best address this. It unfortunately seems to be a larger issue than what can be addressed in a couple of days.

@mjwillson
Copy link
Author

mjwillson commented Aug 19, 2024

Thank you and sorry about the can of worms!

For now we are able to work around this by compiling with -fwrapv -fno-sanitize=signed-integer-overflow, I believe these are supported by clang and gcc but have only tried with clang.

-fwrapv is "Treat signed integer overflow as two’s complement":
https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-fwrapv
https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html#:~:text=fwrapv

Ideally I think it would be better to implement in a portable way, but if these flags are needed then good to document this and add to your build config here, ideally only for a minimal required set of files.

@lindstro
Copy link
Member

I believe this should be fixed in zfp, as valid input data should never invoke UB. This is somewhat different from #242, where the user violates zfp's assumptions on all-numerical inputs. I'm, however, unsure about the best path forward. My sense is that we need to do some work to prove that any one of the suggested changes above is guaranteed to avoid UB, but it will take some time. Fortunately we have very smart people at LLNL working on exactly such correctness problems, and I'm hoping we can solicit their help.

@lindstro
Copy link
Member

@mjwillson: #241 is being addressed on the bugfix/integer-overflow branch. Please check if the fix works for you.

We're currently struggling with tests passing in general because of outdated scripts and some LLNL internal issues with GitLab. We hope to resolve these issues in the coming weeks.

@lindstro
Copy link
Member

@mjwillson I would like to merge this fix into develop, but I would prefer to hear from you first to make sure it addresses all of your concerns.

@lindstro
Copy link
Member

I have merged this fix and am closing the issue. Please reopen if it does not address your needs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants