Crashes due to undefined behaviour from signed integer overflow in rev_fwd_lift_int32 #241

mjwillson · 2024-08-16T19:48:15Z

Hello,

This bug seems to consistently cause SIGILL crashes for me when compressing float data using reversible encoding.
When running under UBSAN (https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html ) the issue is revealed to be a signed integer overflow:

third_party/zfp/src/template/revencode.c:24:5: runtime error: signed integer overflow: 1547771904 - -1409359872 cannot be represented in type 'int32' (aka 'int')                                                 
    #0 0x55d22fcec482 in rev_fwd_lift_int32 third_party/zfp/src/template/revencode.c:24:5                
    #2 0x55d22fceb26d in rev_encode_block_int32_2 third_party/zfp/src/template/revencode.c:60:3          
    #3 0x55d22fceb26d in rev_encode_block_float_2 third_party/zfp/src/template/revencodef.c:78:11        
    #4 0x55d22fceb26d in zfp_encode_block_float_2 third_party/zfp/src/template/encodef.c:98:28           
    #5 0x55d22fceb72a in zfp_encode_block_strided_float_2 third_party/zfp/src/template/encode2.c:50:10   
    #6 0x55d22fd0233f in compress_strided_float_2 third_party/zfp/src/template/compress.c:54:9           
    #7 0x55d22fd01bf2 in zfp_compress third_party/zfp/src/zfp.c:1116:3
...

Is this a bug, or were you relying on wraparound overflow happening silently here? Unfortunately it's undefined behaviour and at least some compilers are going to generate crashing code.

The text was updated successfully, but these errors were encountered:

lindstro · 2024-08-16T20:29:41Z

Indeed, that looks like a faulty assumption that integer overflow wraps. One obvious fix would be to use unsigned arithmetic here, though that requires some changes that propagate across a few functions. I'll see if I can get a fix done in the next few days.

Out of curiosity, what compiler and hardware are you using?

mjwillson · 2024-08-16T22:32:44Z

Thanks!
Hardware nothing unusual (amd64), the compiler is clang, but in google's build environment which is passing in a lot of flags by default and from my experience tends to be quite cautious around undefined behaviour (i.e. err on the side of crashing).

lindstro · 2024-08-18T20:12:38Z

It seems we're about to open up Pandora's box. These UB issues are not confined just to signed integer overflow in the reversible zfp algorithm, but the same can evidently happen in the far more important non-reversible algorithm. For example, a 1D float block { 7.f, 7.f, 7.f, 7.f } encoded with maxprec = 2 results in decoded input x = 2³⁰, y = z = w = 0 to inv_lift():

zfp/src/template/decode.c

Lines 8 to 34 in 5c976d8

    
           static void 
        
           _t1(inv_lift, Int)(Int* p, ptrdiff_t s) 
        
           { 
        
             Int x, y, z, w; 
        
             x = *p; p += s; 
        
             y = *p; p += s; 
        
             z = *p; p += s; 
        
             w = *p; p += s; 
        
             /* 
        
             ** non-orthogonal transform 
        
             **       ( 4  6 -4 -1) (x) 
        
             ** 1/4 * ( 4  2  4  5) (y) 
        
             **       ( 4 -2  4 -5) (z) 
        
             **       ( 4 -6 -4  1) (w) 
        
             */ 
        
             y += w >> 1; w -= y >> 1; 
        
             y += w; w <<= 1; w -= y; 
        
             z += x; x <<= 1; x -= z; 
        
             y += z; z <<= 1; z -= y; 
        
             w += x; x <<= 1; x -= w; 
        
             p -= s; *p = w; 
        
             p -= s; *p = z; 
        
             p -= s; *p = y; 
        
             p -= s; *p = x; 
        
           }

On line 26, x <<= 1 results in x = 2³¹ (regardless of how we implement this doubling of x; see below), which overflows this 32-bit signed integer. Of course, this is temporary since x -= z would restore x to 2³⁰, but by then it's too late.

This is a larger challenge as the non-reversible algorithm relies on left and right shifts of signed quantities, with the assumption that right shifts are arithmetic (which is implementation-defined rather than undefined behavior). Worse yet, there are left shifts by one (to invert right shifts by one), and those are UB on signed integers (who knew?!). We can address those by replacing x <<= 1 with x *= 2 or x += x, but we still need to avoid integer overflow.

One potential solution might be to consolidate x += x; x -= z; as x += x - z, though additional work would be needed to determine whether that is safe. Clearly, if neither z += x nor x -= z overflows, then this solution ought to be valid since x - z is simply -z before z += x was executed. I believe we should never have z = -2³¹ as input, which (using two's complement) would overflow when negated, but if that were a concern, then x -= z - x might do the trick. One drawback of this approach is that we have now left the realm of lifting, but so be it.

We could also implement this with additional variables:

  Int x2 = x1 - z1; Int z2 = z1 + x1;

Or using a temporary:

  Int t = x; x -= z; z += t;

If the above still does not prevent overflow, then the only solution I can think of is to perform all arithmetic using unsigned integers, but that introduces complications with implementing arithmetic right shifts. Instead of x >>= 1, we'd have to use some contrived code like x = x & 0x80000000u ? ~(~x >> 1) : x >> 1, or perhaps x = (x & 0x80000000u) | (x >> 1). The cleanest would be to use sal() and sar() macros or functions for arithmetic shifting, but I am concerned about how this would impact performance and code readability. Of course, that is to be preferred over UB, though in practice we've gone 10 years without anyone noticing this in C and C++ implementations (and more recently in CUDA, HIP, and SYCL) using many different compilers and platforms. I guess it's hard to find an architecture that does not use two's complement and arithmetic right shifts with signed integers these days.

Let me think about how to best address this. It unfortunately seems to be a larger issue than what can be addressed in a couple of days.

mjwillson · 2024-08-19T12:26:02Z

Thank you and sorry about the can of worms!

For now we are able to work around this by compiling with -fwrapv -fno-sanitize=signed-integer-overflow, I believe these are supported by clang and gcc but have only tried with clang.

-fwrapv is "Treat signed integer overflow as two’s complement":
https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-fwrapv
https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html#:~:text=fwrapv

Ideally I think it would be better to implement in a portable way, but if these flags are needed then good to document this and add to your build config here, ideally only for a minimal required set of files.

lindstro · 2024-08-19T16:14:57Z

I believe this should be fixed in zfp, as valid input data should never invoke UB. This is somewhat different from #242, where the user violates zfp's assumptions on all-numerical inputs. I'm, however, unsure about the best path forward. My sense is that we need to do some work to prove that any one of the suggested changes above is guaranteed to avoid UB, but it will take some time. Fortunately we have very smart people at LLNL working on exactly such correctness problems, and I'm hoping we can solicit their help.

lindstro · 2024-09-18T02:39:00Z

@mjwillson: #241 is being addressed on the bugfix/integer-overflow branch. Please check if the fix works for you.

We're currently struggling with tests passing in general because of outdated scripts and some LLNL internal issues with GitLab. We hope to resolve these issues in the coming weeks.

lindstro · 2024-09-25T17:29:42Z

@mjwillson I would like to merge this fix into develop, but I would prefer to hear from you first to make sure it addresses all of your concerns.

lindstro · 2024-10-16T16:25:06Z

I have merged this fix and am closing the issue. Please reopen if it does not address your needs.

mjwillson mentioned this issue Aug 19, 2024

Undefined behaviour crash on NaN/inf/-inf input #242

Open

lindstro closed this as completed in 3eafd7c Oct 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Crashes due to undefined behaviour from signed integer overflow in rev_fwd_lift_int32 #241

Crashes due to undefined behaviour from signed integer overflow in rev_fwd_lift_int32 #241

mjwillson commented Aug 16, 2024

lindstro commented Aug 16, 2024

mjwillson commented Aug 16, 2024

lindstro commented Aug 18, 2024

mjwillson commented Aug 19, 2024 •

edited

Loading

lindstro commented Aug 19, 2024

lindstro commented Sep 18, 2024

lindstro commented Sep 25, 2024

lindstro commented Oct 16, 2024

Crashes due to undefined behaviour from signed integer overflow in rev_fwd_lift_int32 #241

Crashes due to undefined behaviour from signed integer overflow in rev_fwd_lift_int32 #241

Comments

mjwillson commented Aug 16, 2024

lindstro commented Aug 16, 2024

mjwillson commented Aug 16, 2024

lindstro commented Aug 18, 2024

mjwillson commented Aug 19, 2024 • edited Loading

lindstro commented Aug 19, 2024

lindstro commented Sep 18, 2024

lindstro commented Sep 25, 2024

lindstro commented Oct 16, 2024

mjwillson commented Aug 19, 2024 •

edited

Loading